Compare commits: v1.13.6-pr ... v1.13.13-w (8 commits)

| SHA1 |
|---|
| bc376c878d |
| 8b568b36d3 |
| 34cbecf975 |
| 5a3f357ce9 |
| fc11e8dc91 |
| 9ce638c916 |
| 8126d78b34 |
| b06a4a8e55 |
@@ -10,3 +10,12 @@ POSTGRES_PASSWORD=CHANGE_ME
 # Internal Tailscale address that bypasses Authelia. Override if you
 # point BooCode at a different SearXNG instance.
 SEARXNG_URL=http://100.114.205.53:8888
+
+# v1.13.15-tools: BOOCODE_TOOLS narrows the tool whitelist sent to the LLM.
+# Unset (default) → all tools (~21k schema). Useful primarily for single-purpose
+# sessions where the model only needs read-only filesystem access.
+#
+# core → view_file, list_dir, grep, find_files (~2k)
+# standard → core + web_*, git_status, all 8 codecontext_* tools (~10k)
+# all → every tool in ALL_TOOLS (~21k)
+# BOOCODE_TOOLS=all
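The tier comment above can be read as a small membership rule. The following is a minimal sketch of that rule, not the shipped implementation: `parseTier`, `inTier`, and `toolsForTier` are hypothetical names, the prefix matching for `web_*` and `codecontext_*` is taken literally from the comment, and falling back to `all` for unrecognized values is an assumption.

```typescript
// Hypothetical sketch of the BOOCODE_TOOLS tiers described in the env comment.
// Only the tier names and the tool names/prefixes come from the diff.
type ToolTier = 'core' | 'standard' | 'all';

const CORE_TOOLS: string[] = ['view_file', 'list_dir', 'grep', 'find_files'];

// Unset maps to 'all' (the documented default). Treating unrecognized
// values the same way is an assumption of this sketch.
function parseTier(raw: string | undefined): ToolTier {
  const value = (raw ?? '').trim().toLowerCase();
  if (value === 'core') return 'core';
  if (value === 'standard') return 'standard';
  return 'all';
}

// True when `tool` belongs to `tier`.
function inTier(tool: string, tier: ToolTier): boolean {
  if (tier === 'all') return true;
  if (CORE_TOOLS.includes(tool)) return true;
  if (tier === 'core') return false;
  // standard = core + web_*, git_status, codecontext_*
  return (
    tool === 'git_status' ||
    tool.startsWith('web_') ||
    tool.startsWith('codecontext_')
  );
}

// Filters a registry's tool names down to the requested tier, keeping order.
function toolsForTier(allTools: string[], tier: ToolTier): string[] {
  return allTools.filter((tool) => inTier(tool, tier));
}
```

With this shape, `toolsForTier(names, parseTier(process.env.BOOCODE_TOOLS))` would yield the schema subset sent to the model.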
.gitignore (1 change, vendored)
@@ -1,6 +1,7 @@
 node_modules
 dist
 .env
+CLAUDE.local.md
 *.log
 .DS_Store
 .vite
@@ -1,7 +1,5 @@
 # BooChat
 
-You are the assistant running inside BooChat — a self-hosted developer chat app.
-
 ## Capabilities
 
 - Read-only file tools: `view_file`, `list_dir`, `grep`, `find_files`
@@ -2,8 +2,6 @@
 
 > (Stub. v2.0 implementation pending. This file documents the intended contract.)
 
-You are the assistant running inside BooCoder — the write-capable companion to BooChat.
-
 ## Capabilities
 
 - Everything in `BOOCHAT.md`
CLAUDE.md (11 changes)
@@ -47,10 +47,12 @@ Tests: `pnpm -C apps/server test` runs the vitest suite. No test harness on `app
 
 Key services:
 - **`services/inference/`** — Public surface re-exported via `inference/index.ts`; callers import from `./services/inference/index.js` explicitly (NodeNext doesn't honor directory-index resolution). Layout: `turn.ts` (runAssistantTurn / runInference / createInferenceRunner; exports `InferenceFrame`, `InferenceContext`, `TurnArgs`, `StreamResult`), `stream-phase.ts` (streamCompletion as a v1.13.1-A AI SDK adapter + executeStreamPhase), `provider.ts` (`upstreamModel(baseURL, modelId)` wrapping `createOpenAICompatible` against llama-swap), `tool-phase.ts` (executeToolPhase; value back-edges into turn.ts for the runAssistantTurn recursion — cycle safe because deref at call time, not module top-level), `sentinel-summaries.ts` (runCapHitSummary + runDoomLoopSummary + their sentinel inserters), `error-handler.ts` (handleAbortOrError, finalizeCompletion), `payload.ts` (buildMessagesPayload, loadContext, maybeFlagForCompaction, `OpenAiMessage`), `sentinels.ts` (`detectDoomLoop`, `DOOM_LOOP_THRESHOLD`, sentinel predicates), `budget.ts` (resolveToolBudget), `xml-parser.ts` (qwen3.6 XML tool-call fallback — KEEP, AI SDK doesn't handle inline-XML tool calls), `parts.ts` (v1.13.0 dual-write helpers: `partsFromAssistantMessage`, `partsFromToolMessage`, `insertParts`), `prune.ts` (v1.13.4 two-tier compaction; `selectPruneTargets` is the pure decision helper), `types.ts` (`StreamPhaseState`, `DB_FLUSH_INTERVAL_MS`). **`TurnArgs`** is the per-turn state envelope threaded through the `executeToolPhase → runAssistantTurn` recursion; reset in `runInference` at user-message boundary. Add new per-turn state to `TurnArgs`, not module-level closures.
-- **AI SDK v6 streamCompletion adapter** (v1.13.1-A; `services/inference/stream-phase.ts`). `streamText` is the underlying call; the BooCode layer above (executeStreamPhase, finalize, dual-write) is shape-preserved via an adapter. Three gotchas the LSP/test suite won't catch:
+- **AI SDK v6 streamCompletion adapter** (v1.13.1-A; `services/inference/stream-phase.ts`). `streamText` is the underlying call; the BooCode layer above (executeStreamPhase, finalize, dual-write) is shape-preserved via an adapter. Five gotchas the LSP/test suite won't catch:
 - **Abort signals are swallowed.** `streamText`'s `fullStream` iterator exits cleanly when `abortSignal` fires — no throw. Post-iteration `if (signal?.aborted) throw <AbortError>` is required; without it the row finalizes as `complete` instead of `cancelled`. Comment in stream-phase.ts pins this; don't refactor it away.
 - **Usage lands only at stream end** via `await result.usage` (`inputTokens` / `outputTokens` v6 names → mapped to `promptTokens` / `completionTokens` for the existing onUsage callback). Mid-stream live tok/s is gone vs v1.12.2; ChatThroughput shows a single value at stream end.
 - **Tools have NO `execute` field.** BooCode dispatches tools in tool-phase.ts, not the AI SDK loop. Only `description` + `inputSchema: jsonSchema(parameters)` — surfacing tool-call parts via `fullStream` and stopping is what we want.
+- **`includeUsage: true` MUST be set on `createOpenAICompatible`** in `services/inference/provider.ts`. The adapter defaults it false, omitting `stream_options.include_usage` from the request body; llama-swap then never emits the usage block and `result.usage.inputTokens/outputTokens` resolve to `undefined`. Latent regression from v1.13.1-A through v1.13.7 — every assistant row in that window has `tokens_used`/`ctx_used` NULL. Don't remove this flag during refactor.
+- **Tool-call-only turns may emit a leading `\n` text-delta** as the assistant content. `MessageList.flatten`'s `hasText` and `MessageBubble`'s `hasContent` both `.trim()` before the length check — otherwise whitespace-only content renders an empty bubble + ActionRow between every tool call (v1.13.7 fix). `payload.ts:buildMessagesPayload` also skips `status='failed'` AND complete-but-empty (no content, no tool_calls) assistant rows to avoid "Cannot have 2 or more assistant messages at the end of the list" upstream rejections after cap-hit + Continue.
 - **AI SDK ModelMessage conversion** (`toModelMessages` in stream-phase.ts). Tool messages need a `toolName` for `ToolResultPart` — BooCode's OpenAI-shape history doesn't carry it, so a forward-scan builds a `tool_call_id → toolName` map from prior assistant `tool_calls`. Tool outputs wrapped as `{ type: 'json' | 'text', value }` matching the v6 `ToolResultOutput` union. Assistant messages with reasoning emit a `ReasoningPart` first in the content array (v1.13.1-C).
 - **`experimental_repairToolCall`** (v1.13.3) wired into `streamText` to keep the stream alive when qwen3.6 emits malformed tool args. Pass-through implementation — logs the bad call and returns it unmodified; `executeToolPhase`'s existing zod-reject error path routes it to the model on the next turn.
 - **`chat_status` frame shape** (published via `broker.publishUser`) — `status: 'streaming' | 'tool_running' | 'waiting_for_input' | 'idle' | 'error'` (widened from `working|idle|error` in v1.12.1). Frontend `useChatStatus` derives `idle_warm` (<30s since idle) vs `idle_cold`. `ChatThroughput` renders inline beside `StatusDot` only when streaming or tool_running, fed by 500ms-throttled `'usage'` WS frames (`completion_tokens` + `ctx_used` + `ctx_max`). The `POST /api/chats/:id/discard_stale` endpoint exists to mark a stuck-streaming row as `failed` when the frontend's 60s no-token-activity timer (`ChatPane` content-length watcher) gives up.
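The `tool_call_id → toolName` forward scan mentioned in the ModelMessage bullet can be illustrated in isolation. This is a hedged sketch, not the code in `stream-phase.ts`: the `HistoryMessage` type is a trimmed-down assumption of the OpenAI-shape history, and `resolveToolNames` is a hypothetical helper name.

```typescript
// Assumed minimal OpenAI-shape history types; only the fields this sketch needs.
interface ToolCall {
  id: string;
  function: { name: string };
}

interface HistoryMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  tool_calls?: ToolCall[];
  tool_call_id?: string;
}

// Forward scan: record the tool name behind every tool_call id as assistant
// messages go by, then resolve each later tool message against that map.
// Returns one entry per message: the tool name for tool messages whose id
// was seen earlier, undefined for everything else.
function resolveToolNames(history: HistoryMessage[]): (string | undefined)[] {
  const nameById = new Map<string, string>();
  return history.map((msg) => {
    if (msg.role === 'assistant') {
      for (const call of msg.tool_calls ?? []) {
        nameById.set(call.id, call.function.name);
      }
      return undefined;
    }
    if (msg.role === 'tool' && msg.tool_call_id !== undefined) {
      return nameById.get(msg.tool_call_id);
    }
    return undefined;
  });
}
```

A tool message whose id never appeared in a prior assistant turn resolves to `undefined` here; how the real adapter handles that case is not stated in the diff.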
@@ -58,7 +60,9 @@ Key services:
 - **Periodic 60s sweeper** in `apps/server/src/index.ts` (v1.13.3 + v1.13.5). Same `setInterval` runs `sweepStaleStreaming` (marks `messages.status='streaming'` older than 5 min as `failed`, publishes `chat_status='idle'` so the UI dot drops) and `cleanupTruncations` (TTL + orphan reap of tmpfs truncation files). `app.addHook('onClose')` clears the timer. No-op when nothing to reap.
 - **`services/broker.ts`** — In-memory pub/sub with two channel types: per-session (message streaming) and per-user (sidebar updates). No persistence; clients reconnect on restart.
 - **`services/tools.ts`** — Tool registry (`ALL_TOOLS`, `READ_ONLY_TOOL_NAMES`, `TOOLS_BY_NAME`). Filesystem tools (view_file/list_dir/grep/find_files) go through three guard layers: `path_guard.ts` (workspace scope), `secret_guard.ts` (filename deny list), `url_guard.ts` (SSRF/private-IP block for web_fetch). v1.11.8+ web tools (`web_search`, `web_fetch`) are opt-in per chat via `session.web_search_enabled` (resolved with `project.default_web_search_enabled` fallback) and filtered out of the LLM's tool schema when false. v1.13.5 truncation: when a tool slice cuts content, `services/truncate.ts` stashes the full text on tmpfs at `BOOCODE_TRUNCATION_DIR` (default `/tmp/boocode-truncations`, 0o700) keyed by an opaque `tr_<12 base32 chars>` id, and the `view_truncated_output(id)` tool retrieves it. 5MB cap (matches `view_file`'s `MAX_FILE_BYTES`), 7-day TTL, reaped by the periodic sweeper. Tmpfs path means container restart loses retrieval — acceptable, the model usually has moved on.
-- **`services/compaction.ts`** + **`services/model-context.ts`** — v1.11.0 anchored rolling summary (single `summary=true` assistant row per chat, supersedes itself on each compaction). Triggered when `chats.needs_compaction` is set after an inference turn exceeds `usable(ctx_max) = ctx_max - 20k`. **`ctx_max` comes from `model-context.getModelContext()` which fetches `${LLAMA_SWAP_URL}/upstream/<model>/props`** — NOT from `parsed.timings.n_ctx` (the stream completion's `timings` doesn't carry n_ctx; that read was dead code until v1.11.3 ripped it out). v1.13.6: `buildHeadPayload` embeds `reasoning_parts` as a `<reasoning>...</reasoning>` prose prefix on the assistant `content` (OpenAI wire shape has no structured reasoning field; the summarizer reads text). Standalone tag when content is empty (tool-call-only turn). `buildHeadPayload` + `OpenAiMessage` exported for test access — keep them exported.
+- **`services/compaction.ts`** + **`services/model-context.ts`** — v1.11.0 anchored rolling summary (single `summary=true` assistant row per chat, supersedes itself on each compaction). Triggered when `chats.needs_compaction` is set after an inference turn exceeds `usable(ctx_max) = floor(0.85 × ctx_max)` (v1.13.9 opencode-pattern early trigger; was `ctx_max - 20k` pre-v1.13.9, which gave only 7.6% headroom at 262k and 0 budget for ≤20k contexts). **`ctx_max` comes from `model-context.getModelContext()` which fetches `${LLAMA_SWAP_URL}/upstream/<model>/props`** — NOT from `parsed.timings.n_ctx` (the stream completion's `timings` doesn't carry n_ctx; that read was dead code until v1.11.3 ripped it out). First inferences after a boocode boot may have `ctx_max=NULL` if llama-swap hasn't loaded the model yet; negative cache TTL is 60s, recovers on next turn. v1.13.6: `buildHeadPayload` embeds `reasoning_parts` as a `<reasoning>...</reasoning>` prose prefix on the assistant `content` (OpenAI wire shape has no structured reasoning field; the summarizer reads text). Standalone tag when content is empty (tool-call-only turn). `buildHeadPayload` + `OpenAiMessage` exported for test access — keep them exported.
+- **`services/system-prompt.ts`** — `buildSystemPrompt` is the string-returning shim; `buildSystemPromptWithFingerprint` is the canonical impl returning `{prompt, fingerprint, drift}`. v1.13.8 instrumentation: SHA-256 of the assembled prefix is logged per `buildMessagesPayload` call (msg `prefix-fingerprint`, level=info); a `Map<sessionId, lastHash>` observer fires `prefix-drift` (level=warn) on hash change with a field-level `changed_inputs` diff. Smoke proved the prefix is byte-stable across turns in steady-state — the originally-planned `system_prompt_cache` DB table was dropped as redundant against the v1.12.0 input-layer mtime caches (BOOCHAT.md here + AGENTS.md global+per-project in `agents.ts:safeStat`).
+- **`services/inference/budget.ts`** — tool-call budgets: `BUDGET_READ_ONLY = 30`, `BUDGET_NON_READ_ONLY = 10` (forward-looking; no write tools yet), `BUDGET_NO_AGENT = 30` (v1.13.7; was 15 — every tool in `ALL_TOOLS` is read-only today, so no-agent mode shares the read-only-agent cap). Per-agent `max_tool_calls` from AGENTS.md frontmatter overrides.
 - **`messages_with_parts` view** (v1.13.1-B; `schema.sql`). Read sites that need `tool_calls` / `tool_results` / `reasoning_parts` SELECT from this view, NOT `messages` directly. `COALESCE`s parts-table rows over the legacy JSON columns, so pre-v1.13.0 history still resolves. Writes still target `messages`; the v1.13.0 dual-write into `message_parts` keeps both halves in sync. New payload-assembly code must use the view — calling `messages.tool_calls` directly will miss anything written post-v1.13.1-B if the JSON column ever drifts (and dual-write makes that easy to miss). Shapes: `tool_calls jsonb[]`, `tool_results jsonb` single object, `reasoning_parts jsonb[]` of `{text}`.
 - **`services/file_ops.ts`** — Shared file operation implementations used by both inference tools and HTTP routes.
 - **`services/auto_name.ts`** — Non-streaming LLM call to generate 4-word session titles after first assistant reply.
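The compaction-threshold change in this hunk is easy to check numerically. The sketch below works the arithmetic under stated assumptions: "262k" is read as 262,144 tokens, "20k" as 20,000 tokens, and the pre-v1.13.9 formula is clamped at zero to match the "0 budget for ≤20k contexts" remark. The function names are illustrative only.

```typescript
// Assumption: the pre-v1.13.9 reserve of "20k" means 20,000 tokens.
const LEGACY_RESERVE_TOKENS = 20000;

// Pre-v1.13.9 rule: fixed reserve, clamped at zero (clamp is an assumption).
function usableLegacy(ctxMax: number): number {
  return Math.max(0, ctxMax - LEGACY_RESERVE_TOKENS);
}

// v1.13.9 rule from the diff: usable(ctx_max) = floor(0.85 * ctx_max).
function usable(ctxMax: number): number {
  return Math.floor(0.85 * ctxMax);
}

// Headroom left above the trigger, as a percentage of the context window.
function headroomPct(ctxMax: number, usableTokens: number): number {
  return ((ctxMax - usableTokens) / ctxMax) * 100;
}

const ctxMax = 262144; // assumed reading of "262k"
console.log(usableLegacy(ctxMax), headroomPct(ctxMax, usableLegacy(ctxMax)).toFixed(1));
console.log(usable(ctxMax), headroomPct(ctxMax, usable(ctxMax)).toFixed(1));
```

Under these assumptions the legacy rule leaves 20,000 of 262,144 tokens (about 7.6%) as headroom, while the 0.85 factor leaves roughly 15% at any context size and still yields a nonzero budget for small windows.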
@@ -108,11 +112,12 @@ Schema CHECK migration order when renaming allowed values: (1) `ALTER TABLE ...
 
 ## Environment
 
-Required: `DATABASE_URL`, `LLAMA_SWAP_URL`. Optional: `PORT` (3000), `HOST` (0.0.0.0), `PROJECT_ROOT_WHITELIST` (/opt, read-only scope for add-existing path resolution), `BOOTSTRAP_ROOT` (/opt/projects, writable scope for create-new-project bootstrap mkdir target — host must `mkdir -p /opt/projects` before container start), `DEFAULT_MODEL`, `LOG_LEVEL`, `SEARXNG_URL` (default `http://100.114.205.53:8888` — internal Tailscale Fathom; the public `search.indifferentketchup.com` is behind Authelia and unusable from server context).
+Required: `DATABASE_URL`, `LLAMA_SWAP_URL`. Optional: `PORT` (3000), `HOST` (0.0.0.0), `PROJECT_ROOT_WHITELIST` (/opt, read-only scope for add-existing path resolution), `BOOTSTRAP_ROOT` (/opt/projects, writable scope for create-new-project bootstrap mkdir target — host must `mkdir -p /opt/projects` before container start), `DEFAULT_MODEL`, `LOG_LEVEL`, `SEARXNG_URL` (default `http://100.114.205.53:8888` — internal Tailscale Fathom; the public `search.indifferentketchup.com` is behind Authelia and unusable from server context), `BOOCODE_TOOLS` (`core` | `standard` | `all`, default `all`; v1.13.15-tools tier filter — ceiling, never expands an agent's whitelist).
 
 ## Workflow
 
 - Sam reviews all diffs and commits manually. Do not commit unless explicitly asked.
+- Per-batch docs live under `openspec/changes/<slug>/{proposal,tasks,design}.md`. Already-shipped batches are snapshots in `openspec/changes/archived/`. New batches follow the proposal+tasks shape; see `openspec/README.md` for the convention.
 - Deploy: `cd /opt/boocode && docker compose up --build -d` (or `docker compose build --no-cache boocode && docker compose up -d` if you suspect a layer-cache issue).
 - Git push to Gitea: `GIT_SSH_COMMAND="ssh -i /opt/boocode/secrets/boocode_gitea -o IdentitiesOnly=yes" git push origin <branch>`. The default agent identity is rejected; the in-repo deploy key (`secrets/`, gitignored) is the working one. Transient `Connection reset by peer` retries cleanly after `sleep 5`.
 - Don't accumulate `.bak-*` files. Clean them up in the same batch or immediately after merge.
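The "ceiling, never expands an agent's whitelist" wording in the Environment line suggests an intersection rather than a union. A minimal sketch of that reading follows; `applyTierCeiling` is a hypothetical name, and giving an agent with no whitelist the full tier set is an assumption, since the diff does not say what happens in that case.

```typescript
// Hypothetical sketch: the tier acts as an upper bound on what an agent may see.
// `agentWhitelist` is the agent's own tool list (undefined = no whitelist),
// `tierTools` is the set permitted by BOOCODE_TOOLS.
function applyTierCeiling(
  agentWhitelist: string[] | undefined,
  tierTools: string[],
): string[] {
  // Assumption: with no agent whitelist, the tier set is used as-is.
  if (agentWhitelist === undefined) return [...tierTools];
  const allowed = new Set(tierTools);
  // Intersection: a tool survives only if both the agent and the tier allow it,
  // so a wider tier can never add tools the agent did not list.
  return agentWhitelist.filter((name) => allowed.has(name));
}
```

For example, an agent limited to `grep` still sees only `grep` under `all`, and an agent listing `web_search` loses it under `core`.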
@@ -16,6 +16,7 @@ import { registerWebSocket } from './routes/ws.js';
 import { registerModelRoutes } from './routes/models.js';
 import { registerAgentRoutes } from './routes/agents.js';
 import { registerSkillsRoutes } from './routes/skills.js';
+import { registerToolsRoutes } from './routes/tools.js';
 import { createInferenceRunner } from './services/inference/index.js';
 import { createBroker } from './services/broker.js';
 import { listSkills } from './services/skills.js';
@@ -74,7 +75,7 @@ async function main() {
 return { status: dbOk ? 'ok' : 'degraded', db: dbOk };
 });
 
-const broker = createBroker();
+const broker = createBroker(app.log);
 
 registerProjectRoutes(app, sql, config, broker);
 registerSessionRoutes(app, sql, config, broker);
@@ -83,6 +84,7 @@ async function main() {
 registerAgentRoutes(app, sql);
 registerSidebarRoutes(app, sql);
 registerChatRoutes(app, sql, broker);
+registerToolsRoutes(app, sql);
 
 // Batch 9.6: warm the skills cache at boot and surface the count. Empty or
 // missing /data/skills is non-fatal — the skill tools just return empty.
@@ -99,7 +101,9 @@ async function main() {
 config,
 log: app.log,
 publish: (sessionId, frame) => {
-broker.publish(sessionId, frame as unknown as Record<string, unknown> & { type: string });
+// v1.13.11-b: route through the typed publishFrame so the broker's
+// Zod gate validates every inference frame before delivery.
+broker.publishFrame(sessionId, frame as unknown as import('./types/ws-frames.js').WsFrame);
 },
 // v1.11: broker handle for compaction.process to publish 'compacted'
 // frames on the per-session channel. Inference's regular publish path
@@ -108,7 +112,7 @@ async function main() {
 broker,
 },
 (user, frame) => {
-broker.publishUser(user, frame as unknown as Record<string, unknown> & { type: string });
+broker.publishUserFrame(user, frame as unknown as import('./types/ws-frames.js').WsFrame);
 }
 );
 registerMessageRoutes(app, sql, {
@@ -127,33 +131,33 @@ async function main() {
 },
 hasActiveInference: (chatId) => inference.hasActive(chatId),
 publishUserMessage: (sessionId, chatId, userMessageId, content) => {
-broker.publish(sessionId, {
+broker.publishFrame(sessionId, {
 type: 'message_started',
 message_id: userMessageId,
 chat_id: chatId,
 role: 'user',
 });
-broker.publish(sessionId, {
+broker.publishFrame(sessionId, {
 type: 'delta',
 message_id: userMessageId,
 chat_id: chatId,
 content,
 });
-broker.publish(sessionId, {
+broker.publishFrame(sessionId, {
 type: 'message_complete',
 message_id: userMessageId,
 chat_id: chatId,
 });
 },
 publishMessagesDeleted: (sessionId, chatId, messageIds) => {
-broker.publish(sessionId, {
+broker.publishFrame(sessionId, {
 type: 'messages_deleted',
 message_ids: messageIds,
 chat_id: chatId,
 });
 },
 publishSessionFrame: (sessionId, frame) => {
-broker.publish(sessionId, frame);
+broker.publishFrame(sessionId, frame as import('./types/ws-frames.js').WsFrame);
 },
 });
 registerSkillsRoutes(app, sql, {
@@ -161,26 +165,26 @@ async function main() {
 inference.enqueue(sessionId, chatId, assistantId, user);
 },
 publishUserMessage: (sessionId, chatId, userMessageId, content) => {
-broker.publish(sessionId, {
+broker.publishFrame(sessionId, {
 type: 'message_started',
 message_id: userMessageId,
 chat_id: chatId,
 role: 'user',
 });
-broker.publish(sessionId, {
+broker.publishFrame(sessionId, {
 type: 'delta',
 message_id: userMessageId,
 chat_id: chatId,
 content,
 });
-broker.publish(sessionId, {
+broker.publishFrame(sessionId, {
 type: 'message_complete',
 message_id: userMessageId,
 chat_id: chatId,
 });
 },
 publishSessionFrame: (sessionId, frame) => {
-broker.publish(sessionId, frame);
+broker.publishFrame(sessionId, frame as import('./types/ws-frames.js').WsFrame);
 },
 });
 registerWebSocket(app, sql, broker);
@@ -228,7 +232,7 @@ async function main() {
 for (const row of rows) {
 if (seenChats.has(row.chat_id)) continue;
 seenChats.add(row.chat_id);
-broker.publishUser('default', {
+broker.publishUserFrame('default', {
 type: 'chat_status',
 chat_id: row.chat_id,
 status: 'idle',
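The switch from `publish` to `publishFrame` puts a validation step between callers and subscribers. The sketch below shows that gate pattern in a self-contained form. It is not the project's broker: a hand-written type guard stands in for the Zod schema, `createGatedBroker`, `Frame`, and `isDeltaFrame` are hypothetical names, and logging then dropping an invalid frame is an assumption about the failure behavior.

```typescript
// Assumed frame shape; the real WsFrame union lives in types/ws-frames.
type Frame = { type: string } & Record<string, unknown>;
type Validator = (frame: unknown) => frame is Frame;
type Listener = (frame: Frame) => void;

interface Logger {
  warn(details: object, message: string): void;
}

// In-memory pub/sub where every frame passes a validator before delivery.
function createGatedBroker(log: Logger, isValid: Validator) {
  const listeners = new Map<string, Set<Listener>>();
  return {
    subscribe(channel: string, listener: Listener): () => void {
      const set = listeners.get(channel) ?? new Set<Listener>();
      listeners.set(channel, set);
      set.add(listener);
      return () => {
        set.delete(listener);
      };
    },
    // Returns true when the frame was delivered, false when it was rejected.
    publishFrame(channel: string, frame: unknown): boolean {
      if (!isValid(frame)) {
        // Assumption: invalid frames are logged and dropped, not thrown.
        log.warn({ channel }, 'dropping invalid frame');
        return false;
      }
      const valid: Frame = frame;
      listeners.get(channel)?.forEach((listener) => listener(valid));
      return true;
    },
  };
}

// Example guard for a 'delta' frame: needs type 'delta' and a string content.
const isDeltaFrame: Validator = (frame: unknown): frame is Frame => {
  if (typeof frame !== 'object' || frame === null) return false;
  const candidate = frame as { type?: unknown; content?: unknown };
  return candidate.type === 'delta' && typeof candidate.content === 'string';
};
```

Passing the logger into the factory mirrors the `createBroker(app.log)` change above, which gives the gate somewhere to report rejected frames.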
@@ -102,7 +102,7 @@ export function registerChatRoutes(
 VALUES (${req.params.id}, ${parsed.data.name ?? null}, 'open')
 RETURNING id, session_id, name, status, created_at, updated_at
 `;
-broker.publishUser('default', {
+broker.publishUserFrame('default', {
 type: 'chat_created',
 chat: chat!,
 session_id: req.params.id,
@@ -132,7 +132,7 @@ export function registerChatRoutes(
 return { error: 'chat not found' };
 }
 const chat = rows[0]!;
-broker.publishUser('default', {
+broker.publishUserFrame('default', {
 type: 'chat_updated',
 chat_id: chat.id,
 session_id: chat.session_id,
@@ -162,7 +162,7 @@ export function registerChatRoutes(
 `;
 const ids = rows.map((r) => r.id);
 for (const id of ids) {
-broker.publishUser('default', {
+broker.publishUserFrame('default', {
 type: 'chat_archived',
 chat_id: id,
 session_id: req.params.id,
@@ -203,7 +203,7 @@ export function registerChatRoutes(
|
|||||||
return { error: 'chat not found or already archived' };
|
return { error: 'chat not found or already archived' };
|
||||||
}
|
}
|
||||||
const row = rows[0]!;
|
const row = rows[0]!;
|
||||||
broker.publishUser('default', {
|
broker.publishUserFrame('default', {
|
||||||
type: 'chat_archived',
|
type: 'chat_archived',
|
||||||
chat_id: row.id,
|
chat_id: row.id,
|
||||||
session_id: row.session_id,
|
session_id: row.session_id,
|
||||||
@@ -226,7 +226,7 @@ export function registerChatRoutes(
|
|||||||
return { error: 'chat not found or not archived' };
|
return { error: 'chat not found or not archived' };
|
||||||
}
|
}
|
||||||
const chat = rows[0]!;
|
const chat = rows[0]!;
|
||||||
broker.publishUser('default', { type: 'chat_unarchived', chat });
|
broker.publishUserFrame('default', { type: 'chat_unarchived', chat });
|
||||||
return chat;
|
return chat;
|
||||||
}
|
}
|
||||||
);
|
);
|
||||||
@@ -243,7 +243,7 @@ export function registerChatRoutes(
|
|||||||
return { error: 'chat not found' };
|
return { error: 'chat not found' };
|
||||||
}
|
}
|
||||||
const row = result[0]!;
|
const row = result[0]!;
|
||||||
broker.publishUser('default', {
|
broker.publishUserFrame('default', {
|
||||||
type: 'chat_deleted',
|
type: 'chat_deleted',
|
||||||
chat_id: row.id,
|
chat_id: row.id,
|
||||||
session_id: row.session_id,
|
session_id: row.session_id,
|
||||||
@@ -338,7 +338,7 @@ export function registerChatRoutes(
|
|||||||
return chat!;
|
return chat!;
|
||||||
});
|
});
|
||||||
|
|
||||||
broker.publishUser('default', {
|
broker.publishUserFrame('default', {
|
||||||
type: 'chat_created',
|
type: 'chat_created',
|
||||||
chat: newChat,
|
chat: newChat,
|
||||||
session_id: source.session_id,
|
session_id: source.session_id,
|
||||||
@@ -400,13 +400,13 @@ export function registerChatRoutes(
|
|||||||
reply.code(409);
|
reply.code(409);
|
||||||
return { error: 'message status changed mid-request' };
|
return { error: 'message status changed mid-request' };
|
||||||
}
|
}
|
||||||
broker.publishUser('default', {
|
broker.publishUserFrame('default', {
|
||||||
type: 'chat_status',
|
type: 'chat_status',
|
||||||
chat_id: msg.chat_id,
|
chat_id: msg.chat_id,
|
||||||
status: 'idle',
|
status: 'idle',
|
||||||
at: new Date().toISOString(),
|
at: new Date().toISOString(),
|
||||||
});
|
});
|
||||||
broker.publish(msg.session_id, {
|
broker.publishFrame(msg.session_id, {
|
||||||
type: 'message_complete',
|
type: 'message_complete',
|
||||||
message_id: msg.id,
|
message_id: msg.id,
|
||||||
chat_id: msg.chat_id,
|
chat_id: msg.chat_id,
|
||||||
|
|||||||
@@ -129,7 +129,7 @@ export function registerProjectRoutes(
|
|||||||
RETURNING id, name, path, added_at, last_session_id, status, gitea_remote,
|
RETURNING id, name, path, added_at, last_session_id, status, gitea_remote,
|
||||||
default_system_prompt, default_web_search_enabled
|
default_system_prompt, default_web_search_enabled
|
||||||
`;
|
`;
|
||||||
broker.publishUser('default', { type: 'project_created', project: row as unknown as Project });
|
broker.publishUserFrame('default', { type: 'project_created', project: row as unknown as Project });
|
||||||
reply.code(201);
|
reply.code(201);
|
||||||
return {
|
return {
|
||||||
project: row,
|
project: row,
|
||||||
@@ -186,11 +186,11 @@ export function registerProjectRoutes(
|
|||||||
`;
|
`;
|
||||||
|
|
||||||
if (existing.length === 0) {
|
if (existing.length === 0) {
|
||||||
broker.publishUser('default', { type: 'project_created', project: row as unknown as Project });
|
broker.publishUserFrame('default', { type: 'project_created', project: row as unknown as Project });
|
||||||
reply.code(201);
|
reply.code(201);
|
||||||
} else {
|
} else {
|
||||||
// existing.status was 'archived' — row has been restored.
|
// existing.status was 'archived' — row has been restored.
|
||||||
broker.publishUser('default', { type: 'project_unarchived', project: row as unknown as Project });
|
broker.publishUserFrame('default', { type: 'project_unarchived', project: row as unknown as Project });
|
||||||
reply.code(200);
|
reply.code(200);
|
||||||
}
|
}
|
||||||
return row;
|
return row;
|
||||||
@@ -243,7 +243,7 @@ export function registerProjectRoutes(
|
|||||||
// v1.9: the project_updated frame still only carries id + name. Clients
|
// v1.9: the project_updated frame still only carries id + name. Clients
|
||||||
// that need the new fields refetch via api.projects.list() — keeps the
|
// that need the new fields refetch via api.projects.list() — keeps the
|
||||||
// frame payload lean, per the locked recon decision (d).
|
// frame payload lean, per the locked recon decision (d).
|
||||||
broker.publishUser('default', {
|
broker.publishUserFrame('default', {
|
||||||
type: 'project_updated',
|
type: 'project_updated',
|
||||||
project_id: project.id,
|
project_id: project.id,
|
||||||
name: project.name,
|
name: project.name,
|
||||||
@@ -260,7 +260,7 @@ export function registerProjectRoutes(
|
|||||||
reply.code(404);
|
reply.code(404);
|
||||||
return { error: 'not found or already archived' };
|
return { error: 'not found or already archived' };
|
||||||
}
|
}
|
||||||
broker.publishUser('default', { type: 'project_archived', project_id: req.params.id });
|
broker.publishUserFrame('default', { type: 'project_archived', project_id: req.params.id });
|
||||||
reply.code(204);
|
reply.code(204);
|
||||||
return null;
|
return null;
|
||||||
});
|
});
|
||||||
@@ -277,7 +277,7 @@ export function registerProjectRoutes(
|
|||||||
return { error: 'not found or not archived' };
|
return { error: 'not found or not archived' };
|
||||||
}
|
}
|
||||||
const project = rows[0]!;
|
const project = rows[0]!;
|
||||||
broker.publishUser('default', { type: 'project_unarchived', project });
|
broker.publishUserFrame('default', { type: 'project_unarchived', project });
|
||||||
return project;
|
return project;
|
||||||
});
|
});
|
||||||
|
|
||||||
@@ -288,7 +288,7 @@ export function registerProjectRoutes(
|
|||||||
reply.code(404);
|
reply.code(404);
|
||||||
return { error: 'not found' };
|
return { error: 'not found' };
|
||||||
}
|
}
|
||||||
broker.publishUser('default', { type: 'project_deleted', project_id: id });
|
broker.publishUserFrame('default', { type: 'project_deleted', project_id: id });
|
||||||
reply.code(204);
|
reply.code(204);
|
||||||
return null;
|
return null;
|
||||||
});
|
});
|
||||||
|
|||||||
@@ -112,7 +112,7 @@ export function registerSessionRoutes(
|
|||||||
`;
|
`;
|
||||||
return session!;
|
return session!;
|
||||||
});
|
});
|
||||||
broker.publishUser('default', {
|
broker.publishUserFrame('default', {
|
||||||
type: 'session_created',
|
type: 'session_created',
|
||||||
session: row,
|
session: row,
|
||||||
project_id: row.project_id,
|
project_id: row.project_id,
|
||||||
@@ -178,7 +178,7 @@ export function registerSessionRoutes(
|
|||||||
}
|
}
|
||||||
const session = rows[0]!;
|
const session = rows[0]!;
|
||||||
if (name !== undefined && session.name !== priorName) {
|
if (name !== undefined && session.name !== priorName) {
|
||||||
broker.publishUser('default', {
|
broker.publishUserFrame('default', {
|
||||||
type: 'session_renamed',
|
type: 'session_renamed',
|
||||||
session_id: session.id,
|
session_id: session.id,
|
||||||
name: session.name,
|
name: session.name,
|
||||||
@@ -188,7 +188,7 @@ export function registerSessionRoutes(
|
|||||||
// (notably the SettingsPane open in another tab) can refetch and pick
|
// (notably the SettingsPane open in another tab) can refetch and pick
|
||||||
// up the new fields. Frame stays lean (decision d) — payload is just
|
// up the new fields. Frame stays lean (decision d) — payload is just
|
||||||
// ids + name + updated_at, the client refetches via api.sessions.get.
|
// ids + name + updated_at, the client refetches via api.sessions.get.
|
||||||
broker.publishUser('default', {
|
broker.publishUserFrame('default', {
|
||||||
type: 'session_updated',
|
type: 'session_updated',
|
||||||
session_id: session.id,
|
session_id: session.id,
|
||||||
project_id: session.project_id,
|
project_id: session.project_id,
|
||||||
@@ -220,7 +220,7 @@ export function registerSessionRoutes(
|
|||||||
return { error: 'session not found' };
|
return { error: 'session not found' };
|
||||||
}
|
}
|
||||||
const session = rows[0]!;
|
const session = rows[0]!;
|
||||||
broker.publishUser('default', {
|
broker.publishUserFrame('default', {
|
||||||
type: 'session_workspace_updated',
|
type: 'session_workspace_updated',
|
||||||
session_id: session.id,
|
session_id: session.id,
|
||||||
workspace_panes: session.workspace_panes,
|
workspace_panes: session.workspace_panes,
|
||||||
@@ -248,7 +248,7 @@ export function registerSessionRoutes(
|
|||||||
`;
|
`;
|
||||||
const ids = rows.map((r) => r.id);
|
const ids = rows.map((r) => r.id);
|
||||||
for (const id of ids) {
|
for (const id of ids) {
|
||||||
broker.publishUser('default', {
|
broker.publishUserFrame('default', {
|
||||||
type: 'session_archived',
|
type: 'session_archived',
|
||||||
session_id: id,
|
session_id: id,
|
||||||
project_id: req.params.id,
|
project_id: req.params.id,
|
||||||
@@ -289,7 +289,7 @@ export function registerSessionRoutes(
|
|||||||
reply.code(404);
|
reply.code(404);
|
||||||
return { error: 'session not found or already archived' };
|
return { error: 'session not found or already archived' };
|
||||||
}
|
}
|
||||||
broker.publishUser('default', {
|
broker.publishUserFrame('default', {
|
||||||
type: 'session_archived',
|
type: 'session_archived',
|
||||||
session_id: rows[0]!.id,
|
session_id: rows[0]!.id,
|
||||||
project_id: rows[0]!.project_id,
|
project_id: rows[0]!.project_id,
|
||||||
@@ -312,7 +312,7 @@ export function registerSessionRoutes(
|
|||||||
return { error: 'session not found or not archived' };
|
return { error: 'session not found or not archived' };
|
||||||
}
|
}
|
||||||
const session = rows[0]!;
|
const session = rows[0]!;
|
||||||
broker.publishUser('default', {
|
broker.publishUserFrame('default', {
|
||||||
type: 'session_created',
|
type: 'session_created',
|
||||||
session: session,
|
session: session,
|
||||||
project_id: session.project_id,
|
project_id: session.project_id,
|
||||||
@@ -334,7 +334,7 @@ export function registerSessionRoutes(
|
|||||||
return { error: 'not found' };
|
return { error: 'not found' };
|
||||||
}
|
}
|
||||||
const project_id = deleted[0]!.project_id;
|
const project_id = deleted[0]!.project_id;
|
||||||
broker.publishUser('default', { type: 'session_deleted', session_id: id, project_id });
|
broker.publishUserFrame('default', { type: 'session_deleted', session_id: id, project_id });
|
||||||
reply.code(204);
|
reply.code(204);
|
||||||
return null;
|
return null;
|
||||||
}
|
}
|
||||||
|
|||||||
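The route hunks above replace `broker.publish` / `broker.publishUser` with `publishFrame` / `publishUserFrame`, and the first call site casts its argument to `WsFrame`. That suggests the new methods take a typed frame union rather than an untyped payload. The following is a minimal sketch of that idea only: `SketchFrame`, `SketchBroker`, and `subscribeUser` are hypothetical names, and the three frame shapes are copied from call sites in this diff, not from the real `types/ws-frames.js`, which is not shown here.

```typescript
// Hypothetical subset of frame shapes, taken from call sites visible in this diff.
// The real WsFrame union is not shown in the diff and may differ.
type SketchFrame =
  | { type: 'chat_status'; chat_id: string; status: string; at: string }
  | { type: 'project_deleted'; project_id: string }
  | { type: 'session_deleted'; session_id: string; project_id: string };

type FrameHandler = (frame: SketchFrame) => void;

// Hypothetical in-memory broker; the real broker's internals are not part of this diff.
class SketchBroker {
  private readonly userSubscribers = new Map<string, Set<FrameHandler>>();

  // Registers a handler for a user channel and returns an unsubscribe function.
  subscribeUser(userId: string, handler: FrameHandler): () => void {
    const handlers = this.userSubscribers.get(userId) ?? new Set<FrameHandler>();
    handlers.add(handler);
    this.userSubscribers.set(userId, handlers);
    return () => {
      handlers.delete(handler);
    };
  }

  // Accepts only values that match one member of the frame union, so a
  // misspelled `type` or a missing field fails at compile time.
  publishUserFrame(userId: string, frame: SketchFrame): void {
    this.userSubscribers.get(userId)?.forEach((handler) => handler(frame));
  }
}

const sketchBroker = new SketchBroker();
const receivedFrames: SketchFrame[] = [];
const unsubscribe = sketchBroker.subscribeUser('default', (f) => receivedFrames.push(f));
sketchBroker.publishUserFrame('default', { type: 'project_deleted', project_id: 'p1' });
unsubscribe();
sketchBroker.publishUserFrame('default', { type: 'project_deleted', project_id: 'p2' });
```

With a union like this, the compiler checks each call site's payload against the frame's `type`, which is one plausible motivation for the rename.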
40
apps/server/src/routes/tools.ts
Normal file
@@ -0,0 +1,40 @@
import type { FastifyInstance } from 'fastify';
import type { Sql } from '../db.js';

export interface ToolCostStat {
  tool_name: string;
  mean_prompt_tokens: number;
  mean_completion_tokens: number;
  n_calls: number;
  updated_at: string;
}

// v1.13.10: per-tool token cost rolling window read endpoint. Backed by the
// tool_cost_stats view in schema.sql (last 100 calls per tool, equal-split
// attribution across multi-tool turns, sentinel/failed-turn excluded).
// Consumed by AgentPicker for at-a-glance per-agent cost hints.
export function registerToolsRoutes(app: FastifyInstance, sql: Sql): void {
  app.get('/api/tools/cost_stats', async () => {
    const rows = await sql<
      {
        tool_name: string;
        prompt_tokens_sum: number;
        completion_tokens_sum: number;
        n_calls: number;
        updated_at: string;
      }[]
    >`
      SELECT tool_name, prompt_tokens_sum, completion_tokens_sum, n_calls, updated_at
      FROM tool_cost_stats
      ORDER BY tool_name ASC
    `;
    const stats: ToolCostStat[] = rows.map((r) => ({
      tool_name: r.tool_name,
      mean_prompt_tokens: Math.round(r.prompt_tokens_sum / r.n_calls),
      mean_completion_tokens: Math.round(r.completion_tokens_sum / r.n_calls),
      n_calls: r.n_calls,
      updated_at: r.updated_at,
    }));
    return { stats };
  });
}
@@ -119,6 +119,68 @@ SELECT
|
|||||||
WHERE p.message_id = m.id AND p.kind = 'reasoning' AND p.hidden_at IS NULL) AS reasoning_parts
|
WHERE p.message_id = m.id AND p.kind = 'reasoning' AND p.hidden_at IS NULL) AS reasoning_parts
|
||||||
FROM messages m;
|
FROM messages m;
|
||||||
|
|
||||||
-- v1.13.10: per-tool token cost rolling window. Derives from
-- messages_with_parts (the v1.13.1-B view that COALESCEs message_parts over
-- the legacy JSON column) so this works whether the chat predates v1.13.0
-- or postdates v1.13.2 (column drop). No new write site — all source data
-- already lands via the existing tool-phase.ts:94-95 UPDATE.
--
-- Attribution model: equal split. A turn emitting N tool calls divides its
-- prompt/completion tokens by N before attribution. See v1.13.10 dispatch
-- brief for rationale + rejected alternatives.
--
-- Column mapping: messages.ctx_used = prompt (input), messages.tokens_used
-- = completion (output). Non-obvious naming; pinned via canonical writes at
-- tool-phase.ts:94-95 et al.
--
-- Filtering rationale:
--   status='complete'             — exclude failed/cancelled (defense in
--                                   depth; failed-path doesn't write
--                                   tokens_used so they're filtered
--                                   indirectly too).
--   metadata->>'kind' exclusions  — exclude cap_hit / doom_loop sentinels
--                                   (defense in depth; sentinels are
--                                   role='system' with tool_calls=NULL
--                                   so they're filtered indirectly too).
--   experimental_repairToolCall   — no special handling; retries flow
--                                   as normal next-turn tool_result
--                                   errors and count naturally.
--
-- Rolling window: last 100 calls per tool_name, ordered by created_at DESC.
-- Aggregate-on-read is microseconds at BooCode scale (single user, ~30
-- tools, < 100 calls each). DROP VIEW + recreate to change window size.
CREATE OR REPLACE VIEW tool_cost_stats AS
WITH per_call AS (
  SELECT
    (tc->>'name')::text AS tool_name,
    (m.ctx_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS prompt_tokens,
    (m.tokens_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS completion_tokens,
    m.created_at,
    ROW_NUMBER() OVER (
      PARTITION BY (tc->>'name')::text
      ORDER BY m.created_at DESC
    ) AS rn
  FROM messages_with_parts m,
    LATERAL jsonb_array_elements(m.tool_calls) AS tc
  WHERE m.tool_calls IS NOT NULL
    AND jsonb_array_length(m.tool_calls) > 0
    AND m.tokens_used IS NOT NULL
    AND m.ctx_used IS NOT NULL
    AND m.status = 'complete'
    AND (m.metadata IS NULL
      OR m.metadata->>'kind' IS NULL
      OR m.metadata->>'kind' NOT IN ('cap_hit', 'doom_loop'))
)
SELECT
  tool_name,
  ROUND(SUM(prompt_tokens))::int AS prompt_tokens_sum,
  ROUND(SUM(completion_tokens))::int AS completion_tokens_sum,
  COUNT(*)::int AS n_calls,
  MAX(created_at) AS updated_at
FROM per_call
WHERE rn <= 100
GROUP BY tool_name;

ALTER TABLE messages ADD COLUMN IF NOT EXISTS tokens_used INTEGER;
|
ALTER TABLE messages ADD COLUMN IF NOT EXISTS tokens_used INTEGER;
|
||||||
ALTER TABLE messages ADD COLUMN IF NOT EXISTS ctx_used INTEGER;
|
ALTER TABLE messages ADD COLUMN IF NOT EXISTS ctx_used INTEGER;
|
||||||
ALTER TABLE messages ADD COLUMN IF NOT EXISTS ctx_max INTEGER;
|
ALTER TABLE messages ADD COLUMN IF NOT EXISTS ctx_max INTEGER;
|
||||||
|
|||||||
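The view's comments describe two rules: an equal split of a turn's tokens across its N tool calls, and a rolling window of the last 100 calls per tool. As a rough illustration of the same arithmetic outside SQL, here is an in-memory sketch. `Turn`, `ToolCost`, and `toolCostStats` are hypothetical names introduced for this example; the status and metadata filters from the view are omitted, and `Math.round` may break ties differently from Postgres `ROUND`.

```typescript
// Hypothetical in-memory shape of one assistant turn; the fields mirror the
// columns the view reads (ctx_used = prompt, tokens_used = completion).
interface Turn {
  toolNames: string[];
  ctxUsed: number;
  tokensUsed: number;
  createdAt: number; // epoch milliseconds
}

interface ToolCost {
  toolName: string;
  promptTokensSum: number;
  completionTokensSum: number;
  nCalls: number;
}

interface CallShare {
  prompt: number;
  completion: number;
  createdAt: number;
}

function toolCostStats(turns: Turn[], windowSize = 100): ToolCost[] {
  // Step 1: one row per tool call, with the turn's tokens divided by N.
  const perTool = new Map<string, CallShare[]>();
  for (const turn of turns) {
    const n = turn.toolNames.length;
    if (n === 0) continue; // mirrors jsonb_array_length(tool_calls) > 0
    for (const name of turn.toolNames) {
      const calls = perTool.get(name) ?? [];
      calls.push({ prompt: turn.ctxUsed / n, completion: turn.tokensUsed / n, createdAt: turn.createdAt });
      perTool.set(name, calls);
    }
  }

  // Step 2: keep the newest `windowSize` calls per tool, then sum and round.
  const out: ToolCost[] = [];
  perTool.forEach((calls, toolName) => {
    const recent = [...calls].sort((a, b) => b.createdAt - a.createdAt).slice(0, windowSize);
    out.push({
      toolName,
      promptTokensSum: Math.round(recent.reduce((sum, c) => sum + c.prompt, 0)),
      completionTokensSum: Math.round(recent.reduce((sum, c) => sum + c.completion, 0)),
      nCalls: recent.length,
    });
  });
  return out.sort((a, b) => a.toolName.localeCompare(b.toolName));
}

// A three-tool turn with 15000 prompt / 300 completion tokens.
const splitStats = toolCostStats([
  { toolNames: ['a', 'b', 'c'], ctxUsed: 15000, tokensUsed: 300, createdAt: 1 },
]);

// 110 single-tool turns with tokensUsed = 1..110; only the newest 100 count.
const windowTurns: Turn[] = [];
for (let i = 1; i <= 110; i++) {
  windowTurns.push({ toolNames: ['w'], ctxUsed: i * 10, tokensUsed: i, createdAt: i });
}
const windowStats = toolCostStats(windowTurns);
```

The two sample inputs reuse the numbers from the integration tests in this diff: the split case gives each tool 5000 prompt and 100 completion tokens, and the window case sums completion tokens 11 through 110.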
@@ -41,49 +41,58 @@ function mkMsg(
|
|||||||
|
|
||||||
// ---- usable -----------------------------------------------------------------
|
// ---- usable -----------------------------------------------------------------
|
||||||
|
|
||||||
describe('usable', () => {
|
// v1.13.9: ratio-only early trigger at 0.85 × contextLimit. Replaces the
|
||||||
it('returns 0 when contextLimit is 0', () => {
|
// v1.11.0-era `contextLimit - 20_000` math, which degenerated to 0 for
|
||||||
|
// contexts ≤20k and gave only 7-8% headroom at 262k.
|
||||||
|
describe('usable() — ratio-only early trigger (v1.13.9)', () => {
|
||||||
|
it('returns floor(0.85 * limit) for the qwen3.6 daily-driver context', () => {
|
||||||
|
// floor(0.85 * 262144) = floor(222822.4) = 222822 — 15% headroom for
|
||||||
|
// the summarizer to do its turn without itself overflowing.
|
||||||
|
expect(usable(262144)).toBe(222822);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('returns 0.85× for a mid-sized context', () => {
|
||||||
|
expect(usable(100_000)).toBe(85_000);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('returns 0.85× for a small context (no degenerate 0)', () => {
|
||||||
|
// floor(0.85 * 8192) = 6963. Under the old formula this returned 0
|
||||||
|
// (8192 - 20_000 clamped to 0), effectively disabling compaction for
|
||||||
|
// small-context models. The ratio keeps the trigger active.
|
||||||
|
expect(usable(8192)).toBe(6963);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('returns 0 for zero or negative contextLimit', () => {
|
||||||
expect(usable(0)).toBe(0);
|
expect(usable(0)).toBe(0);
|
||||||
});
|
expect(usable(-1)).toBe(0);
|
||||||
|
|
||||||
it('returns 0 when contextLimit is below the 20k buffer', () => {
|
|
||||||
// Math.max(0, x - 20000) clamps the subtraction so we never report
|
|
||||||
// negative headroom. A 10k-context model reports 0 usable, which makes
|
|
||||||
// isOverflow short-circuit to false (correct — we can't size the
|
|
||||||
// compaction with no headroom).
|
|
||||||
expect(usable(10_000)).toBe(0);
|
|
||||||
expect(usable(19_999)).toBe(0);
|
|
||||||
expect(usable(20_000)).toBe(0);
|
|
||||||
});
|
|
||||||
|
|
||||||
it('subtracts the 20k buffer from a normal-sized context window', () => {
|
|
||||||
expect(usable(100_000)).toBe(80_000);
|
|
||||||
expect(usable(32_768)).toBe(12_768);
|
|
||||||
});
|
});
|
||||||
});
|
});
|
||||||
|
|
||||||
// ---- isOverflow -------------------------------------------------------------
|
// ---- isOverflow -------------------------------------------------------------
|
||||||
|
|
||||||
describe('isOverflow', () => {
|
describe('isOverflow', () => {
|
||||||
it('returns false when usable is 0 (unknown / sub-buffer context)', () => {
|
it('returns false when usable is 0 (unknown contextLimit)', () => {
|
||||||
expect(isOverflow({ prompt_tokens: 999_999, completion_tokens: 0 }, 0)).toBe(false);
|
expect(isOverflow({ prompt_tokens: 999_999, completion_tokens: 0 }, 0)).toBe(false);
|
||||||
expect(isOverflow({ prompt_tokens: 0, completion_tokens: 999_999 }, 10_000)).toBe(false);
|
expect(isOverflow({ prompt_tokens: 0, completion_tokens: 999_999 }, -1)).toBe(false);
|
||||||
});
|
});
|
||||||
|
|
||||||
it('returns false at 50% of usable', () => {
|
it('returns false at 50% of usable', () => {
|
||||||
// usable(100k) = 80k → 50% = 40k.
|
// v1.13.9: usable(100k) = 85k → 50% = 42.5k; 30k + 10k = 40k stays under that.
|
||||||
expect(isOverflow({ prompt_tokens: 30_000, completion_tokens: 10_000 }, 100_000)).toBe(false);
|
expect(isOverflow({ prompt_tokens: 30_000, completion_tokens: 10_000 }, 100_000)).toBe(false);
|
||||||
});
|
});
|
||||||
|
|
||||||
it('returns false just under usable', () => {
|
it('returns false just under usable', () => {
|
||||||
expect(isOverflow({ prompt_tokens: 79_000, completion_tokens: 999 }, 100_000)).toBe(false);
|
// v1.13.9: 84_000 + 999 = 84_999 < 85_000 budget.
|
||||||
|
expect(isOverflow({ prompt_tokens: 84_000, completion_tokens: 999 }, 100_000)).toBe(false);
|
||||||
});
|
});
|
||||||
|
|
||||||
it('returns true exactly at usable (>=, not strict >)', () => {
|
it('returns true exactly at usable (>=, not strict >)', () => {
|
||||||
expect(isOverflow({ prompt_tokens: 80_000, completion_tokens: 0 }, 100_000)).toBe(true);
|
// v1.13.9: 85_000 == usable(100_000).
|
||||||
|
expect(isOverflow({ prompt_tokens: 85_000, completion_tokens: 0 }, 100_000)).toBe(true);
|
||||||
});
|
});
|
||||||
|
|
||||||
it('returns true above usable', () => {
|
it('returns true above usable', () => {
|
||||||
|
// 50_000 + 40_000 = 90_000 > 85_000.
|
||||||
expect(isOverflow({ prompt_tokens: 50_000, completion_tokens: 40_000 }, 100_000)).toBe(true);
|
expect(isOverflow({ prompt_tokens: 50_000, completion_tokens: 40_000 }, 100_000)).toBe(true);
|
||||||
});
|
});
|
||||||
});
|
});
|
||||||
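The updated tests pin down the behavior of `usable` and `isOverflow` without showing their bodies. One implementation consistent with every assertion above might look like the sketch below. It is an assumption, not the code under test: `USABLE_PERCENT` and the `Usage` interface are names introduced here, and integer math (`limit * 85 / 100`) is used in place of `0.85 * limit` purely to sidestep floating-point edge cases.

```typescript
// Hypothetical usage shape; the field names follow the objects passed in the tests.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
}

// Assumed constant: the tests describe a ratio-only trigger at 0.85 of the limit.
const USABLE_PERCENT = 85;

// Returns the token budget at which compaction should trigger.
// Zero or negative limits mean "unknown context size" and yield 0.
function usable(contextLimit: number): number {
  if (contextLimit <= 0) return 0;
  return Math.floor((contextLimit * USABLE_PERCENT) / 100);
}

// Reports overflow once prompt + completion reaches the budget (>=, not >).
// A budget of 0 short-circuits to false because nothing can be sized against it.
function isOverflow(usage: Usage, contextLimit: number): boolean {
  const budget = usable(contextLimit);
  if (budget === 0) return false;
  return usage.prompt_tokens + usage.completion_tokens >= budget;
}
```

Unlike the older `contextLimit - 20_000` formula, a ratio keeps a nonzero budget for small contexts: an 8192-token model gets 6963 rather than 0.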
@@ -226,8 +235,9 @@ describe('select', () => {
|
|||||||
const u = mkMsg('user', 'oversized');
|
const u = mkMsg('user', 'oversized');
|
||||||
const a = mkMsg('assistant', 'Y'.repeat(40_000));
|
const a = mkMsg('assistant', 'Y'.repeat(40_000));
|
||||||
const result = select([u, a], 30_000, 1);
|
const result = select([u, a], 30_000, 1);
|
||||||
// usable(30k) = 10k → budget = min(8k, max(2k, floor(10k*0.25))) =
|
// v1.13.9: usable(30k) = floor(0.85*30k) = 25500 → budget =
|
||||||
// min(8k, max(2k, 2500)) = 2500. 40k chars ≈ 10k tokens. Can't fit.
|
// min(8k, max(2k, floor(25500*0.25))) = min(8k, max(2k, 6375)) = 6375.
|
||||||
|
// 40k chars ≈ 10k tokens. Still can't fit (10k > 6375).
|
||||||
expect(result.tail_start_id).toBeUndefined();
|
expect(result.tail_start_id).toBeUndefined();
|
||||||
expect(result.head).toEqual([u, a]);
|
expect(result.head).toEqual([u, a]);
|
||||||
});
|
});
|
||||||
|
|||||||
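The comment in the `select` hunk walks through a tail budget of `min(8k, max(2k, floor(usable * 0.25)))` and a rough estimate of 40k characters as 10k tokens. That arithmetic can be sketched as follows. `tailBudget` and `estimateTokens` are hypothetical helper names, the clamp bounds are assumed to be exactly 8000 and 2000, and the four-characters-per-token ratio is inferred from the comment rather than taken from the implementation.

```typescript
// Assumed clamp bounds, read from the "8k" and "2k" in the test comment.
const TAIL_BUDGET_MAX = 8_000;
const TAIL_BUDGET_MIN = 2_000;

// A quarter of the usable budget, clamped to [2000, 8000] tokens.
function tailBudget(usableTokens: number): number {
  const quarter = Math.floor(usableTokens * 0.25);
  return Math.min(TAIL_BUDGET_MAX, Math.max(TAIL_BUDGET_MIN, quarter));
}

// Crude size estimate, assuming roughly four characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// The case from the test: a 40k-character assistant message against usable(30k) = 25500.
const oversizedTokens = estimateTokens('Y'.repeat(40_000));
const budgetFor30k = tailBudget(25_500);
const fitsInTail = oversizedTokens <= budgetFor30k;
```

Under the old formula the same message faced a 2500-token budget; under the new one it faces 6375, and in both cases the estimated 10000 tokens do not fit, which is why the test still expects `tail_start_id` to be undefined.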
228
apps/server/src/services/__tests__/tool_cost_stats.test.ts
Normal file
@@ -0,0 +1,228 @@
|
|||||||
|
import { describe, it, expect, beforeAll, afterAll } from 'vitest';
|
||||||
|
import postgres from 'postgres';
|
||||||
|
import { readFileSync } from 'node:fs';
|
||||||
|
import { resolve } from 'node:path';
|
||||||
|
import { fileURLToPath } from 'node:url';
|
||||||
|
|
||||||
|
// v1.13.10: integration tests for the tool_cost_stats view. Skipped unless
|
||||||
|
// DATABASE_URL is set so they don't break `pnpm test` on a fresh checkout.
|
||||||
|
// Run with:
|
||||||
|
// DATABASE_URL=postgres://boocode:<pw>@localhost:5500/boocode pnpm -C apps/server test
|
||||||
|
//
|
||||||
|
// Isolation: each test uses a unique tool_name suffix derived from a per-test
|
||||||
|
// counter. The view aggregates globally across all chats, so without unique
|
||||||
|
// tool names parallel test runs would interfere. Cleanup deletes by tool_name
|
||||||
|
// suffix in afterAll.
|
||||||
|
|
||||||
|
const DB_URL = process.env.DATABASE_URL;
|
||||||
|
const describeFn = DB_URL ? describe : describe.skip;
|
||||||
|
|
||||||
|
const TEST_RUN_ID = `v13_10_${Date.now()}`;
|
||||||
|
const tname = (suffix: string) => `${TEST_RUN_ID}_${suffix}`;
|
||||||
|
|
||||||
|
describeFn('tool_cost_stats view (v1.13.10)', () => {
|
||||||
|
let sql: ReturnType<typeof postgres>;
|
||||||
|
let projectId: string;
|
||||||
|
let sessionId: string;
|
||||||
|
let chatId: string;
|
||||||
|
|
||||||
|
beforeAll(async () => {
|
||||||
|
if (!DB_URL) return;
|
||||||
|
sql = postgres(DB_URL, { max: 2, idle_timeout: 5, connect_timeout: 5, onnotice: () => {} });
|
||||||
|
|
||||||
|
// Apply the schema before fixtures so the view exists. Idempotent via
|
||||||
|
// CREATE OR REPLACE VIEW + CREATE TABLE IF NOT EXISTS; safe to run on a
|
||||||
|
// pre-populated DB. Mirrors apps/server/src/db.ts:applySchema.
|
||||||
|
const here = fileURLToPath(import.meta.url);
|
||||||
|
const schemaPath = resolve(here, '../../../schema.sql');
|
||||||
|
const ddl = readFileSync(schemaPath, 'utf8');
|
||||||
|
await sql.unsafe(ddl);
|
||||||
|
|
||||||
|
// Fixture project + session + chat for all inserts in this file.
|
||||||
|
const proj = await sql<{ id: string }[]>`
|
||||||
|
INSERT INTO projects (name, path)
|
||||||
|
VALUES (${`tool_cost_stats_test_${TEST_RUN_ID}`}, ${`/tmp/${TEST_RUN_ID}`})
|
||||||
|
RETURNING id
|
||||||
|
`;
|
||||||
|
projectId = proj[0]!.id;
|
||||||
|
const sess = await sql<{ id: string }[]>`
|
||||||
|
INSERT INTO sessions (project_id, name, model)
|
||||||
|
VALUES (${projectId}, ${'test'}, ${'test-model'})
|
||||||
|
RETURNING id
|
||||||
|
`;
|
||||||
|
sessionId = sess[0]!.id;
|
||||||
|
const chat = await sql<{ id: string }[]>`
|
||||||
|
INSERT INTO chats (session_id, name) VALUES (${sessionId}, ${'test'}) RETURNING id
|
||||||
|
`;
|
||||||
|
chatId = chat[0]!.id;
|
||||||
|
});
|
||||||
|
|
||||||
|
afterAll(async () => {
|
||||||
|
if (!DB_URL) return;
|
||||||
|
// Project FK CASCADE cleans sessions/chats/messages/parts in one shot.
|
||||||
|
await sql`DELETE FROM projects WHERE id = ${projectId}`;
|
||||||
|
await sql.end({ timeout: 5 });
|
||||||
|
});
|
||||||
|
|
||||||
|
async function insertAssistantTurn(opts: {
|
||||||
|
toolNames: string[];
|
||||||
|
tokensUsed: number | null;
|
||||||
|
ctxUsed: number | null;
|
||||||
|
status?: 'streaming' | 'complete' | 'failed' | 'cancelled';
|
||||||
|
metadata?: { kind: string } | null;
|
||||||
|
createdAt?: Date;
|
||||||
|
}): Promise<string> {
|
||||||
|
const toolCalls = opts.toolNames.map((name, i) => ({
|
||||||
|
id: `call_${TEST_RUN_ID}_${name}_${i}`,
|
||||||
|
name,
|
||||||
|
args: {},
|
||||||
|
}));
|
||||||
|
const created = opts.createdAt ?? new Date();
|
||||||
|
const rows = await sql<{ id: string }[]>`
|
||||||
|
INSERT INTO messages (
|
||||||
|
session_id, chat_id, role, content, kind, status,
|
||||||
|
tool_calls, tokens_used, ctx_used,
|
||||||
|
metadata, created_at
|
||||||
|
)
|
||||||
|
VALUES (
|
||||||
|
${sessionId}, ${chatId}, 'assistant', '', 'message',
|
||||||
|
${opts.status ?? 'complete'},
|
||||||
|
${sql.json(toolCalls as never)},
|
||||||
|
${opts.tokensUsed},
|
||||||
|
${opts.ctxUsed},
|
||||||
|
${opts.metadata ? sql.json(opts.metadata as never) : null},
|
||||||
|
${created}
|
||||||
|
)
|
||||||
|
RETURNING id
|
||||||
|
`;
|
||||||
|
return rows[0]!.id;
|
||||||
|
}
|
||||||
|
|
||||||
|
it('returns empty when no tool calls exist for a tool name', async () => {
|
||||||
|
const t = tname('absent');
|
||||||
|
const stats = await sql<{ tool_name: string }[]>`
|
||||||
|
SELECT * FROM tool_cost_stats WHERE tool_name = ${t}
|
||||||
|
`;
|
||||||
|
expect(stats).toEqual([]);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('attributes single-tool turn fully to that tool', async () => {
|
||||||
|
const t = tname('single');
|
||||||
|
await insertAssistantTurn({ toolNames: [t], tokensUsed: 300, ctxUsed: 15000 });
|
||||||
|
const stats = await sql<{
|
||||||
|
tool_name: string;
|
||||||
|
prompt_tokens_sum: number;
|
||||||
|
completion_tokens_sum: number;
|
||||||
|
n_calls: number;
|
||||||
|
}[]>`SELECT * FROM tool_cost_stats WHERE tool_name = ${t}`;
|
||||||
|
expect(stats[0]).toMatchObject({
|
||||||
|
tool_name: t,
|
||||||
|
prompt_tokens_sum: 15000,
|
||||||
|
completion_tokens_sum: 300,
|
||||||
|
n_calls: 1,
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
it('splits multi-tool turn equally across tools', async () => {
|
||||||
|
const a = tname('multi_a');
|
||||||
|
const b = tname('multi_b');
|
||||||
|
const c = tname('multi_c');
|
||||||
|
// 3 tools, 300 completion / 15000 prompt → each gets 100 / 5000
|
||||||
|
await insertAssistantTurn({ toolNames: [a, b, c], tokensUsed: 300, ctxUsed: 15000 });
|
||||||
|
const stats = await sql<{
|
||||||
|
tool_name: string;
|
||||||
|
prompt_tokens_sum: number;
|
||||||
|
completion_tokens_sum: number;
|
||||||
|
n_calls: number;
|
||||||
|
}[]>`
|
||||||
|
SELECT * FROM tool_cost_stats
|
||||||
|
WHERE tool_name IN (${a}, ${b}, ${c})
|
||||||
|
ORDER BY tool_name
|
||||||
|
`;
|
||||||
|
expect(stats).toHaveLength(3);
|
||||||
|
for (const s of stats) {
|
||||||
|
expect(s.completion_tokens_sum).toBe(100);
|
||||||
|
expect(s.prompt_tokens_sum).toBe(5000);
|
||||||
|
expect(s.n_calls).toBe(1);
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
it('limits to last 100 calls per tool (FIFO window)', async () => {
|
||||||
|
const t = tname('window');
|
||||||
|
// Insert 110 turns with monotonically-increasing created_at and tokensUsed.
|
||||||
|
// Expect view to keep only the most recent 100.
|
||||||
|
const base = Date.now() + 1_000_000; // 1_000_000 ms (about 17 minutes) in the future, to avoid colliding with other tests
|
||||||
|
for (let i = 1; i <= 110; i++) {
|
||||||
|
await insertAssistantTurn({
|
||||||
|
toolNames: [t],
|
||||||
|
tokensUsed: i, // 1..110
|
||||||
|
ctxUsed: i * 10,
|
||||||
|
createdAt: new Date(base + i),
|
||||||
|
});
|
||||||
|
}
|
||||||
|
const [stat] = await sql<{
|
||||||
|
n_calls: number;
|
||||||
|
completion_tokens_sum: number;
|
||||||
|
}[]>`SELECT n_calls, completion_tokens_sum FROM tool_cost_stats WHERE tool_name = ${t}`;
|
||||||
|
expect(stat!.n_calls).toBe(100);
|
||||||
|
// Last 100 are tokensUsed=11..110, sum = (11+110)*100/2 = 6050.
|
||||||
|
expect(stat!.completion_tokens_sum).toBe(6050);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('excludes turns with NULL tokens_used (pre-v1.13.7 latent regression)', async () => {
|
||||||
|
const t = tname('null_tokens');
|
||||||
|
await insertAssistantTurn({ toolNames: [t], tokensUsed: null, ctxUsed: 1000 });
|
||||||
|
await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: null });
|
||||||
|
const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name = ${t}`;
|
||||||
|
expect(stats).toEqual([]);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('excludes failed/cancelled turns and cap_hit/doom_loop sentinel rows', async () => {
|
||||||
|
const t = tname('filtered');
|
||||||
|
// A: status='failed' — excluded
|
||||||
|
// B: status='cancelled' — excluded
|
||||||
|
// C: status='complete', metadata={kind:'cap_hit'} — excluded
|
||||||
|
// D: status='complete', metadata={kind:'doom_loop'} — excluded
|
||||||
|
// E: status='complete', metadata=null — included
|
||||||
|
await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, status: 'failed' });
|
||||||
|
await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, status: 'cancelled' });
|
||||||
|
await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, metadata: { kind: 'cap_hit' } });
|
||||||
|
await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, metadata: { kind: 'doom_loop' } });
|
||||||
|
await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, metadata: null });
|
||||||
|
const [stat] = await sql<{ n_calls: number }[]>`
|
||||||
|
SELECT n_calls FROM tool_cost_stats WHERE tool_name = ${t}
|
||||||
|
`;
|
||||||
|
expect(stat!.n_calls).toBe(1);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('reads tool_calls via messages_with_parts (parts-authoritative)', async () => {
|
||||||
|
const t = tname('parts');
|
||||||
|
// Insert an assistant row with messages.tool_calls=NULL but a
|
||||||
|
// message_parts row carrying the tool_call. The view reads via
|
||||||
|
// messages_with_parts, which COALESCEs the parts table over the legacy
|
||||||
|
// column — so this row should still aggregate.
|
||||||
|
const rows = await sql<{ id: string }[]>`
|
||||||
|
INSERT INTO messages (
|
||||||
|
session_id, chat_id, role, content, kind, status,
|
||||||
|
tool_calls, tokens_used, ctx_used
|
||||||
|
)
|
||||||
|
VALUES (
|
||||||
|
${sessionId}, ${chatId}, 'assistant', '', 'message', 'complete',
|
||||||
|
NULL, 200, 5000
|
||||||
|
)
|
||||||
|
RETURNING id
|
||||||
|
`;
|
||||||
|
const messageId = rows[0]!.id;
|
||||||
|
await sql`
|
||||||
|
INSERT INTO message_parts (message_id, sequence, kind, payload)
|
||||||
|
VALUES (
|
||||||
|
${messageId}, 0, 'tool_call',
|
||||||
|
${sql.json({ id: `tc_parts_${TEST_RUN_ID}`, name: t, args: {} } as never)}
|
||||||
|
)
|
||||||
|
`;
|
||||||
|
const [stat] = await sql<{ n_calls: number }[]>`
|
||||||
|
SELECT n_calls FROM tool_cost_stats WHERE tool_name = ${t}
|
||||||
|
`;
|
||||||
|
expect(stat!.n_calls).toBe(1);
|
||||||
|
});
|
||||||
|
});
|
||||||
@@ -1,5 +1,11 @@
|
|||||||
import { describe, it, expect } from 'vitest';
|
import { describe, it, expect } from 'vitest';
|
||||||
import { ALL_TOOLS } from '../tools.js';
|
import {
|
||||||
|
ALL_TOOLS,
|
||||||
|
CORE_TOOL_NAMES,
|
||||||
|
STANDARD_TOOL_NAMES,
|
||||||
|
TOOLS_BY_NAME,
|
||||||
|
resolveToolTier,
|
||||||
|
} from '../tools.js';
|
||||||
|
|
||||||
describe('ALL_TOOLS registry', () => {
|
describe('ALL_TOOLS registry', () => {
|
||||||
// v1.13.3: tools must be alpha-sorted at module load. llama.cpp's prompt
|
// v1.13.3: tools must be alpha-sorted at module load. llama.cpp's prompt
|
||||||
@@ -12,3 +18,59 @@ describe('ALL_TOOLS registry', () => {
|
|||||||
expect(names).toEqual([...names].sort((a, b) => a.localeCompare(b)));
|
expect(names).toEqual([...names].sort((a, b) => a.localeCompare(b)));
|
||||||
});
|
});
|
||||||
});
|
});
|
||||||
|
|
||||||
|
describe('resolveToolTier (v1.13.15-tools)', () => {
|
||||||
|
it('returns CORE tools for tier=core', () => {
|
||||||
|
expect(resolveToolTier('core')).toEqual(CORE_TOOL_NAMES);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('returns STANDARD tools for tier=standard', () => {
|
||||||
|
const result = resolveToolTier('standard');
|
||||||
|
expect(result.length).toBe(STANDARD_TOOL_NAMES.length);
|
||||||
|
expect(result.length).toBeGreaterThan(CORE_TOOL_NAMES.length);
|
||||||
|
// STANDARD is a strict superset of CORE.
|
||||||
|
expect(result).toEqual(expect.arrayContaining([...CORE_TOOL_NAMES]));
|
||||||
|
});
|
||||||
|
|
||||||
|
it('returns ALL tool names for tier=all', () => {
|
||||||
|
expect(resolveToolTier('all').length).toBe(ALL_TOOLS.length);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('defaults to all when env var is undefined', () => {
|
||||||
|
expect(resolveToolTier(undefined).length).toBe(ALL_TOOLS.length);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('is case-insensitive', () => {
|
||||||
|
expect(resolveToolTier('CORE')).toEqual(CORE_TOOL_NAMES);
|
||||||
|
expect(resolveToolTier('Standard').length).toBe(STANDARD_TOOL_NAMES.length);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('falls back to all for unknown tier strings', () => {
|
||||||
|
expect(resolveToolTier('bogus').length).toBe(ALL_TOOLS.length);
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
describe('CORE_TOOL_NAMES + STANDARD_TOOL_NAMES validation', () => {
|
||||||
|
// The module-load validation in tools.ts throws if a tier references a
|
||||||
|
// tool that doesn't exist in TOOLS_BY_NAME. These tests double-check that
|
||||||
|
// invariant from the consumer side so a future tier-list edit can't smuggle
|
||||||
|
// in a typo without a test failure.
|
||||||
|
it('every CORE name exists in TOOLS_BY_NAME', () => {
|
||||||
|
for (const name of CORE_TOOL_NAMES) {
|
||||||
|
expect(TOOLS_BY_NAME[name], `CORE references unknown tool '${name}'`).toBeDefined();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
it('every STANDARD name exists in TOOLS_BY_NAME', () => {
|
||||||
|
for (const name of STANDARD_TOOL_NAMES) {
|
||||||
|
expect(TOOLS_BY_NAME[name], `STANDARD references unknown tool '${name}'`).toBeDefined();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
it('CORE is a subset of STANDARD', () => {
|
||||||
|
const standardSet = new Set<string>(STANDARD_TOOL_NAMES);
|
||||||
|
for (const name of CORE_TOOL_NAMES) {
|
||||||
|
expect(standardSet.has(name), `'${name}' is in CORE but not STANDARD`).toBe(true);
|
||||||
|
}
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|||||||
218
apps/server/src/services/__tests__/ws-frames.test.ts
Normal file
@@ -0,0 +1,218 @@
|
|||||||
|
import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest';
|
||||||
|
import { readFileSync } from 'node:fs';
|
||||||
|
import { resolve } from 'node:path';
|
||||||
|
import { fileURLToPath } from 'node:url';
|
||||||
|
import {
|
||||||
|
WsFrameSchema,
|
||||||
|
KNOWN_FRAME_TYPES,
|
||||||
|
type WsFrame,
|
||||||
|
} from '../../types/ws-frames.js';
|
||||||
|
import { createBroker } from '../broker.js';
|
||||||
|
|
||||||
|
const VALID_UUID_A = '00000000-0000-0000-0000-000000000001';
|
||||||
|
const VALID_UUID_B = '00000000-0000-0000-0000-000000000002';
|
||||||
|
const VALID_UUID_C = '00000000-0000-0000-0000-000000000003';
|
||||||
|
const VALID_TIMESTAMP = '2026-05-22T14:30:00.000Z';
|
||||||
|
|
||||||
|
describe('WsFrameSchema (v1.13.11-a)', () => {
|
||||||
|
it('accepts a well-formed chat_status frame', () => {
|
||||||
|
const result = WsFrameSchema.safeParse({
|
||||||
|
type: 'chat_status',
|
||||||
|
chat_id: VALID_UUID_A,
|
||||||
|
status: 'streaming',
|
||||||
|
at: VALID_TIMESTAMP,
|
||||||
|
});
|
||||||
|
expect(result.success).toBe(true);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('rejects an unknown frame type', () => {
|
||||||
|
const result = WsFrameSchema.safeParse({
|
||||||
|
type: 'cosmic_ray_strike',
|
||||||
|
chat_id: VALID_UUID_A,
|
||||||
|
});
|
||||||
|
expect(result.success).toBe(false);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('rejects a chat_status frame with invalid status enum', () => {
|
||||||
|
// v1.12.1 dropped the legacy 'working' status. Any frame still emitting it
|
||||||
|
// should fail validation — that's a drift catcher.
|
||||||
|
const result = WsFrameSchema.safeParse({
|
||||||
|
type: 'chat_status',
|
||||||
|
chat_id: VALID_UUID_A,
|
||||||
|
status: 'working',
|
||||||
|
at: VALID_TIMESTAMP,
|
||||||
|
});
|
||||||
|
expect(result.success).toBe(false);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('rejects a UUID field with a non-UUID string', () => {
|
||||||
|
const result = WsFrameSchema.safeParse({
|
||||||
|
type: 'chat_status',
|
||||||
|
chat_id: 'not-a-uuid',
|
||||||
|
status: 'idle',
|
||||||
|
at: VALID_TIMESTAMP,
|
||||||
|
});
|
||||||
|
expect(result.success).toBe(false);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('rejects negative token counts in usage frame', () => {
|
||||||
|
const result = WsFrameSchema.safeParse({
|
||||||
|
type: 'usage',
|
||||||
|
message_id: VALID_UUID_A,
|
||||||
|
chat_id: VALID_UUID_B,
|
||||||
|
completion_tokens: -1,
|
||||||
|
ctx_used: 100,
|
||||||
|
ctx_max: 1000,
|
||||||
|
});
|
||||||
|
expect(result.success).toBe(false);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('accepts a usage frame with nullable token counts (pre-v1.13.7 history)', () => {
|
||||||
|
const result = WsFrameSchema.safeParse({
|
||||||
|
type: 'usage',
|
||||||
|
message_id: VALID_UUID_A,
|
||||||
|
chat_id: VALID_UUID_B,
|
||||||
|
completion_tokens: null,
|
||||||
|
ctx_used: null,
|
||||||
|
ctx_max: null,
|
||||||
|
});
|
||||||
|
expect(result.success).toBe(true);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('accepts a tool_result frame with non-UUID tool_call_id (model-emitted)', () => {
|
||||||
|
// Model-emitted tool_call_ids look like "call_abc123", not UUIDs.
|
||||||
|
const result = WsFrameSchema.safeParse({
|
||||||
|
type: 'tool_result',
|
||||||
|
tool_message_id: VALID_UUID_A,
|
||||||
|
chat_id: VALID_UUID_B,
|
||||||
|
tool_call_id: 'call_abc123',
|
||||||
|
output: { whatever: true },
|
||||||
|
truncated: false,
|
||||||
|
});
|
||||||
|
expect(result.success).toBe(true);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('accepts a compacted frame', () => {
|
||||||
|
const result = WsFrameSchema.safeParse({
|
||||||
|
type: 'compacted',
|
||||||
|
session_id: VALID_UUID_A,
|
||||||
|
chat_id: VALID_UUID_B,
|
||||||
|
summary_message_id: VALID_UUID_C,
|
||||||
|
});
|
||||||
|
expect(result.success).toBe(true);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('accepts a session_workspace_updated frame', () => {
|
||||||
|
const result = WsFrameSchema.safeParse({
|
||||||
|
type: 'session_workspace_updated',
|
||||||
|
session_id: VALID_UUID_A,
|
||||||
|
workspace_panes: [{ id: 'p1', kind: 'chat', chatIds: [], activeChatIdx: 0 }],
|
||||||
|
});
|
||||||
|
expect(result.success).toBe(true);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('every KNOWN_FRAME_TYPES entry has a discriminated branch', () => {
|
||||||
|
// Probe each known type by attempting a minimal valid construction.
|
||||||
|
// Failure here means the union and the KNOWN_FRAME_TYPES list drifted.
|
||||||
|
for (const type of KNOWN_FRAME_TYPES) {
|
||||||
|
const probe = WsFrameSchema.safeParse({ type, __dummy__: true });
|
||||||
|
// We expect FAILURE on every type because we're missing required fields,
|
||||||
|
// but the failure must be ABOUT the missing fields, not about an unknown
|
||||||
|
// type. An "Invalid discriminator value" error means the type isn't in
|
||||||
|
// the union — that's a drift.
|
||||||
|
if (probe.success) continue;
|
||||||
|
const issues = probe.error.issues;
|
||||||
|
const hasInvalidDiscriminator = issues.some(
|
||||||
|
(i) => i.code === 'invalid_union_discriminator',
|
||||||
|
);
|
||||||
|
expect(hasInvalidDiscriminator, `frame type '${type}' is missing from the discriminated union`).toBe(false);
|
||||||
|
}
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
describe('ws-frames.ts file mirror parity', () => {
|
||||||
|
it('apps/server and apps/web copies are byte-identical', () => {
|
||||||
|
const here = fileURLToPath(import.meta.url);
|
||||||
|
const serverPath = resolve(here, '../../../types/ws-frames.ts');
|
||||||
|
const webPath = resolve(here, '../../../../../web/src/api/ws-frames.ts');
|
||||||
|
const serverContent = readFileSync(serverPath, 'utf8');
|
||||||
|
const webContent = readFileSync(webPath, 'utf8');
|
||||||
|
expect(webContent, 'apps/web/src/api/ws-frames.ts must be byte-identical to apps/server/src/types/ws-frames.ts').toBe(serverContent);
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
describe('broker.publishFrame / publishUserFrame fail-closed behavior', () => {
|
||||||
|
let logErrors: Array<{ obj: unknown; msg: string }>;
|
||||||
|
let mockLog: Parameters<typeof createBroker>[0];
|
||||||
|
|
||||||
|
beforeEach(() => {
|
||||||
|
logErrors = [];
|
||||||
|
mockLog = {
|
||||||
|
error: (obj: unknown, msg: string) => {
|
||||||
|
logErrors.push({ obj, msg });
|
||||||
|
},
|
||||||
|
info: () => {},
|
||||||
|
warn: () => {},
|
||||||
|
debug: () => {},
|
||||||
|
trace: () => {},
|
||||||
|
fatal: () => {},
|
||||||
|
child: () => mockLog as never,
|
||||||
|
level: 'info',
|
||||||
|
silent: () => {},
|
||||||
|
} as unknown as Parameters<typeof createBroker>[0];
|
||||||
|
});
|
||||||
|
|
||||||
|
afterEach(() => {
|
||||||
|
vi.restoreAllMocks();
|
||||||
|
});
|
||||||
|
|
||||||
|
it('publishFrame delivers a valid frame to subscribers', () => {
|
||||||
|
const broker = createBroker(mockLog);
|
||||||
|
const received: WsFrame[] = [];
|
||||||
|
broker.subscribe('sess-1', (f) => received.push(f as WsFrame));
|
||||||
|
broker.publishFrame('sess-1', {
|
||||||
|
type: 'delta',
|
||||||
|
message_id: VALID_UUID_A,
|
||||||
|
chat_id: VALID_UUID_B,
|
||||||
|
content: 'hello',
|
||||||
|
});
|
||||||
|
expect(received).toHaveLength(1);
|
||||||
|
expect((received[0] as { type: string }).type).toBe('delta');
|
||||||
|
expect(logErrors).toHaveLength(0);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('publishFrame drops + logs an invalid frame instead of delivering it', () => {
|
||||||
|
const broker = createBroker(mockLog);
|
||||||
|
const received: WsFrame[] = [];
|
||||||
|
broker.subscribe('sess-1', (f) => received.push(f as WsFrame));
|
||||||
|
broker.publishFrame('sess-1', {
|
||||||
|
type: 'delta',
|
||||||
|
message_id: 'not-a-uuid',
|
||||||
|
content: 'hello',
|
||||||
|
} as never);
|
||||||
|
expect(received).toHaveLength(0);
|
||||||
|
expect(logErrors).toHaveLength(1);
|
||||||
|
expect(logErrors[0]!.msg).toMatch(/ws-frame-validation-failed/);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('publishUserFrame drops + logs an invalid user-channel frame', () => {
|
||||||
|
const broker = createBroker(mockLog);
|
||||||
|
const received: WsFrame[] = [];
|
||||||
|
broker.subscribeUser('default', (f) => received.push(f as WsFrame));
|
||||||
|
broker.publishUserFrame('default', {
|
||||||
|
type: 'chat_status',
|
||||||
|
chat_id: VALID_UUID_A,
|
||||||
|
status: 'working', // v1.12.1 dropped this enum value
|
||||||
|
at: VALID_TIMESTAMP,
|
||||||
|
} as never);
|
||||||
|
expect(received).toHaveLength(0);
|
||||||
|
expect(logErrors).toHaveLength(1);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('publishFrame validation failure does not throw (no cascade into stream-phase)', () => {
|
||||||
|
const broker = createBroker(mockLog);
|
||||||
|
expect(() =>
|
||||||
|
broker.publishFrame('sess-1', { type: 'unknown_type' } as never),
|
||||||
|
).not.toThrow();
|
||||||
|
});
|
||||||
|
});
|
||||||
@@ -1,7 +1,7 @@
|
|||||||
import { promises as fs } from 'node:fs';
|
import { promises as fs } from 'node:fs';
|
||||||
import { join } from 'node:path';
|
import { join } from 'node:path';
|
||||||
import type { Agent, AgentsResponse, AgentParseError } from '../types/api.js';
|
import type { Agent, AgentsResponse, AgentParseError } from '../types/api.js';
|
||||||
import { ALL_TOOLS } from './tools.js';
|
import { ALL_TOOLS, resolveToolTier } from './tools.js';
|
||||||
|
|
||||||
// v1.8.1: global agents live at /data/AGENTS.md inside the container
|
// v1.8.1: global agents live at /data/AGENTS.md inside the container
|
||||||
// (./data:/data:ro mount on the host). Per-project AGENTS.md at the project
|
// (./data:/data:ro mount on the host). Per-project AGENTS.md at the project
|
||||||
@@ -186,11 +186,14 @@ function parseAgentSection(section: RawSection): Omit<Agent, 'source'> {
|
|||||||
throw new Error(fmErrors.join('; '));
|
throw new Error(fmErrors.join('; '));
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// v1.13.15-tools: intersect with BOOCODE_TOOLS tier (ceiling, not expansion).
|
||||||
|
// Unset → resolveToolTier returns ALL tool names → no narrowing.
|
||||||
|
const tierAllowed = new Set(resolveToolTier(process.env.BOOCODE_TOOLS));
|
||||||
const filteredTools = Array.isArray(fm.tools)
|
const filteredTools = Array.isArray(fm.tools)
|
||||||
? fm.tools.filter((t): t is string =>
|
? fm.tools.filter((t): t is string =>
|
||||||
(ALL_TOOL_NAMES as readonly string[]).includes(t),
|
(ALL_TOOL_NAMES as readonly string[]).includes(t) && tierAllowed.has(t),
|
||||||
)
|
)
|
||||||
: DEFAULT_TOOLS;
|
: DEFAULT_TOOLS.filter((t) => tierAllowed.has(t));
|
||||||
|
|
||||||
return {
|
return {
|
||||||
id: slugify(section.name),
|
id: slugify(section.name),
|
||||||
|
|||||||
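The ceiling semantics in the `parseAgentSection` hunk above can be sketched in isolation. This is a minimal illustration under assumptions, not the project's code: `applyTierCeiling` is a hypothetical helper name, and the sketch omits the `ALL_TOOL_NAMES` membership check that the real filter also performs.

```typescript
// Hypothetical helper: treats the tier as a ceiling over an agent's tools.
// It can only remove names from the agent's list; it never adds any.
function applyTierCeiling(
  declared: readonly string[] | undefined,
  defaults: readonly string[],
  tierAllowed: ReadonlySet<string>,
): string[] {
  // Agents without a declared whitelist start from the defaults,
  // mirroring the DEFAULT_TOOLS branch in the diff.
  const base = declared ?? defaults;
  return base.filter((name) => tierAllowed.has(name));
}
```

Because the result is an intersection, an agent that declares only `grep` still gets only `grep` under the widest tier, and an agent that declares a tool outside the tier silently loses it.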
@@ -1,3 +1,6 @@
|
|||||||
|
import type { FastifyBaseLogger } from 'fastify';
|
||||||
|
import { WsFrameSchema, type WsFrame } from '../types/ws-frames.js';
|
||||||
|
|
||||||
export type Frame = Record<string, unknown> & { type: string };
|
export type Frame = Record<string, unknown> & { type: string };
|
||||||
export type Listener = (frame: Frame) => void;
|
export type Listener = (frame: Frame) => void;
|
||||||
|
|
||||||
@@ -6,9 +9,15 @@ export interface Broker {
|
|||||||
subscribe(sessionId: string, listener: Listener): () => void;
|
subscribe(sessionId: string, listener: Listener): () => void;
|
||||||
publishUser(user: string, frame: Frame): void;
|
publishUser(user: string, frame: Frame): void;
|
||||||
subscribeUser(user: string, listener: Listener): () => void;
|
subscribeUser(user: string, listener: Listener): () => void;
|
||||||
|
// v1.13.11-a: typed publish wrappers. Validate against WsFrameSchema and
|
||||||
|
// delegate to publish / publishUser on success; log + drop on failure
|
||||||
|
// (fail-closed). Existing publish / publishUser callers stay legal — they
|
||||||
|
// get converted to the typed variant in v1.13.11-b.
|
||||||
|
publishFrame(sessionId: string, frame: WsFrame): void;
|
||||||
|
publishUserFrame(user: string, frame: WsFrame): void;
|
||||||
}
|
}
|
||||||
|
|
||||||
export function createBroker(): Broker {
|
export function createBroker(log?: FastifyBaseLogger): Broker {
|
||||||
const topics = new Map<string, Set<Listener>>();
|
const topics = new Map<string, Set<Listener>>();
|
||||||
const userTopics = new Map<string, Set<Listener>>();
|
const userTopics = new Map<string, Set<Listener>>();
|
||||||
|
|
||||||
@@ -39,6 +48,28 @@ export function createBroker(): Broker {
|
|||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
|
||||||
  // v1.13.11-a: shared validation guard. Returns the parsed/typed frame on
  // success, or null on failure (after logging). Brief mandates fail-closed
  // semantics: invalid frames don't reach subscribers; throwing here could
  // cascade into stream-phase aborts which v1.13.7 already had to defend
  // against, so log + drop is the right shape.
  function validate(channel: 'session' | 'user', key: string, frame: WsFrame): WsFrame | null {
    const parsed = WsFrameSchema.safeParse(frame);
    if (parsed.success) return parsed.data;
    const frameType = (frame as { type?: unknown })?.type;
    const errors = parsed.error.flatten();
    if (log) {
      log.error(
        { channel, key, frame_type: frameType, errors },
        'ws-frame-validation-failed: dropping invalid frame',
      );
    } else {
      // Fallback for callers that didn't pass a logger (e.g. unit tests).
      console.error('ws-frame-validation-failed', { channel, key, frame_type: frameType, errors });
    }
    return null;
  }
|
|
||||||
return {
|
return {
|
||||||
publish(sessionId, frame) {
|
publish(sessionId, frame) {
|
||||||
publishTo(topics, sessionId, frame);
|
publishTo(topics, sessionId, frame);
|
||||||
@@ -52,5 +83,15 @@ export function createBroker(): Broker {
|
|||||||
subscribeUser(user, listener) {
|
subscribeUser(user, listener) {
|
||||||
return subscribeTo(userTopics, user, listener);
|
return subscribeTo(userTopics, user, listener);
|
||||||
},
|
},
|
||||||
|
publishFrame(sessionId, frame) {
|
||||||
|
const valid = validate('session', sessionId, frame);
|
||||||
|
if (!valid) return;
|
||||||
|
publishTo(topics, sessionId, valid as Frame);
|
||||||
|
},
|
||||||
|
publishUserFrame(user, frame) {
|
||||||
|
const valid = validate('user', user, frame);
|
||||||
|
if (!valid) return;
|
||||||
|
publishTo(userTopics, user, valid as Frame);
|
||||||
|
},
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
|||||||
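The fail-closed contract of `publishFrame` (validate, then either deliver or log and drop, never throw) can be shown without Zod. The sketch below is an illustration only: `isDeltaFrame` is a hypothetical hand-written guard standing in for `WsFrameSchema.safeParse`, and `makeBroker` is a hypothetical stripped-down broker that handles a single frame kind.

```typescript
// Hypothetical stand-in for WsFrameSchema: a hand-written guard for one frame kind.
type SketchDeltaFrame = { type: 'delta'; message_id: string; content: string };

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function isDeltaFrame(frame: unknown): frame is SketchDeltaFrame {
  if (typeof frame !== 'object' || frame === null) return false;
  const f = frame as Record<string, unknown>;
  return (
    f.type === 'delta' &&
    typeof f.message_id === 'string' &&
    UUID_RE.test(f.message_id) &&
    typeof f.content === 'string'
  );
}

type SketchListener = (frame: SketchDeltaFrame) => void;

// Hypothetical minimal broker: one topic map, one typed publish method.
function makeBroker(logError: (msg: string) => void) {
  const topics = new Map<string, Set<SketchListener>>();
  return {
    subscribe(key: string, listener: SketchListener): () => void {
      const set = topics.get(key) ?? new Set<SketchListener>();
      topics.set(key, set);
      set.add(listener);
      return () => {
        set.delete(listener);
      };
    },
    publishFrame(key: string, frame: unknown): void {
      if (!isDeltaFrame(frame)) {
        // Fail closed: log and drop. Returning instead of throwing keeps a
        // bad frame from aborting the caller's stream.
        logError('ws-frame-validation-failed: dropping invalid frame');
        return;
      }
      topics.get(key)?.forEach((listener) => listener(frame));
    },
  };
}
```

The design choice worth noting is that the publisher returns normally on invalid input, so callers need no try/catch and subscribers only ever see frames that passed the guard.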
@@ -23,7 +23,13 @@ import type { Broker } from './broker.js';
|
|||||||
import { SUMMARY_TEMPLATE } from './compaction-prompt.js';
|
import { SUMMARY_TEMPLATE } from './compaction-prompt.js';
|
||||||
import * as modelContextLookup from './model-context.js';
|
import * as modelContextLookup from './model-context.js';
|
||||||
|
|
||||||
const COMPACTION_BUFFER = 20_000;
|
// v1.13.9: ratio-only overflow trigger. Fires compaction at 85% of ctx_max
|
||||||
|
// (opencode session/overflow.ts pattern). Replaces the v1.11.0-era
|
||||||
|
// `ctx_max - 20_000` formula which degenerated to 0 for contexts ≤20k and
|
||||||
|
// gave only 7-8% headroom to the summarizer at 262k. Ratio gives consistent
|
||||||
|
// 15% headroom at any scale, and small-ctx models no longer get an
|
||||||
|
// effectively-disabled trigger.
|
||||||
|
const EARLY_TRIGGER_RATIO = 0.85;
|
||||||
const MIN_PRESERVE_RECENT_TOKENS = 2_000;
|
const MIN_PRESERVE_RECENT_TOKENS = 2_000;
|
||||||
const MAX_PRESERVE_RECENT_TOKENS = 8_000;
|
const MAX_PRESERVE_RECENT_TOKENS = 8_000;
|
||||||
const DEFAULT_TAIL_TURNS = 2;
|
const DEFAULT_TAIL_TURNS = 2;
|
||||||
@@ -50,13 +56,13 @@ export interface CompactionMessage {
|
|||||||
|
|
||||||
// === overflow ===
|
// === overflow ===
|
||||||
|
|
||||||
// Tokens we hold in reserve for the model's response so a near-full context
|
// Returns the token budget at which overflow fires. Triggers compaction at
|
||||||
// can still produce a useful turn. Mirrors opencode's COMPACTION_BUFFER.
|
// 85% of contextLimit (opencode session/overflow.ts pattern). Returns 0 when
|
||||||
// Returns 0 when the context limit is unknown (caller treats 0 as "do not
|
// the context limit is unknown — caller treats 0 as "do not trigger overflow",
|
||||||
// trigger overflow"); avoids dividing-by-zero downstream.
|
// keeping inference flowing rather than compacting a turn we can't size.
|
||||||
export function usable(contextLimit: number): number {
|
export function usable(contextLimit: number): number {
|
||||||
if (!contextLimit || contextLimit <= 0) return 0;
|
if (!contextLimit || contextLimit <= 0) return 0;
|
||||||
return Math.max(0, contextLimit - COMPACTION_BUFFER);
|
return Math.floor(EARLY_TRIGGER_RATIO * contextLimit);
|
||||||
}
|
}
|
||||||
|
|
||||||
export interface Usage {
|
export interface Usage {
|
||||||
@@ -425,15 +431,16 @@ export async function process(input: ProcessInput): Promise<void> {
|
|||||||
'compaction: invoking model',
|
'compaction: invoking model',
|
||||||
);
|
);
|
||||||
|
|
||||||
// 6a. Flip the chat dot amber for the duration of the LLM call + DB writes.
|
// 6a. Flip the chat dot for the duration of the LLM call + DB writes.
|
||||||
// Same { type: 'chat_status', status: 'working', at } shape inference.ts
|
// v1.13.11-b: publish status='streaming' (the v1.12.1-widened replacement
|
||||||
// emits at runner enqueue. publishUser → broadcasts on the per-user channel
|
// for the dropped 'working' value). Compaction's LLM call has the same
|
||||||
// (all devices / tabs see it) since chat_status is a user-channel frame in
|
// semantic as an inference turn for dot-state purposes. The v1.12.1
|
||||||
// BooCode (see useChatStatus.ts, which is the consumer).
|
// chat_status widening missed this site; v1.13.11's WsFrame Zod schema
|
||||||
broker.publishUser('default', {
|
// surfaced the drift via the unknown-enum-value check.
|
||||||
|
broker.publishUserFrame('default', {
|
||||||
type: 'chat_status',
|
type: 'chat_status',
|
||||||
chat_id: chatId,
|
chat_id: chatId,
|
||||||
status: 'working',
|
status: 'streaming',
|
||||||
at: new Date().toISOString(),
|
at: new Date().toISOString(),
|
||||||
});
|
});
|
||||||
|
|
||||||
@@ -502,7 +509,7 @@ export async function process(input: ProcessInput): Promise<void> {
|
|||||||
// Always restore the dot. Status='idle' (not 'error') even on failure —
|
// Always restore the dot. Status='idle' (not 'error') even on failure —
|
||||||
// the caller logs/re-surfaces the error separately; the dot doesn't
|
// the caller logs/re-surfaces the error separately; the dot doesn't
|
||||||
// need to stay red across reloads for a transient compaction blip.
|
// need to stay red across reloads for a transient compaction blip.
|
||||||
broker.publishUser('default', {
|
broker.publishUserFrame('default', {
|
||||||
type: 'chat_status',
|
type: 'chat_status',
|
||||||
chat_id: chatId,
|
chat_id: chatId,
|
||||||
status: 'idle',
|
status: 'idle',
|
||||||
@@ -516,7 +523,7 @@ export async function process(input: ProcessInput): Promise<void> {
|
|||||||
// toast. Order matters: idle must precede 'compacted' so the dot is
|
// toast. Order matters: idle must precede 'compacted' so the dot is
|
||||||
// already green by the time the refetch toast appears.
|
// already green by the time the refetch toast appears.
|
||||||
if (succeeded) {
|
if (succeeded) {
|
||||||
broker.publish(sessionId, {
|
broker.publishFrame(sessionId, {
|
||||||
type: 'compacted',
|
type: 'compacted',
|
||||||
session_id: sessionId,
|
session_id: sessionId,
|
||||||
chat_id: chatId,
|
chat_id: chatId,
|
||||||
|
|||||||
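The effect of replacing the fixed buffer with a ratio is easiest to see with numbers. The sketch below copies the two constants and both `usable` bodies from the diff; the names `usableFixedBuffer`, `usableRatio`, and `headroomPercent` are hypothetical and exist only for the side-by-side comparison.

```typescript
// Constants as they appear in the diff.
const COMPACTION_BUFFER = 20_000;
const EARLY_TRIGGER_RATIO = 0.85;

// Pre-v1.13.9 formula: fixed 20k reserve.
function usableFixedBuffer(contextLimit: number): number {
  if (!contextLimit || contextLimit <= 0) return 0;
  return Math.max(0, contextLimit - COMPACTION_BUFFER);
}

// v1.13.9 formula: trigger at 85% of the context limit.
function usableRatio(contextLimit: number): number {
  if (!contextLimit || contextLimit <= 0) return 0;
  return Math.floor(EARLY_TRIGGER_RATIO * contextLimit);
}

// Share of the context left over once the trigger fires, in percent.
function headroomPercent(contextLimit: number, threshold: number): number {
  return ((contextLimit - threshold) / contextLimit) * 100;
}

for (const ctx of [16_384, 262_144]) {
  console.log(ctx, usableFixedBuffer(ctx), usableRatio(ctx));
}
// 16384  -> fixed buffer 0 (trigger disabled), ratio 13926
// 262144 -> fixed buffer 242144, ratio 222822
```

At 262,144 tokens the fixed buffer leaves about 7.6% headroom while the ratio leaves about 15%; at 16,384 tokens the fixed formula returns 0, which callers read as "do not trigger overflow".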
@@ -199,10 +199,13 @@ export async function maybeFlagForCompaction(
|
|||||||
);
|
);
|
||||||
if (!overflow) return;
|
if (!overflow) return;
|
||||||
|
|
||||||
// v1.13.4: try the cheap prune first. If it freed at least the buffer
|
// v1.13.4: try the cheap prune first. If it freed at least
|
||||||
// worth of tokens (PRUNE_TRIGGER_TOKENS, identical to COMPACTION_BUFFER),
|
// PRUNE_TRIGGER_TOKENS (20k) worth of context, we're below the threshold
|
||||||
// we're below the threshold again — skip flagging summarize for the next
|
// again — skip flagging summarize for the next turn. The next turn's
|
||||||
// turn. The next turn's overflow check will re-evaluate from scratch.
|
// overflow check will re-evaluate from scratch.
|
||||||
|
// v1.13.9: the overflow trigger above is now 85% of ctx_max (was
|
||||||
|
// ctx_max - 20k). PRUNE_TRIGGER_TOKENS stays at 20k as the prune-freed
|
||||||
|
// threshold — independent of the overflow formula.
|
||||||
// Prune failures (DB errors etc.) propagate so the surrounding inference
|
// Prune failures (DB errors etc.) propagate so the surrounding inference
|
||||||
// path sees them; the catch in finalizeCompletion / executeToolPhase
|
// path sees them; the catch in finalizeCompletion / executeToolPhase
|
||||||
// doesn't shield this — by design, we want to know if prune is broken.
|
// doesn't shield this — by design, we want to know if prune is broken.
|
||||||
|
|||||||
@@ -700,6 +700,64 @@ export const TOOLS_BY_NAME: Record<string, ToolDef<unknown>> = Object.fromEntrie
|
|||||||
ALL_TOOLS.map((t) => [t.name, t])
|
ALL_TOOLS.map((t) => [t.name, t])
|
||||||
);
|
);
|
||||||
|
|
||||||
|
// v1.13.15-tools: tiered tool loading. BOOCODE_TOOLS env var (`core` |
|
||||||
|
// `standard` | `all`) filters the agent's tool whitelist before LLM dispatch.
|
||||||
|
// Daily-driver token win on qwen3.6-35b-a3b — the 35B-A3B MoE benefits from
|
||||||
|
// any prompt-cache stability win (fewer tools = shorter, more stable tool
|
||||||
|
// schemas in the system prompt). Pattern lift from eyaltoledano/claude-task-
|
||||||
|
// master (MIT + Commons Clause — pattern only, no code lift).
|
||||||
|
//
|
||||||
|
// The env var is a CEILING. It only narrows; never expands an agent's
|
||||||
|
// declared whitelist. Default behavior (var unset) is unchanged: all tools.
|
||||||
|
export const CORE_TOOL_NAMES = [
|
||||||
|
'view_file',
|
||||||
|
'list_dir',
|
||||||
|
'grep',
|
||||||
|
'find_files',
|
||||||
|
] as const;
|
||||||
|
|
||||||
|
export const STANDARD_TOOL_NAMES = [
|
||||||
|
...CORE_TOOL_NAMES,
|
||||||
|
'web_search',
|
||||||
|
'web_fetch',
|
||||||
|
'git_status',
|
||||||
|
'get_codebase_overview',
|
||||||
|
'get_file_analysis',
|
||||||
|
'get_symbol_info',
|
||||||
|
'search_symbols',
|
||||||
|
'get_dependencies',
|
||||||
|
'watch_changes',
|
||||||
|
'get_semantic_neighborhoods',
|
||||||
|
'get_framework_analysis',
|
||||||
|
] as const;
|
||||||
|
|
||||||
|
// Module-load validation: every name in CORE / STANDARD must exist in
|
||||||
|
// TOOLS_BY_NAME. Catches typos and stale tier definitions before they reach
|
||||||
|
// production; server boot fails loudly rather than silently filtering valid
|
||||||
|
// tools out of agent whitelists.
|
||||||
|
for (const name of CORE_TOOL_NAMES) {
|
||||||
|
if (!TOOLS_BY_NAME[name]) {
|
||||||
|
throw new Error(`CORE_TOOL_NAMES references unknown tool: '${name}'`);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
for (const name of STANDARD_TOOL_NAMES) {
|
||||||
|
if (!TOOLS_BY_NAME[name]) {
|
||||||
|
throw new Error(`STANDARD_TOOL_NAMES references unknown tool: '${name}'`);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
export function resolveToolTier(tier: string | undefined): readonly string[] {
  switch ((tier ?? 'all').toLowerCase()) {
    case 'core':
      return CORE_TOOL_NAMES;
    case 'standard':
      return STANDARD_TOOL_NAMES;
    case 'all':
    default:
      return ALL_TOOLS.map((t) => t.name);
  }
}
|
|
||||||
export function toolJsonSchemas(): ToolJsonSchema[] {
|
export function toolJsonSchemas(): ToolJsonSchema[] {
|
||||||
return ALL_TOOLS.map((t) => t.jsonSchema);
|
return ALL_TOOLS.map((t) => t.jsonSchema);
|
||||||
}
|
}
|
||||||
|
|||||||
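The module-load validation in this hunk repeats the same loop for each tier. One possible way to express it once is sketched below. Everything here is an assumption for illustration: `SKETCH_REGISTRY` is a hypothetical three-tool stand-in for `TOOLS_BY_NAME`, and `assertKnownTools` is a hypothetical helper, not something the diff defines.

```typescript
// Hypothetical three-tool registry standing in for TOOLS_BY_NAME.
const SKETCH_REGISTRY: Record<string, { name: string }> = {
  grep: { name: 'grep' },
  view_file: { name: 'view_file' },
  web_search: { name: 'web_search' },
};

// Hypothetical helper: throws on the first tier entry missing from the
// registry, so a typo fails at startup instead of silently filtering a
// valid tool out of every agent whitelist.
function assertKnownTools(tierName: string, names: readonly string[]): void {
  for (const name of names) {
    if (!Object.prototype.hasOwnProperty.call(SKETCH_REGISTRY, name)) {
      throw new Error(`${tierName} references unknown tool: '${name}'`);
    }
  }
}

const SKETCH_CORE = ['grep', 'view_file'] as const;
const SKETCH_STANDARD = [...SKETCH_CORE, 'web_search'] as const;

// Runs at module load, once per tier.
assertKnownTools('SKETCH_CORE', SKETCH_CORE);
assertKnownTools('SKETCH_STANDARD', SKETCH_STANDARD);
```

The `hasOwnProperty` check is a deliberate difference from the truthiness test in the diff: it avoids treating inherited keys such as `toString` as known tools.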
318
apps/server/src/types/ws-frames.ts
Normal file
@@ -0,0 +1,318 @@
|
|||||||
// v1.13.11-a: Zod schemas for every WebSocket frame published by the server.
// Validation runs both on send (broker.publishFrame / publishUserFrame) and
// on receive (apps/web/src/hooks/useSessionStream + useUserEvents). Catches
// silent protocol drift between publisher and consumer.
//
// IMPORTANT: This file is duplicated byte-identical at
// apps/web/src/api/ws-frames.ts. The two apps have separate tsconfigs and
// no path alias; the duplication is sync-by-hand. A test asserts the two
// files match. If you change one, change the other.
//
// Per-kind payload schemas (tool_call args, message_parts payloads, etc.)
// stay z.unknown() in v1.13.11. Frame-level drift detection is the goal;
// deep payload validation is follow-up work.
|
|
||||||
|
import { z } from 'zod';
|
||||||
|
|
||||||
|
// ---- shared primitives -----------------------------------------------------
|
||||||
|
|
||||||
|
const Uuid = z.string().uuid();
|
||||||
|
// Tool call IDs are model-emitted (e.g. "call_abc123") — not UUIDs.
|
||||||
|
const ToolCallId = z.string().min(1);
|
||||||
|
const IsoTimestamp = z.string().min(1);
|
||||||
|
|
||||||
|
const ChatStatusValue = z.enum([
|
||||||
|
'streaming',
|
||||||
|
'tool_running',
|
||||||
|
'waiting_for_input',
|
||||||
|
'idle',
|
||||||
|
'error',
|
||||||
|
]);
|
||||||
|
|
||||||
|
const ErrorReasonValue = z.enum([
|
||||||
|
'llm_provider_error',
|
||||||
|
'doom_loop',
|
||||||
|
'doom_loop_summary_failed',
|
||||||
|
'cap_hit',
|
||||||
|
'cap_hit_summary_failed',
|
||||||
|
]);
|
||||||
|
|
||||||
|
const MessageRoleValue = z.enum(['user', 'assistant', 'system', 'tool']);
|
||||||
|
|
||||||
|
const ToolCallShape = z.object({
|
||||||
|
id: ToolCallId,
|
||||||
|
name: z.string().min(1),
|
||||||
|
args: z.record(z.string(), z.unknown()),
|
||||||
|
});
|
||||||
|
|
||||||
|
// Free-form bags: opaque to the frame schema; deep validation is out of
|
||||||
|
// scope for v1.13.11 (frame-level drift detection is the goal; per-kind
|
||||||
|
// payload narrowing is follow-up work). z.unknown() means the consumer
|
||||||
|
// must narrow before reading — TypeScript-side this is fine because every
|
||||||
|
// consumer already operates on the hand-maintained Project / Chat / Session
|
||||||
|
// / WorkspacePane types (the brief's "Don't strip existing types yet"
|
||||||
|
// rule), and the Zod-typed shape is only used at the publishFrame boundary.
|
||||||
|
const OpaqueObject = z.unknown();
|
||||||
|
|
||||||
|
// ---- per-session channel frames --------------------------------------------
|
||||||
|
|
||||||
|
export const SnapshotFrame = z.object({
|
||||||
|
type: z.literal('snapshot'),
|
||||||
|
messages: z.array(OpaqueObject),
|
||||||
|
});
|
||||||
|
|
||||||
|
export const MessageStartedFrame = z.object({
|
||||||
|
type: z.literal('message_started'),
|
||||||
|
message_id: Uuid,
|
||||||
|
chat_id: Uuid.optional(),
|
||||||
|
role: MessageRoleValue,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const DeltaFrame = z.object({
|
||||||
|
type: z.literal('delta'),
|
||||||
|
message_id: Uuid,
|
||||||
|
chat_id: Uuid.optional(),
|
||||||
|
content: z.string(),
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ToolCallFrame = z.object({
|
||||||
|
type: z.literal('tool_call'),
|
||||||
|
message_id: Uuid,
|
||||||
|
chat_id: Uuid.optional(),
|
||||||
|
tool_call: ToolCallShape,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ToolResultFrame = z.object({
|
||||||
|
type: z.literal('tool_result'),
|
||||||
|
tool_message_id: Uuid,
|
||||||
|
chat_id: Uuid.optional(),
|
||||||
|
tool_call_id: ToolCallId,
|
||||||
|
output: z.unknown(),
|
||||||
|
truncated: z.boolean(),
|
||||||
|
error: z.string().optional(),
|
||||||
|
});
|
||||||
|
|
||||||
|
export const MessageCompleteFrame = z.object({
|
||||||
|
type: z.literal('message_complete'),
|
||||||
|
message_id: Uuid,
|
||||||
|
chat_id: Uuid.optional(),
|
||||||
|
tokens_used: z.number().int().nonnegative().nullable().optional(),
|
||||||
|
ctx_used: z.number().int().nonnegative().nullable().optional(),
|
||||||
|
ctx_max: z.number().int().positive().nullable().optional(),
|
||||||
|
started_at: IsoTimestamp.nullable().optional(),
|
||||||
|
finished_at: IsoTimestamp.nullable().optional(),
|
||||||
|
model: z.string().optional(),
|
||||||
|
metadata: OpaqueObject.nullable().optional(),
|
||||||
|
});
|
||||||
|
|
||||||
|
export const UsageFrame = z.object({
|
||||||
|
type: z.literal('usage'),
|
||||||
|
message_id: Uuid,
|
||||||
|
chat_id: Uuid.optional(),
|
||||||
|
completion_tokens: z.number().int().nonnegative().nullable(),
|
||||||
|
ctx_used: z.number().int().nonnegative().nullable(),
|
||||||
|
ctx_max: z.number().int().positive().nullable(),
|
||||||
|
});
|
||||||
|
|
||||||
|
export const MessagesDeletedFrame = z.object({
|
||||||
|
type: z.literal('messages_deleted'),
|
||||||
|
message_ids: z.array(Uuid),
|
||||||
|
chat_id: Uuid.optional(),
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ChatRenamedFrame = z.object({
|
||||||
|
type: z.literal('chat_renamed'),
|
||||||
|
chat_id: Uuid,
|
||||||
|
name: z.string(),
|
||||||
|
});
|
||||||
|
|
||||||
|
export const CompactedFrame = z.object({
|
||||||
|
type: z.literal('compacted'),
|
||||||
|
session_id: Uuid,
|
||||||
|
chat_id: Uuid,
|
||||||
|
summary_message_id: Uuid,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ErrorFrame = z.object({
|
||||||
|
type: z.literal('error'),
|
||||||
|
message_id: Uuid.optional(),
|
||||||
|
chat_id: Uuid.optional(),
|
||||||
|
error: z.string(),
|
||||||
|
reason: ErrorReasonValue.optional(),
|
||||||
|
});
|
||||||
|
|
||||||
|
// ---- per-user channel frames (sidebar refresh) -----------------------------
|
||||||
|
|
||||||
|
export const ChatStatusFrame = z.object({
|
||||||
|
type: z.literal('chat_status'),
|
||||||
|
chat_id: Uuid,
|
||||||
|
status: ChatStatusValue,
|
||||||
|
at: IsoTimestamp,
|
||||||
|
reason: ErrorReasonValue.optional(),
|
||||||
|
});
|
||||||
|
|
||||||
|
export const SessionUpdatedFrame = z.object({
|
||||||
|
type: z.literal('session_updated'),
|
||||||
|
session_id: Uuid,
|
||||||
|
project_id: Uuid,
|
||||||
|
name: z.string(),
|
||||||
|
updated_at: IsoTimestamp,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const SessionRenamedFrame = z.object({
|
||||||
|
type: z.literal('session_renamed'),
|
||||||
|
session_id: Uuid,
|
||||||
|
name: z.string(),
|
||||||
|
});
|
||||||
|
|
||||||
|
export const SessionCreatedFrame = z.object({
|
||||||
|
type: z.literal('session_created'),
|
||||||
|
session: OpaqueObject,
|
||||||
|
project_id: Uuid,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const SessionArchivedFrame = z.object({
|
||||||
|
type: z.literal('session_archived'),
|
||||||
|
session_id: Uuid,
|
||||||
|
project_id: Uuid,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const SessionDeletedFrame = z.object({
|
||||||
|
type: z.literal('session_deleted'),
|
||||||
|
session_id: Uuid,
|
||||||
|
project_id: Uuid,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const SessionWorkspaceUpdatedFrame = z.object({
|
||||||
|
type: z.literal('session_workspace_updated'),
|
||||||
|
session_id: Uuid,
|
||||||
|
workspace_panes: z.array(OpaqueObject),
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ChatCreatedFrame = z.object({
|
||||||
|
type: z.literal('chat_created'),
|
||||||
|
chat: OpaqueObject,
|
||||||
|
session_id: Uuid,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ChatUpdatedFrame = z.object({
|
||||||
|
type: z.literal('chat_updated'),
|
||||||
|
chat_id: Uuid,
|
||||||
|
session_id: Uuid,
|
||||||
|
name: z.string().nullable(),
|
||||||
|
updated_at: IsoTimestamp,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ChatArchivedFrame = z.object({
|
||||||
|
type: z.literal('chat_archived'),
|
||||||
|
chat_id: Uuid,
|
||||||
|
session_id: Uuid,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ChatUnarchivedFrame = z.object({
|
||||||
|
type: z.literal('chat_unarchived'),
|
||||||
|
chat: OpaqueObject,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ChatDeletedFrame = z.object({
|
||||||
|
type: z.literal('chat_deleted'),
|
||||||
|
chat_id: Uuid,
|
||||||
|
session_id: Uuid,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ProjectCreatedFrame = z.object({
|
||||||
|
type: z.literal('project_created'),
|
||||||
|
project: OpaqueObject,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ProjectArchivedFrame = z.object({
|
||||||
|
type: z.literal('project_archived'),
|
||||||
|
project_id: Uuid,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ProjectUnarchivedFrame = z.object({
|
||||||
|
type: z.literal('project_unarchived'),
|
||||||
|
project: OpaqueObject,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ProjectUpdatedFrame = z.object({
|
||||||
|
type: z.literal('project_updated'),
|
||||||
|
project_id: Uuid,
|
||||||
|
name: z.string(),
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ProjectDeletedFrame = z.object({
|
||||||
|
type: z.literal('project_deleted'),
|
||||||
|
project_id: Uuid,
|
||||||
|
});
|
||||||
|
|
||||||
|
// ---- discriminated union ---------------------------------------------------
|
||||||
|
|
||||||
|
export const WsFrameSchema = z.discriminatedUnion('type', [
|
||||||
|
// per-session
|
||||||
|
SnapshotFrame,
|
||||||
|
MessageStartedFrame,
|
||||||
|
DeltaFrame,
|
||||||
|
ToolCallFrame,
|
||||||
|
ToolResultFrame,
|
||||||
|
MessageCompleteFrame,
|
||||||
|
UsageFrame,
|
||||||
|
MessagesDeletedFrame,
|
||||||
|
ChatRenamedFrame,
|
||||||
|
CompactedFrame,
|
||||||
|
ErrorFrame,
|
||||||
|
// per-user
|
||||||
|
ChatStatusFrame,
|
||||||
|
SessionUpdatedFrame,
|
||||||
|
SessionRenamedFrame,
|
||||||
|
SessionCreatedFrame,
|
||||||
|
SessionArchivedFrame,
|
||||||
|
SessionDeletedFrame,
|
||||||
|
SessionWorkspaceUpdatedFrame,
|
||||||
|
ChatCreatedFrame,
|
||||||
|
ChatUpdatedFrame,
|
||||||
|
ChatArchivedFrame,
|
||||||
|
ChatUnarchivedFrame,
|
||||||
|
ChatDeletedFrame,
|
||||||
|
ProjectCreatedFrame,
|
||||||
|
ProjectArchivedFrame,
|
||||||
|
ProjectUnarchivedFrame,
|
||||||
|
ProjectUpdatedFrame,
|
||||||
|
ProjectDeletedFrame,
|
||||||
|
]);
|
||||||
|
|
||||||
|
export type WsFrame = z.infer<typeof WsFrameSchema>;
|
||||||
|
|
||||||
|
// Convenience: the set of known frame types. Useful for the publishFrame
|
||||||
|
// helper to log the offending type name when validation fails. Kept in sync
|
||||||
|
// by hand with the discriminated union above.
|
||||||
|
export const KNOWN_FRAME_TYPES: readonly WsFrame['type'][] = [
|
||||||
|
'snapshot',
|
||||||
|
'message_started',
|
||||||
|
'delta',
|
||||||
|
'tool_call',
|
||||||
|
'tool_result',
|
||||||
|
'message_complete',
|
||||||
|
'usage',
|
||||||
|
'messages_deleted',
|
||||||
|
'chat_renamed',
|
||||||
|
'compacted',
|
||||||
|
'error',
|
||||||
|
'chat_status',
|
||||||
|
'session_updated',
|
||||||
|
'session_renamed',
|
||||||
|
'session_created',
|
||||||
|
'session_archived',
|
||||||
|
'session_deleted',
|
||||||
|
'session_workspace_updated',
|
||||||
|
'chat_created',
|
||||||
|
'chat_updated',
|
||||||
|
'chat_archived',
|
||||||
|
'chat_unarchived',
|
||||||
|
'chat_deleted',
|
||||||
|
'project_created',
|
||||||
|
'project_archived',
|
||||||
|
'project_unarchived',
|
||||||
|
'project_updated',
|
||||||
|
'project_deleted',
|
||||||
|
] as const;
|
||||||
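The comment above notes that `KNOWN_FRAME_TYPES` is kept in sync with the discriminated union by hand. A minimal sketch of one way to let the compiler flag drift, assuming a hypothetical three-member `Frame` union in place of the real `WsFrame`. The names `Frame`, `KNOWN_TYPES`, `MissingFromList`, `ExtraInList`, and `isKnownFrameType` are illustrative and not part of this change:

```typescript
// Hypothetical stand-in for the real frame union; three members keep the sketch short.
type Frame =
  | { type: 'snapshot'; messages: unknown[] }
  | { type: 'delta'; message_id: string; content: string }
  | { type: 'error'; error: string };

type FrameType = Frame['type'];

// `as const` keeps the literal types so the list can be compared with the union.
const KNOWN_TYPES = ['snapshot', 'delta', 'error'] as const;
type ListedType = (typeof KNOWN_TYPES)[number];

// Each alias resolves to `never` only when the list and the union agree.
type MissingFromList = Exclude<FrameType, ListedType>;
type ExtraInList = Exclude<ListedType, FrameType>;

// Compile-time assertions: if either alias is not `never`, the annotated type
// becomes `false` and assigning `true` no longer type-checks.
const listCoversUnion: [MissingFromList] extends [never] ? true : false = true;
const listHasNoStrays: [ExtraInList] extends [never] ? true : false = true;

// Runtime guard built from the same list, e.g. for logging an offending type name.
function isKnownFrameType(value: unknown): value is FrameType {
  return (
    typeof value === 'string' &&
    (KNOWN_TYPES as readonly string[]).indexOf(value) !== -1
  );
}

console.log(listCoversUnion && listHasNoStrays, isKnownFrameType('delta'));
```

The `readonly WsFrame['type'][]` annotation in the diff already rejects entries that are not in the union. It does not catch a frame type that was added to the union but left out of the list, which is the case the `MissingFromList` check in this sketch is meant to cover.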
@@ -31,7 +31,8 @@
|
|||||||
"shiki": "^1.29.2",
|
"shiki": "^1.29.2",
|
||||||
"sonner": "^2.0.7",
|
"sonner": "^2.0.7",
|
||||||
"tailwind-merge": "^3.6.0",
|
"tailwind-merge": "^3.6.0",
|
||||||
"tw-animate-css": "^1.4.0"
|
"tw-animate-css": "^1.4.0",
|
||||||
|
"zod": "^3.23.8"
|
||||||
},
|
},
|
||||||
"devDependencies": {
|
"devDependencies": {
|
||||||
"@tailwindcss/postcss": "^4.3.0",
|
"@tailwindcss/postcss": "^4.3.0",
|
||||||
|
|||||||
@@ -12,6 +12,7 @@ import type {
|
|||||||
GitMeta,
|
GitMeta,
|
||||||
Skill,
|
Skill,
|
||||||
AskUserAnswer,
|
AskUserAnswer,
|
||||||
|
ToolCostStat,
|
||||||
} from './types';
|
} from './types';
|
||||||
|
|
||||||
export class ApiError extends Error {
|
export class ApiError extends Error {
|
||||||
@@ -262,6 +263,14 @@ export const api = {
|
|||||||
list: () => request<{ skills: Skill[] }>('/api/skills'),
|
list: () => request<{ skills: Skill[] }>('/api/skills'),
|
||||||
},
|
},
|
||||||
|
|
||||||
|
// v1.13.10: per-tool cost rolling-window stats (last 100 calls per tool,
|
||||||
|
// equal-split attribution across multi-tool turns). Read endpoint backed by
|
||||||
|
// the tool_cost_stats view. AgentPicker consumes this for per-agent cost
|
||||||
|
// hints.
|
||||||
|
tools: {
|
||||||
|
costStats: () => request<{ stats: ToolCostStat[] }>('/api/tools/cost_stats'),
|
||||||
|
},
|
||||||
|
|
||||||
settings: {
|
settings: {
|
||||||
get: () => request<Record<string, unknown>>('/api/settings'),
|
get: () => request<Record<string, unknown>>('/api/settings'),
|
||||||
patch: (body: Record<string, unknown>) =>
|
patch: (body: Record<string, unknown>) =>
|
||||||
|
|||||||
@@ -1,6 +1,18 @@
|
|||||||
export const PROJECT_STATUSES = ['open', 'archived'] as const;
|
export const PROJECT_STATUSES = ['open', 'archived'] as const;
|
||||||
export type ProjectStatus = typeof PROJECT_STATUSES[number];
|
export type ProjectStatus = typeof PROJECT_STATUSES[number];
|
||||||
|
|
||||||
|
// v1.13.10: per-tool cost rolling-window stat. Returned by
|
||||||
|
// GET /api/tools/cost_stats — one entry per tool with mean prompt/completion
|
||||||
|
// tokens over the last 100 invocations. AgentPicker sums across an agent's
|
||||||
|
// whitelisted tools for per-agent cost hints.
|
||||||
|
export interface ToolCostStat {
|
||||||
|
tool_name: string;
|
||||||
|
mean_prompt_tokens: number;
|
||||||
|
mean_completion_tokens: number;
|
||||||
|
n_calls: number;
|
||||||
|
updated_at: string;
|
||||||
|
}
|
||||||
|
|
||||||
export interface Project {
|
export interface Project {
|
||||||
id: string;
|
id: string;
|
||||||
name: string;
|
name: string;
|
||||||
|
|||||||
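The comments above describe the stat as a rolling mean over the last 100 calls per tool, with equal-split attribution across multi-tool turns. The server-side `tool_cost_stats` view is not shown in this diff, so the following is only a sketch of that arithmetic under stated assumptions: the `TurnUsage` input shape, the `rollingToolMeans` helper, and the choice to divide a turn's tokens by its number of tool calls are all hypothetical.

```typescript
// Hypothetical input: one record per assistant turn.
interface TurnUsage {
  tools: string[]; // one entry per tool call made during the turn
  promptTokens: number;
  completionTokens: number;
}

interface Sample {
  prompt: number;
  completion: number;
}

// Subset of the ToolCostStat fields from the diff (updated_at omitted here).
interface ToolMean {
  tool_name: string;
  mean_prompt_tokens: number;
  mean_completion_tokens: number;
  n_calls: number;
}

function rollingToolMeans(turns: TurnUsage[], windowSize: number): ToolMean[] {
  const samples: Record<string, Sample[]> = {};
  for (const turn of turns) {
    const calls = turn.tools.length;
    if (calls === 0) continue;
    for (const tool of turn.tools) {
      const recent = samples[tool] || [];
      // Equal split: every call in the turn is charged the same share.
      recent.push({
        prompt: turn.promptTokens / calls,
        completion: turn.completionTokens / calls,
      });
      // Rolling window: drop the oldest sample once the window is full.
      if (recent.length > windowSize) recent.shift();
      samples[tool] = recent;
    }
  }

  const means: ToolMean[] = [];
  for (const tool of Object.keys(samples).sort()) {
    const recent = samples[tool] || [];
    let prompt = 0;
    let completion = 0;
    for (const s of recent) {
      prompt += s.prompt;
      completion += s.completion;
    }
    means.push({
      tool_name: tool,
      mean_prompt_tokens: prompt / recent.length,
      mean_completion_tokens: completion / recent.length,
      n_calls: recent.length,
    });
  }
  return means;
}

const exampleTurns: TurnUsage[] = [
  { tools: ['grep', 'view_file'], promptTokens: 2000, completionTokens: 100 },
  { tools: ['grep'], promptTokens: 3000, completionTokens: 200 },
];
console.log(rollingToolMeans(exampleTurns, 100));
```

Under these assumptions the first turn charges 1000 prompt tokens to each of its two tools, so `grep` averages (1000 + 3000) / 2 = 2000 prompt tokens over two calls. Means computed this way are generally fractional, which matters for anything that renders them directly.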
318
apps/web/src/api/ws-frames.ts
Normal file
@@ -0,0 +1,318 @@
|
|||||||
|
// v1.13.11-a: Zod schemas for every WebSocket frame published by the server.
|
||||||
|
// Validation runs both on send (broker.publishFrame / publishUserFrame) and
|
||||||
|
// on receive (apps/web/src/hooks/useSessionStream + useUserEvents). Catches
|
||||||
|
// silent protocol drift between publisher and consumer.
|
||||||
|
//
|
||||||
|
// IMPORTANT: This file is duplicated byte-identical at
|
||||||
|
// apps/web/src/api/ws-frames.ts. The two apps have separate tsconfigs and
|
||||||
|
// no path alias; the duplication is sync-by-hand. A test asserts the two
|
||||||
|
// files match. If you change one, change the other.
|
||||||
|
//
|
||||||
|
// Per-kind payload schemas (tool_call args, message_parts payloads, etc.)
|
||||||
|
// stay z.unknown() in v1.13.11. Frame-level drift detection is the goal;
|
||||||
|
// deep payload validation is follow-up work.
|
||||||
|
|
||||||
|
import { z } from 'zod';
|
||||||
|
|
||||||
|
// ---- shared primitives -----------------------------------------------------
|
||||||
|
|
||||||
|
const Uuid = z.string().uuid();
|
||||||
|
// Tool call IDs are model-emitted (e.g. "call_abc123") — not UUIDs.
|
||||||
|
const ToolCallId = z.string().min(1);
|
||||||
|
const IsoTimestamp = z.string().min(1);
|
||||||
|
|
||||||
|
const ChatStatusValue = z.enum([
|
||||||
|
'streaming',
|
||||||
|
'tool_running',
|
||||||
|
'waiting_for_input',
|
||||||
|
'idle',
|
||||||
|
'error',
|
||||||
|
]);
|
||||||
|
|
||||||
|
const ErrorReasonValue = z.enum([
|
||||||
|
'llm_provider_error',
|
||||||
|
'doom_loop',
|
||||||
|
'doom_loop_summary_failed',
|
||||||
|
'cap_hit',
|
||||||
|
'cap_hit_summary_failed',
|
||||||
|
]);
|
||||||
|
|
||||||
|
const MessageRoleValue = z.enum(['user', 'assistant', 'system', 'tool']);
|
||||||
|
|
||||||
|
const ToolCallShape = z.object({
|
||||||
|
id: ToolCallId,
|
||||||
|
name: z.string().min(1),
|
||||||
|
args: z.record(z.string(), z.unknown()),
|
||||||
|
});
|
||||||
|
|
||||||
|
// Free-form bags: opaque to the frame schema; deep validation is out of
|
||||||
|
// scope for v1.13.11 (frame-level drift detection is the goal; per-kind
|
||||||
|
// payload narrowing is follow-up work). z.unknown() means the consumer
|
||||||
|
// must narrow before reading — TypeScript-side this is fine because every
|
||||||
|
// consumer already operates on the hand-maintained Project / Chat / Session
|
||||||
|
// / WorkspacePane types (the brief's "Don't strip existing types yet"
|
||||||
|
// rule), and the Zod-typed shape is only used at the publishFrame boundary.
|
||||||
|
const OpaqueObject = z.unknown();
|
||||||
|
|
||||||
|
// ---- per-session channel frames --------------------------------------------
|
||||||
|
|
||||||
|
export const SnapshotFrame = z.object({
|
||||||
|
type: z.literal('snapshot'),
|
||||||
|
messages: z.array(OpaqueObject),
|
||||||
|
});
|
||||||
|
|
||||||
|
export const MessageStartedFrame = z.object({
|
||||||
|
type: z.literal('message_started'),
|
||||||
|
message_id: Uuid,
|
||||||
|
chat_id: Uuid.optional(),
|
||||||
|
role: MessageRoleValue,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const DeltaFrame = z.object({
|
||||||
|
type: z.literal('delta'),
|
||||||
|
message_id: Uuid,
|
||||||
|
chat_id: Uuid.optional(),
|
||||||
|
content: z.string(),
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ToolCallFrame = z.object({
|
||||||
|
type: z.literal('tool_call'),
|
||||||
|
message_id: Uuid,
|
||||||
|
chat_id: Uuid.optional(),
|
||||||
|
tool_call: ToolCallShape,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ToolResultFrame = z.object({
|
||||||
|
type: z.literal('tool_result'),
|
||||||
|
tool_message_id: Uuid,
|
||||||
|
chat_id: Uuid.optional(),
|
||||||
|
tool_call_id: ToolCallId,
|
||||||
|
output: z.unknown(),
|
||||||
|
truncated: z.boolean(),
|
||||||
|
error: z.string().optional(),
|
||||||
|
});
|
||||||
|
|
||||||
|
export const MessageCompleteFrame = z.object({
|
||||||
|
type: z.literal('message_complete'),
|
||||||
|
message_id: Uuid,
|
||||||
|
chat_id: Uuid.optional(),
|
||||||
|
tokens_used: z.number().int().nonnegative().nullable().optional(),
|
||||||
|
ctx_used: z.number().int().nonnegative().nullable().optional(),
|
||||||
|
ctx_max: z.number().int().positive().nullable().optional(),
|
||||||
|
started_at: IsoTimestamp.nullable().optional(),
|
||||||
|
finished_at: IsoTimestamp.nullable().optional(),
|
||||||
|
model: z.string().optional(),
|
||||||
|
metadata: OpaqueObject.nullable().optional(),
|
||||||
|
});
|
||||||
|
|
||||||
|
export const UsageFrame = z.object({
|
||||||
|
type: z.literal('usage'),
|
||||||
|
message_id: Uuid,
|
||||||
|
chat_id: Uuid.optional(),
|
||||||
|
completion_tokens: z.number().int().nonnegative().nullable(),
|
||||||
|
ctx_used: z.number().int().nonnegative().nullable(),
|
||||||
|
ctx_max: z.number().int().positive().nullable(),
|
||||||
|
});
|
||||||
|
|
||||||
|
export const MessagesDeletedFrame = z.object({
|
||||||
|
type: z.literal('messages_deleted'),
|
||||||
|
message_ids: z.array(Uuid),
|
||||||
|
chat_id: Uuid.optional(),
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ChatRenamedFrame = z.object({
|
||||||
|
type: z.literal('chat_renamed'),
|
||||||
|
chat_id: Uuid,
|
||||||
|
name: z.string(),
|
||||||
|
});
|
||||||
|
|
||||||
|
export const CompactedFrame = z.object({
|
||||||
|
type: z.literal('compacted'),
|
||||||
|
session_id: Uuid,
|
||||||
|
chat_id: Uuid,
|
||||||
|
summary_message_id: Uuid,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ErrorFrame = z.object({
|
||||||
|
type: z.literal('error'),
|
||||||
|
message_id: Uuid.optional(),
|
||||||
|
chat_id: Uuid.optional(),
|
||||||
|
error: z.string(),
|
||||||
|
reason: ErrorReasonValue.optional(),
|
||||||
|
});
|
||||||
|
|
||||||
|
// ---- per-user channel frames (sidebar refresh) -----------------------------
|
||||||
|
|
||||||
|
export const ChatStatusFrame = z.object({
|
||||||
|
type: z.literal('chat_status'),
|
||||||
|
chat_id: Uuid,
|
||||||
|
status: ChatStatusValue,
|
||||||
|
at: IsoTimestamp,
|
||||||
|
reason: ErrorReasonValue.optional(),
|
||||||
|
});
|
||||||
|
|
||||||
|
export const SessionUpdatedFrame = z.object({
|
||||||
|
type: z.literal('session_updated'),
|
||||||
|
session_id: Uuid,
|
||||||
|
project_id: Uuid,
|
||||||
|
name: z.string(),
|
||||||
|
updated_at: IsoTimestamp,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const SessionRenamedFrame = z.object({
|
||||||
|
type: z.literal('session_renamed'),
|
||||||
|
session_id: Uuid,
|
||||||
|
name: z.string(),
|
||||||
|
});
|
||||||
|
|
||||||
|
export const SessionCreatedFrame = z.object({
|
||||||
|
type: z.literal('session_created'),
|
||||||
|
session: OpaqueObject,
|
||||||
|
project_id: Uuid,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const SessionArchivedFrame = z.object({
|
||||||
|
type: z.literal('session_archived'),
|
||||||
|
session_id: Uuid,
|
||||||
|
project_id: Uuid,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const SessionDeletedFrame = z.object({
|
||||||
|
type: z.literal('session_deleted'),
|
||||||
|
session_id: Uuid,
|
||||||
|
project_id: Uuid,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const SessionWorkspaceUpdatedFrame = z.object({
|
||||||
|
type: z.literal('session_workspace_updated'),
|
||||||
|
session_id: Uuid,
|
||||||
|
workspace_panes: z.array(OpaqueObject),
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ChatCreatedFrame = z.object({
|
||||||
|
type: z.literal('chat_created'),
|
||||||
|
chat: OpaqueObject,
|
||||||
|
session_id: Uuid,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ChatUpdatedFrame = z.object({
|
||||||
|
type: z.literal('chat_updated'),
|
||||||
|
chat_id: Uuid,
|
||||||
|
session_id: Uuid,
|
||||||
|
name: z.string().nullable(),
|
||||||
|
updated_at: IsoTimestamp,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ChatArchivedFrame = z.object({
|
||||||
|
type: z.literal('chat_archived'),
|
||||||
|
chat_id: Uuid,
|
||||||
|
session_id: Uuid,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ChatUnarchivedFrame = z.object({
|
||||||
|
type: z.literal('chat_unarchived'),
|
||||||
|
chat: OpaqueObject,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ChatDeletedFrame = z.object({
|
||||||
|
type: z.literal('chat_deleted'),
|
||||||
|
chat_id: Uuid,
|
||||||
|
session_id: Uuid,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ProjectCreatedFrame = z.object({
|
||||||
|
type: z.literal('project_created'),
|
||||||
|
project: OpaqueObject,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ProjectArchivedFrame = z.object({
|
||||||
|
type: z.literal('project_archived'),
|
||||||
|
project_id: Uuid,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ProjectUnarchivedFrame = z.object({
|
||||||
|
type: z.literal('project_unarchived'),
|
||||||
|
project: OpaqueObject,
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ProjectUpdatedFrame = z.object({
|
||||||
|
type: z.literal('project_updated'),
|
||||||
|
project_id: Uuid,
|
||||||
|
name: z.string(),
|
||||||
|
});
|
||||||
|
|
||||||
|
export const ProjectDeletedFrame = z.object({
|
||||||
|
type: z.literal('project_deleted'),
|
||||||
|
project_id: Uuid,
|
||||||
|
});
|
||||||
|
|
||||||
|
// ---- discriminated union ---------------------------------------------------
|
||||||
|
|
||||||
|
export const WsFrameSchema = z.discriminatedUnion('type', [
|
||||||
|
// per-session
|
||||||
|
SnapshotFrame,
|
||||||
|
MessageStartedFrame,
|
||||||
|
DeltaFrame,
|
||||||
|
ToolCallFrame,
|
||||||
|
ToolResultFrame,
|
||||||
|
MessageCompleteFrame,
|
||||||
|
UsageFrame,
|
||||||
|
MessagesDeletedFrame,
|
||||||
|
ChatRenamedFrame,
|
||||||
|
CompactedFrame,
|
||||||
|
ErrorFrame,
|
||||||
|
// per-user
|
||||||
|
ChatStatusFrame,
|
||||||
|
SessionUpdatedFrame,
|
||||||
|
SessionRenamedFrame,
|
||||||
|
SessionCreatedFrame,
|
||||||
|
SessionArchivedFrame,
|
||||||
|
SessionDeletedFrame,
|
||||||
|
SessionWorkspaceUpdatedFrame,
|
||||||
|
ChatCreatedFrame,
|
||||||
|
ChatUpdatedFrame,
|
||||||
|
ChatArchivedFrame,
|
||||||
|
ChatUnarchivedFrame,
|
||||||
|
ChatDeletedFrame,
|
||||||
|
ProjectCreatedFrame,
|
||||||
|
ProjectArchivedFrame,
|
||||||
|
ProjectUnarchivedFrame,
|
||||||
|
ProjectUpdatedFrame,
|
||||||
|
ProjectDeletedFrame,
|
||||||
|
]);
|
||||||
|
|
||||||
|
export type WsFrame = z.infer<typeof WsFrameSchema>;
|
||||||
|
|
||||||
|
// Convenience: the set of known frame types. Useful for the publishFrame
|
||||||
|
// helper to log the offending type name when validation fails. Kept in sync
|
||||||
|
// by hand with the discriminated union above.
|
||||||
|
export const KNOWN_FRAME_TYPES: readonly WsFrame['type'][] = [
|
||||||
|
'snapshot',
|
||||||
|
'message_started',
|
||||||
|
'delta',
|
||||||
|
'tool_call',
|
||||||
|
'tool_result',
|
||||||
|
'message_complete',
|
||||||
|
'usage',
|
||||||
|
'messages_deleted',
|
||||||
|
'chat_renamed',
|
||||||
|
'compacted',
|
||||||
|
'error',
|
||||||
|
'chat_status',
|
||||||
|
'session_updated',
|
||||||
|
'session_renamed',
|
||||||
|
'session_created',
|
||||||
|
'session_archived',
|
||||||
|
'session_deleted',
|
||||||
|
'session_workspace_updated',
|
||||||
|
'chat_created',
|
||||||
|
'chat_updated',
|
||||||
|
'chat_archived',
|
||||||
|
'chat_unarchived',
|
||||||
|
'chat_deleted',
|
||||||
|
'project_created',
|
||||||
|
'project_archived',
|
||||||
|
'project_unarchived',
|
||||||
|
'project_updated',
|
||||||
|
'project_deleted',
|
||||||
|
] as const;
|
||||||
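The header comment of this file says the two copies of `ws-frames.ts` are synced by hand and that a test asserts they match. That test is not part of the visible diff. A minimal sketch of the comparison such a test could perform, assuming the two file contents have already been read into strings; `firstDifference` is a hypothetical helper name:

```typescript
interface Difference {
  line: number; // 1-based line number of the first mismatch
  left: string | null; // null when the left file has no such line
  right: string | null; // null when the right file has no such line
}

// Returns the first line at which two file contents diverge, or null when
// they are identical. Splitting on '\n' keeps any '\r', so a CRLF copy is
// reported as different from an LF copy, which is what "byte-identical" needs.
function firstDifference(a: string, b: string): Difference | null {
  const left = a.split('\n');
  const right = b.split('\n');
  const max = Math.max(left.length, right.length);
  for (let i = 0; i < max; i++) {
    const l = i < left.length ? left[i] : undefined;
    const r = i < right.length ? right[i] : undefined;
    if (l !== r) {
      return {
        line: i + 1,
        left: l === undefined ? null : l,
        right: r === undefined ? null : r,
      };
    }
  }
  return null;
}

const serverCopy = "import { z } from 'zod';\nconst Uuid = z.string().uuid();\n";
const webCopy = "import { z } from 'zod';\nconst Uuid = z.string();\n";
console.log(firstDifference(serverCopy, webCopy));
```

Reporting the first differing line, rather than only a boolean, makes the failure message point at the place where one copy was edited without the other.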
@@ -1,8 +1,8 @@
|
|||||||
import { useEffect, useState } from 'react';
|
import { useEffect, useMemo, useState } from 'react';
|
||||||
import { Check, ChevronDown } from 'lucide-react';
|
import { Check, ChevronDown } from 'lucide-react';
|
||||||
import { toast } from 'sonner';
|
import { toast } from 'sonner';
|
||||||
import { api } from '@/api/client';
|
import { api } from '@/api/client';
|
||||||
import type { Agent, AgentParseError } from '@/api/types';
|
import type { Agent, AgentParseError, ToolCostStat } from '@/api/types';
|
||||||
import {
|
import {
|
||||||
DropdownMenu,
|
DropdownMenu,
|
||||||
DropdownMenuContent,
|
DropdownMenuContent,
|
||||||
@@ -22,6 +22,10 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
|
|||||||
const [parseErrors, setParseErrors] = useState<AgentParseError[]>([]);
|
const [parseErrors, setParseErrors] = useState<AgentParseError[]>([]);
|
||||||
const [error, setError] = useState<string | null>(null);
|
const [error, setError] = useState<string | null>(null);
|
||||||
const [open, setOpen] = useState(false);
|
const [open, setOpen] = useState(false);
|
||||||
|
// v1.13.10: per-tool cost rolling window. Fetched once on mount and refreshed
|
||||||
|
// only on remount or page reload. Acceptable for a decision aid: the
|
||||||
|
// 100-call rolling mean doesn't shift fast.
|
||||||
|
const [costStats, setCostStats] = useState<ToolCostStat[]>([]);
|
||||||
|
|
||||||
// v1.8.1: per-agent parse errors are non-blocking. Silent if any agents
|
// v1.8.1: per-agent parse errors are non-blocking. Silent if any agents
|
||||||
// loaded successfully; a gray warning toast fires only when EVERY agent
|
// loaded successfully; a gray warning toast fires only when EVERY agent
|
||||||
@@ -52,6 +56,29 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
|
|||||||
};
|
};
|
||||||
}, [projectId]);
|
}, [projectId]);
|
||||||
|
|
||||||
|
// v1.13.10: cost stats are project-independent — the 100-call rolling
|
||||||
|
// window is global across all chats. Fetch once per mount; tolerate failure
|
||||||
|
// silently (cost line hides).
|
||||||
|
useEffect(() => {
|
||||||
|
let cancelled = false;
|
||||||
|
api.tools
|
||||||
|
.costStats()
|
||||||
|
.then((r) => {
|
||||||
|
if (!cancelled) setCostStats(r.stats);
|
||||||
|
})
|
||||||
|
.catch(() => {
|
||||||
|
if (!cancelled) setCostStats([]);
|
||||||
|
});
|
||||||
|
return () => {
|
||||||
|
cancelled = true;
|
||||||
|
};
|
||||||
|
}, []);
|
||||||
|
|
||||||
|
const costByTool = useMemo(
|
||||||
|
() => Object.fromEntries(costStats.map((s) => [s.tool_name, s])),
|
||||||
|
[costStats],
|
||||||
|
);
|
||||||
|
|
||||||
const selectedAgent = agents?.find((a) => a.id === value) ?? null;
|
const selectedAgent = agents?.find((a) => a.id === value) ?? null;
|
||||||
const triggerLabel = value === null
|
const triggerLabel = value === null
|
||||||
? 'No agent'
|
? 'No agent'
|
||||||
@@ -86,7 +113,9 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
|
|||||||
<span className="font-medium">No agent</span>
|
<span className="font-medium">No agent</span>
|
||||||
</DropdownMenuItem>
|
</DropdownMenuItem>
|
||||||
{agents.length > 0 && <DropdownMenuSeparator />}
|
{agents.length > 0 && <DropdownMenuSeparator />}
|
||||||
{agents.map((a) => (
|
{agents.map((a) => {
|
||||||
|
const cost = agentCost(a, costByTool);
|
||||||
|
return (
|
||||||
<DropdownMenuItem
|
<DropdownMenuItem
|
||||||
key={a.id}
|
key={a.id}
|
||||||
onSelect={() => void onChange(a.id)}
|
onSelect={() => void onChange(a.id)}
|
||||||
@@ -103,8 +132,14 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
|
|||||||
{a.description}
|
{a.description}
|
||||||
</span>
|
</span>
|
||||||
)}
|
)}
|
||||||
|
{cost.nWithData > 0 && (
|
||||||
|
<span className="text-muted-foreground/70 pl-[18px] truncate w-full">
|
||||||
|
~{formatK(cost.prompt)} prompt / {Math.round(cost.completion)} completion · {cost.nWithData}/{cost.nTools} tools{cost.mostRecent ? ` · last call ${formatAgo(cost.mostRecent)}` : ''}
|
||||||
|
</span>
|
||||||
|
)}
|
||||||
</DropdownMenuItem>
|
</DropdownMenuItem>
|
||||||
))}
|
);
|
||||||
|
})}
|
||||||
{parseErrors.length > 0 && (
|
{parseErrors.length > 0 && (
|
||||||
<div
|
<div
|
||||||
className="px-2 py-1.5 mt-1 text-xs text-amber-500 border-t border-border"
|
className="px-2 py-1.5 mt-1 text-xs text-amber-500 border-t border-border"
|
||||||
@@ -119,3 +154,49 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
|
|||||||
</DropdownMenu>
|
</DropdownMenu>
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// v1.13.10: sum the per-tool means across an agent's whitelisted tools.
|
||||||
|
// Sum-of-means, not mean-of-sums — we're combining independent rolling
|
||||||
|
// averages. nWithData reflects how many of the agent's tools have any
|
||||||
|
// history yet; the line hides entirely when zero so a fresh deploy doesn't
|
||||||
|
// render "0k / 0 / 0 tools".
|
||||||
|
function agentCost(
|
||||||
|
agent: Agent,
|
||||||
|
costByTool: Record<string, ToolCostStat>,
|
||||||
|
): {
|
||||||
|
prompt: number;
|
||||||
|
completion: number;
|
||||||
|
nTools: number;
|
||||||
|
nWithData: number;
|
||||||
|
mostRecent: string | null;
|
||||||
|
} {
|
||||||
|
let prompt = 0;
|
||||||
|
let completion = 0;
|
||||||
|
let nWithData = 0;
|
||||||
|
let mostRecent: string | null = null;
|
||||||
|
for (const t of agent.tools) {
|
||||||
|
const s = costByTool[t];
|
||||||
|
if (!s) continue;
|
||||||
|
prompt += s.mean_prompt_tokens;
|
||||||
|
completion += s.mean_completion_tokens;
|
||||||
|
nWithData++;
|
||||||
|
if (!mostRecent || s.updated_at > mostRecent) mostRecent = s.updated_at;
|
||||||
|
}
|
||||||
|
return { prompt, completion, nTools: agent.tools.length, nWithData, mostRecent };
|
||||||
|
}
|
||||||
|
|
||||||
|
function formatK(n: number): string {
|
||||||
|
if (n < 1000) return String(Math.round(n));
|
||||||
|
if (n < 10_000) return `${(n / 1000).toFixed(1)}k`;
|
||||||
|
return `${Math.round(n / 1000)}k`;
|
||||||
|
}
|
||||||
|
|
||||||
|
function formatAgo(iso: string): string {
|
||||||
|
const then = new Date(iso).getTime();
|
||||||
|
if (Number.isNaN(then)) return '—';
|
||||||
|
const diff = Date.now() - then;
|
||||||
|
if (diff < 60_000) return 'just now';
|
||||||
|
if (diff < 3_600_000) return `${Math.floor(diff / 60_000)}m ago`;
|
||||||
|
if (diff < 86_400_000) return `${Math.floor(diff / 3_600_000)}h ago`;
|
||||||
|
return `${Math.floor(diff / 86_400_000)}d ago`;
|
||||||
|
}
|
||||||
|
|||||||
@@ -1,6 +1,7 @@
|
|||||||
import { useEffect, useRef, useState } from 'react';
|
import { useEffect, useRef, useState } from 'react';
|
||||||
import { toast } from 'sonner';
|
import { toast } from 'sonner';
|
||||||
import type { Message, WsFrame } from '@/api/types';
|
import type { Message, WsFrame } from '@/api/types';
|
||||||
|
import { WsFrameSchema } from '@/api/ws-frames';
|
||||||
import { api } from '@/api/client';
|
import { api } from '@/api/client';
|
||||||
import { sessionEvents } from './sessionEvents';
|
import { sessionEvents } from './sessionEvents';
|
||||||
import { recordUsage } from './useChatThroughput';
|
import { recordUsage } from './useChatThroughput';
|
||||||
@@ -216,8 +217,28 @@ export function useSessionStream(sessionId: string | undefined) {
|
|||||||
setState((s) => ({ ...s, connected: true, error: null }));
|
setState((s) => ({ ...s, connected: true, error: null }));
|
||||||
};
|
};
|
||||||
ws.onmessage = (ev) => {
|
ws.onmessage = (ev) => {
|
||||||
|
// v1.13.11-a: Zod-validate every inbound frame. Fail-closed — invalid
|
||||||
|
// frames are logged and dropped. WsFrameSchema is the runtime guard;
|
||||||
|
// the hand-maintained WsFrame type stays as the narrowed dev-time
|
||||||
|
// shape (Zod uses OpaqueObject for nested types like Message[]). One
|
||||||
|
// cast bridges the two.
|
||||||
|
let raw: unknown;
|
||||||
try {
|
try {
|
||||||
const frame = JSON.parse(typeof ev.data === 'string' ? ev.data : '') as WsFrame;
|
raw = JSON.parse(typeof ev.data === 'string' ? ev.data : '');
|
||||||
|
} catch (err) {
|
||||||
|
console.warn('bad ws frame (parse)', err);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
const validated = WsFrameSchema.safeParse(raw);
|
||||||
|
if (!validated.success) {
|
||||||
|
console.error('ws-frame-validation-failed (session channel)', {
|
||||||
|
frame_type: (raw as { type?: unknown })?.type,
|
||||||
|
errors: validated.error.flatten(),
|
||||||
|
});
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
try {
|
||||||
|
const frame = validated.data as unknown as WsFrame;
|
||||||
// v1.11: on a compaction completion, re-fetch the message list so
|
// v1.11: on a compaction completion, re-fetch the message list so
|
||||||
// the new summary row + the cohort of compacted_at-stamped older
|
// the new summary row + the cohort of compacted_at-stamped older
|
||||||
// rows render correctly. We dispatch the fresh list as a synthetic
|
// rows render correctly. We dispatch the fresh list as a synthetic
|
||||||
|
|||||||
@@ -1,4 +1,5 @@
 import { useEffect } from 'react';
+import { WsFrameSchema } from '@/api/ws-frames';
 import { sessionEvents } from './sessionEvents';
 import { createWsReconnectToast } from './wsReconnectToast';
 
@@ -38,14 +39,33 @@ export function useUserEvents(): void {
     };
 
     ws.onmessage = (ev) => {
+      // v1.13.11-a: Zod-validate every inbound frame. Fail-closed — invalid
+      // frames are logged and dropped instead of dispatched onto the
+      // sessionEvents bus where a stale or wrong shape would silently
+      // corrupt sidebar / chat state.
+      let raw: unknown;
       try {
-        const parsed: unknown = JSON.parse(ev.data);
-        if (parsed && typeof (parsed as { type?: unknown }).type === 'string') {
-          sessionEvents.emit(parsed as import('./sessionEvents').SessionEvent);
-        }
+        raw = JSON.parse(ev.data);
       } catch (err) {
         console.warn('useUserEvents: failed to parse frame', err);
+        return;
       }
+      const validated = WsFrameSchema.safeParse(raw);
+      if (!validated.success) {
+        console.error('ws-frame-validation-failed (user channel)', {
+          frame_type: (raw as { type?: unknown })?.type,
+          errors: validated.error.flatten(),
+        });
+        return;
+      }
+      // Bridge cast: Zod's union is broader than SessionEvent (it includes
+      // per-session-channel frames too, which never arrive on the user
+      // channel). sessionEvents.emit only dispatches frames whose type
+      // appears in SessionEvent; the narrowing happens via the existing
+      // useSidebar.ts applyEvent switch.
+      sessionEvents.emit(
+        validated.data as unknown as import('./sessionEvents').SessionEvent,
+      );
     };
 
     ws.onclose = () => {
38
openspec/README.md
Normal file
@@ -0,0 +1,38 @@
# openspec

Per-batch documentation convention adopted v1.13.15-openspec.

Lift source: Fission-AI/OpenSpec directory layout. **No CLI dependency** — just
the folder shape. Full OpenSpec lifecycle adoption is a future v1.14+ batch.

## Layout

```
openspec/
  changes/
    <slug>/                    # one folder per shipped or planned batch
      proposal.md              # Why + scope summary
      tasks.md                 # implementation step list
      design.md                # architecture / data-model decisions (optional)
      specs/                   # reserved for future OpenSpec CLI adoption
    archived/                  # snapshots of pre-v1.13.15 batch docs
      <original-filename>.md
  specs/                       # global specs, future v1.14+ use
```

## Conventions

- Slugs are lowercase-hyphenated derived from the batch title
  (e.g. `v1-13-10-per-tool-cost`, `file-attachments-v3-5`).
- Already-shipped pre-v1.13.15 batches live in `changes/archived/` as
  single-file snapshots. They were not split into proposal/tasks because
  the work was already complete; archiving preserves git history.
- New v1.13.15+ batches should land directly in
  `changes/<slug>/proposal.md` (+ tasks.md, + design.md when applicable).
- `proposal.md` carries the "Why" and scope. `tasks.md` is the action list
  (numbered or checkbox). `design.md` is for non-trivial architectural
  decisions worth recording separately.
- A canonical dispatch brief (matching the v1.13.9 / v1.13.10 format)
  is most naturally split as proposal.md (Where we are, Why this matters,
  rationale sections) + tasks.md (Scope items, Build + smoke) + design.md
  (Attribution model, Filtering, Canonical mapping).
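
The slug convention above can be sketched as a small helper. This is only an illustration: `slugifyBatchTitle` is a hypothetical name, nothing in the repo is known to implement it, and the example titles are guesses at what the two sample slugs were derived from.

```typescript
// Hypothetical helper, not part of the repo: derives a change-folder slug
// from a batch title by lowercasing it and collapsing every run of
// non-alphanumeric characters into a single hyphen.
function slugifyBatchTitle(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // dots, spaces and punctuation become '-'
    .replace(/^-+|-+$/g, ''); // drop leading and trailing hyphens
}

// Assumed titles; the README only shows the resulting slugs.
const perToolCost = slugifyBatchTitle('v1.13.10 per-tool cost');
const attachments = slugifyBatchTitle('File attachments v3.5');
console.log(perToolCost, attachments);
```

Collapsing runs, rather than replacing characters one by one, keeps a title like `v1.13.10 - per-tool cost` from producing doubled hyphens.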
|
||||||
441
openspec/changes/archived/handoff_v1.13.10_per_tool_cost.md
Normal file
@@ -0,0 +1,441 @@
```
#careful #boocode #nofluff

v1.13.10 — per-tool token cost accounting (rolling 100-call window)

Goal: surface per-tool prompt/completion-token rolling averages in AgentPicker for at-a-glance agent-cost hints. Implementation is a SQL view on top of `messages_with_parts` (no new table, no new write site) + a read endpoint + AgentPicker tooltip extension. Estimated ~240 LoC, mostly UI.

## Where we are

- Last tag: v1.13.9 (compaction overflow trigger — `floor(0.85 × ctx_max)` early-trigger). Branch clean.
- v1.13.x cleanup line ✅ through v1.13.9. Queued: v1.13.10 (this) → v1.13.11 (WS Zod) → v1.13.12 (skills audit) → v1.13.2 (column drop, last).
- Dependency (satisfied since v1.13.7 commit `ff29b48`): `includeUsage: true` on `createOpenAICompatible` in `apps/server/src/services/inference/provider.ts`. Without it, `messages.tokens_used`/`ctx_used` were NULL for v1.13.1-A → v1.13.7 (latent regression). Now populated.

## Why this matters

Today: AgentPicker lists agents by name + description. No cost signal. Users pick the architect agent (full tool whitelist, 21k of tool schema) for one-liner questions a refactorer (3 tools, 4k schema) could answer.

Tomorrow: each agent listing shows its mean prompt + completion cost per tool, derived from the last 100 invocations across all chats. Decision aid, not a hard gate.

Why a SQL view instead of a denormalized stats table:
- All the source data already lands in `messages` (tool_calls JSON + tokens_used + ctx_used) and `message_parts` (read via the `messages_with_parts` view). Zero new write sites.
- Rolling 100-call window is a `ROW_NUMBER() OVER (PARTITION BY tool_name ORDER BY created_at DESC) <= 100` — natural fit for a view.
- View is rollback-safe. If the math is wrong, `DROP VIEW` and re-deploy; no orphan rows, no backfill.
- At BooCode scale (single user, ~30 tools, ~100 calls/tool), aggregate-on-read is microseconds. Premature to denormalize.
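
For intuition, the rolling window in the bullets above can be emulated in memory. This is a rough TypeScript sketch, assuming per-call rows that are already split per tool; `PerCallRow`, `ToolStat` and `rollingStats` are illustrative names, not repo code, and the brief's actual implementation is a SQL view.

```typescript
// Illustrative in-memory stand-in for the proposed view; all names are hypothetical.
interface PerCallRow {
  tool_name: string;
  prompt_tokens: number;
  completion_tokens: number;
  created_at: string; // ISO-8601 UTC, so string order equals time order
}

interface ToolStat {
  tool_name: string;
  prompt_tokens_sum: number;
  completion_tokens_sum: number;
  n_calls: number;
  updated_at: string;
}

function rollingStats(rows: PerCallRow[], windowSize: number = 100): ToolStat[] {
  // PARTITION BY tool_name
  const byTool = new Map<string, PerCallRow[]>();
  for (const row of rows) {
    const bucket = byTool.get(row.tool_name);
    if (bucket) bucket.push(row);
    else byTool.set(row.tool_name, [row]);
  }

  const stats: ToolStat[] = [];
  byTool.forEach((bucket, toolName) => {
    // ORDER BY created_at DESC, then keep only rn <= windowSize
    const recent = bucket
      .slice()
      .sort((a, b) => (a.created_at < b.created_at ? 1 : a.created_at > b.created_at ? -1 : 0))
      .slice(0, windowSize);
    if (recent.length === 0) return;

    let prompt = 0;
    let completion = 0;
    for (const r of recent) {
      prompt += r.prompt_tokens;
      completion += r.completion_tokens;
    }
    stats.push({
      tool_name: toolName,
      prompt_tokens_sum: Math.round(prompt),
      completion_tokens_sum: Math.round(completion),
      n_calls: recent.length,
      updated_at: recent[0].created_at, // newest row inside the window
    });
  });
  return stats.sort((a, b) => (a.tool_name < b.tool_name ? -1 : a.tool_name > b.tool_name ? 1 : 0));
}
```

Because the window is recomputed from the rows on every call, there is no stored aggregate to drift or backfill, which is the same rollback argument the bullets make for the view.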

The roadmap schema row (`tool_cost_stats (tool_name, prompt_tokens_sum, completion_tokens_sum, n_calls, updated_at)`) matches both a table and a view. View is the lighter implementation.

## Canonical column mapping (pinned)

The `messages` columns are named non-obviously. Pinned mapping, confirmed across 5 write sites + 1 read site:

| Column          | Semantic meaning           | AI SDK v6 source name |
|-----------------|----------------------------|-----------------------|
| `ctx_used`      | prompt / input tokens      | `usage.inputTokens`   |
| `tokens_used`   | completion / output tokens | `usage.outputTokens`  |

Write sites confirmed: `tool-phase.ts:94-95`, `error-handler.ts:109-110`, `sentinel-summaries.ts:130-131`, `sentinel-summaries.ts:387-388`, `stream-phase.ts:319-320`. Canonical read at `payload.ts:190-191` reverses: `const promptTokens = updated.ctx_used; const completionTokens = updated.tokens_used`.

`tokens_used` reads like "total" but is completion only. Project convention since the columns predate v1.13.x. Do not "fix" the naming inside this batch — out of scope; downstream consumers depend on the current mapping.
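
The pinned mapping can be written down as a pair of tiny converters. This is a sketch for orientation only: `TurnUsage`, `TokenColumns`, `toTokenColumns` and `fromTokenColumns` are hypothetical names, and the real write and read sites are the ones listed above.

```typescript
// Only the two usage fields named in the mapping table.
interface TurnUsage {
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical fragment of a `messages` row: just the two token columns.
interface TokenColumns {
  ctx_used: number; // prompt / input tokens
  tokens_used: number; // completion / output tokens (not a total)
}

// Write direction: usage object to columns.
function toTokenColumns(usage: TurnUsage): TokenColumns {
  return { ctx_used: usage.inputTokens, tokens_used: usage.outputTokens };
}

// Read direction: columns back to descriptive names.
function fromTokenColumns(row: TokenColumns): { promptTokens: number; completionTokens: number } {
  return { promptTokens: row.ctx_used, completionTokens: row.tokens_used };
}
```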

## Attribution model

A single assistant turn can emit N tool calls in parallel. llama-swap returns ONE (prompt_tokens, completion_tokens) per turn, not per tool. Attribution requires a split.

**Chosen approach: equal split.** For an assistant turn that emits N tool calls with prompt P and completion C, each tool is attributed P/N prompt + C/N completion. The 100-call rolling mean smooths split noise. Implementation: `tokens_used::float / jsonb_array_length(tool_calls)` at the unnest site.
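
A worked instance of the equal split, as a minimal TypeScript sketch. `Turn`, `Attribution` and `attributeEqualSplit` are illustrative names; the brief places the actual division inside the SQL view.

```typescript
interface ToolCall {
  name: string;
}

// Hypothetical reduced shape of one assistant turn.
interface Turn {
  tool_calls: ToolCall[];
  ctx_used: number; // prompt tokens P for the whole turn
  tokens_used: number; // completion tokens C for the whole turn
}

interface Attribution {
  tool_name: string;
  prompt_tokens: number;
  completion_tokens: number;
}

// Each of the N tool calls in a turn is attributed P/N prompt and C/N completion.
function attributeEqualSplit(turn: Turn): Attribution[] {
  const n = turn.tool_calls.length;
  if (n === 0) return []; // no tool calls, nothing to attribute
  return turn.tool_calls.map((tc) => ({
    tool_name: tc.name,
    prompt_tokens: turn.ctx_used / n,
    completion_tokens: turn.tokens_used / n,
  }));
}

// A 3-tool turn with P=15000 and C=300 gives each tool 5000 prompt and 100 completion.
const shares = attributeEqualSplit({
  tool_calls: [{ name: 'view_file' }, { name: 'grep' }, { name: 'list_dir' }],
  ctx_used: 15000,
  tokens_used: 300,
});
console.log(shares);
```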

**Alternatives rejected:**
- "Full turn cost to every tool" (no division). Over-states; a 5-tool turn would 5×-count every tool's cost.
- "Result-size only" (`length(JSON.stringify(output)) / 4`). Loses the LLM's actual usage signal; doesn't capture how expensive a tool's output is to the next prompt.
- "Consuming-turn delta" (next turn prompt_tokens − this turn prompt_tokens, attribute to the tool that emitted the result). Most accurate but requires bubble-back math through the `executeToolPhase → runAssistantTurn` recursion. Over-engineered for the rolling-average use case.

**If Sam wants a different split, change one line in the view definition (the divisor).**

## Filtering — sentinel, failure, repair-call semantics

The view excludes rows that aren't real tool-cost signal:

- **Failed and cancelled turns** (`status != 'complete'`). The `error-handler.ts` failed/cancelled paths don't write `tokens_used`/`ctx_used`, so the existing `tokens_used IS NOT NULL` clause already filters these. Adding `status='complete'` is defense in depth and makes intent explicit.
- **Cap-hit and doom-loop sentinel rows** (`metadata->>'kind' IN ('cap_hit', 'doom_loop')`). Sentinels are `role='system'` rows with `tool_calls=NULL`, so the existing `tool_calls IS NOT NULL` clause already filters them. The explicit metadata filter is defense in depth — it survives future schema drift where someone might INSERT a sentinel with a non-null tool_calls.
- **`experimental_repairToolCall` retries.** No special handling needed. Our impl (per `CLAUDE.md`) is pass-through — malformed calls flow to zod-reject → tool_result error → next normal turn handles. No separate rows; the next turn's tokens count naturally.
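
The exclusion rules above amount to a single row predicate. A minimal TypeScript sketch, assuming a reduced row shape; `MessageRow` and `countsTowardToolCost` are hypothetical names used only to make the logic easy to test in isolation.

```typescript
// Hypothetical reduced shape of a messages_with_parts row.
interface MessageRow {
  tool_calls: unknown[] | null;
  tokens_used: number | null;
  ctx_used: number | null;
  status: string;
  metadata: { kind?: string } | null;
}

// True when the row is real tool-cost signal under the rules listed above.
function countsTowardToolCost(m: MessageRow): boolean {
  // No tool calls: nothing to attribute (also drops tool_calls=NULL sentinels).
  if (m.tool_calls === null || m.tool_calls.length === 0) return false;
  // Usage never recorded for this row.
  if (m.tokens_used === null || m.ctx_used === null) return false;
  // Defense in depth: failed and cancelled turns.
  if (m.status !== 'complete') return false;
  // Defense in depth: sentinel rows, even if one ever carried tool_calls.
  const kind = m.metadata?.kind;
  return kind !== 'cap_hit' && kind !== 'doom_loop';
}
```

A row with `metadata` set to null, or with no `kind` key, passes the last check, so ordinary assistant turns are unaffected by the sentinel filter.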

## Recon (already done; paste for reference)

```
cd /opt/boocode
grep -n "tokens_used\|ctx_used\|inputTokens\|outputTokens" apps/server/src/services/inference/*.ts | head -30
grep -n "metadata\|cap_hit\|doom_loop" apps/server/src/services/inference/sentinels.ts apps/server/src/schema.sql | head -10
psql -h localhost -p 5432 -U postgres -d boocode -c "\d messages_with_parts" | head -30
```

Expected: confirms the canonical mapping in the table above; confirms `messages.metadata jsonb` exists at `schema.sql:259`; confirms `messages_with_parts` exposes `m.metadata` at `schema.sql:92`.

## Scope

### 1. schema.sql — `tool_cost_stats` view (~35 LoC)

Append after the `messages_with_parts` view (after line 120):

```sql
-- v1.13.10: per-tool token cost rolling window. Derives from
-- messages_with_parts (the v1.13.1-B view that COALESCEs message_parts over
-- the legacy JSON column) so this works whether the chat predates v1.13.0
-- or postdates v1.13.2 (column drop). No new write site — all source data
-- already lands via the existing tool-phase.ts:94-95 UPDATE.
--
-- Attribution model: equal split. A turn emitting N tool calls divides its
-- prompt/completion tokens by N before attribution. See v1.13.10 dispatch
-- brief for rationale + rejected alternatives.
--
-- Column mapping: messages.ctx_used = prompt (input), messages.tokens_used
-- = completion (output). Non-obvious naming; pinned via canonical writes at
-- tool-phase.ts:94-95 et al.
--
-- Filtering rationale:
--   status='complete'            — exclude failed/cancelled (defense in
--                                  depth; failed-path doesn't write
--                                  tokens_used so they're also filtered
--                                  indirectly).
--   metadata->>'kind' exclusions — exclude cap_hit / doom_loop sentinels
--                                  (defense in depth; sentinels are
--                                  role='system' with tool_calls=NULL
--                                  so they're filtered indirectly too).
--   experimental_repairToolCall  — no special handling; retries flow
--                                  as normal next-turn tool_result
--                                  errors and count naturally.
--
-- Rolling window: last 100 calls per tool_name, ordered by created_at DESC.
-- Aggregate-on-read is microseconds at BooCode scale (single user, ~30
-- tools, < 100 calls each). DROP VIEW + recreate to change window size.
CREATE OR REPLACE VIEW tool_cost_stats AS
WITH per_call AS (
  SELECT
    (tc->>'name')::text AS tool_name,
    (m.ctx_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS prompt_tokens,
    (m.tokens_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS completion_tokens,
    m.created_at,
    ROW_NUMBER() OVER (
      PARTITION BY (tc->>'name')::text
      ORDER BY m.created_at DESC
    ) AS rn
  FROM messages_with_parts m,
       LATERAL jsonb_array_elements(m.tool_calls) AS tc
  WHERE m.tool_calls IS NOT NULL
    AND jsonb_array_length(m.tool_calls) > 0
    AND m.tokens_used IS NOT NULL
    AND m.ctx_used IS NOT NULL
    AND m.status = 'complete'
    AND (m.metadata IS NULL
         OR m.metadata->>'kind' IS NULL
         OR m.metadata->>'kind' NOT IN ('cap_hit', 'doom_loop'))
)
SELECT
  tool_name,
  ROUND(SUM(prompt_tokens))::int AS prompt_tokens_sum,
  ROUND(SUM(completion_tokens))::int AS completion_tokens_sum,
  COUNT(*)::int AS n_calls,
  MAX(created_at) AS updated_at
FROM per_call
WHERE rn <= 100
GROUP BY tool_name;
```

Notes:
- `NULLIF(..., 0)` guards against div-by-zero on `jsonb_array_length=0` (should never happen given the WHERE clause, but defensive).
- `ROUND(SUM(...))::int` — frontend doesn't want decimals; sum-then-round is more accurate than per-row round-then-sum.
- View is read from `messages_with_parts` not `messages`, so legacy pre-v1.13.0 rows and post-v1.13.2 rows both resolve.
- No index needed; the underlying `idx_messages_chat` covers the JOIN; the LATERAL unnest is bounded by the 100-row partition.
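
The sum-then-round note is easy to check with numbers. A small TypeScript illustration, assuming three 3-tool turns that each recorded 100 completion tokens, so one tool receives three shares of 100/3; `sumThenRound` and `roundThenSum` are illustrative helpers, not repo code.

```typescript
// Round once, after adding up the fractional shares.
function sumThenRound(xs: number[]): number {
  return Math.round(xs.reduce((acc, x) => acc + x, 0));
}

// Round every share first, then add: the per-row errors accumulate.
function roundThenSum(xs: number[]): number {
  return xs.reduce((acc, x) => acc + Math.round(x), 0);
}

// Three equal-split shares of 100/3 (about 33.33) completion tokens each.
const completionShares = [100 / 3, 100 / 3, 100 / 3];

console.log(sumThenRound(completionShares)); // 100, matches the tokens actually spent
console.log(roundThenSum(completionShares)); // 99, one token lost to per-row rounding
```

The gap grows with the number of rows in the window, which is why the view rounds only the aggregate.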

### 2. apps/server/src/routes/tools.ts (NEW, ~40 LoC)

New route file. Register in `apps/server/src/index.ts` next to the other `register*Routes(app, sql, ...)` calls.

```ts
import type { FastifyInstance } from 'fastify';
import type { Sql } from '../db.js';

export interface ToolCostStat {
  tool_name: string;
  mean_prompt_tokens: number;
  mean_completion_tokens: number;
  n_calls: number;
  updated_at: string;
}

export function registerToolsRoutes(app: FastifyInstance, sql: Sql) {
  app.get('/api/tools/cost_stats', async () => {
    const rows = await sql<{
      tool_name: string;
      prompt_tokens_sum: number;
      completion_tokens_sum: number;
      n_calls: number;
      updated_at: string;
    }[]>`
      SELECT tool_name, prompt_tokens_sum, completion_tokens_sum, n_calls, updated_at
      FROM tool_cost_stats
      ORDER BY tool_name ASC
    `;
    const stats: ToolCostStat[] = rows.map(r => ({
      tool_name: r.tool_name,
      mean_prompt_tokens: Math.round(r.prompt_tokens_sum / r.n_calls),
      mean_completion_tokens: Math.round(r.completion_tokens_sum / r.n_calls),
      n_calls: r.n_calls,
      updated_at: r.updated_at,
    }));
    return { stats };
  });
}
```

Route is bodyless, idempotent, cheap. No pagination (≤30 tools).

### 3. apps/server/src/services/__tests__/tool_cost_stats.test.ts (NEW, ~95 LoC)

Integration test against real Postgres (matches `inference.test.ts` pattern). Fixtures:

```ts
import { describe, it, expect, beforeEach } from 'vitest';
import { connect } from '../../db.js';

describe('tool_cost_stats view (v1.13.10)', () => {
  // ... session + chat + project setup helpers ...

  it('returns empty when no tool calls exist', async () => {
    // fresh chat, only user/assistant text turns
    const stats = await sql`SELECT * FROM tool_cost_stats`;
    expect(stats).toEqual([]);
  });

  it('attributes single-tool turn fully to that tool', async () => {
    // insert one assistant message with tool_calls=[{name: 'view_file', ...}],
    // tokens_used=300, ctx_used=15000, status='complete'
    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
    expect(stats[0]).toMatchObject({
      tool_name: 'view_file',
      prompt_tokens_sum: 15000,
      completion_tokens_sum: 300,
      n_calls: 1,
    });
  });

  it('splits multi-tool turn equally across tools', async () => {
    // insert one assistant turn with 3 tool calls (view_file, grep, list_dir),
    // tokens_used=300, ctx_used=15000 → each tool gets 100 completion, 5000 prompt
    const stats = await sql`SELECT * FROM tool_cost_stats ORDER BY tool_name`;
    expect(stats).toHaveLength(3);
    for (const s of stats) {
      expect(s.completion_tokens_sum).toBe(100);
      expect(s.prompt_tokens_sum).toBe(5000);
      expect(s.n_calls).toBe(1);
    }
  });

  it('limits to last 100 calls per tool (FIFO window)', async () => {
    // insert 150 turns each calling view_file once with monotonically
    // increasing tokens_used; expect only the most recent 100 to count
    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
    expect(stats[0]!.n_calls).toBe(100);
    // mean should reflect the most recent 100 turns (51..150), not 1..150
  });

  it('excludes turns with NULL tokens_used (pre-v1.13.7 latent regression)', async () => {
    // insert a turn with tool_calls but tokens_used=NULL → must not appear
    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
    expect(stats).toEqual([]);
  });

  it('excludes failed and cancelled turns + sentinel metadata rows', async () => {
    // insert five rows for tool_name='view_file', all with tokens_used+ctx_used
    // populated:
    //   row A: status='failed' — excluded
    //   row B: status='cancelled' — excluded
    //   row C: status='complete', metadata={kind:'cap_hit'} — excluded
    //   row D: status='complete', metadata={kind:'doom_loop'} — excluded
    //   row E: status='complete', metadata=null — included
    // Expect n_calls=1, attributable to row E only.
    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
    expect(stats[0]!.n_calls).toBe(1);
  });

  it('reads tool_calls via messages_with_parts (parts-authoritative)', async () => {
    // insert a v1.13.0+ row with messages.tool_calls=NULL but
    // message_parts rows containing the tool_call → must still aggregate
    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='grep'`;
    expect(stats[0]!.n_calls).toBe(1);
  });
});
```

Pattern: each test resets the messages table for the fixture chat (TRUNCATE not DELETE — Postgres `messages` has FK CASCADE) and inserts hand-crafted rows. The view is recomputed on every SELECT.

### 4. apps/web/src/api/types.ts + client.ts (~10 LoC)

Add to `types.ts`:

```ts
export interface ToolCostStat {
  tool_name: string;
  mean_prompt_tokens: number;
  mean_completion_tokens: number;
  n_calls: number;
  updated_at: string;
}
```

Add to `client.ts` under the existing `api.*` namespace structure:

```ts
tools: {
  costStats: () => fetch<{ stats: ToolCostStat[] }>('GET', '/api/tools/cost_stats'),
},
```

Match the casing convention of the existing namespaces (`api.agents.list`, `api.chats.archive`, etc.).

### 5. apps/web/src/components/AgentPicker.tsx — tooltip extension (~80 LoC delta)

Currently (line 67): `title={selectedAgent?.description}` — native HTML title attribute on the trigger button.

Replacement: dropdown items get a per-agent cost line in muted text below the description. Format:

```
[Agent name]
[Agent description]
~5.2k prompt / 280 completion · 6/8 tools · last call 3h ago
```

Implementation steps:
1. Fetch `api.tools.costStats()` once on mount (alongside the existing `api.agents.list()`). Cache result for the lifetime of the picker open state. Re-fetch only on `useEffect` dep change.
2. Compute per-agent aggregate: for each agent, sum the means of its whitelisted tools. Sum-of-means, not mean-of-sums — we're combining independent rolling averages.
3. Render below description (one line, muted, truncated). Omit the line entirely if no calls are recorded yet for any of the agent's tools.
4. Don't break the existing native `title=` for backward compat; layer the cost line additively.

```tsx
const [costStats, setCostStats] = useState<ToolCostStat[]>([]);
useEffect(() => {
  api.tools.costStats().then(r => setCostStats(r.stats)).catch(() => setCostStats([]));
}, []);
const costByTool = useMemo(
  () => Object.fromEntries(costStats.map(s => [s.tool_name, s])),
  [costStats],
);
function agentCost(agent: Agent): { prompt: number; completion: number; nTools: number; nWithData: number; mostRecent: string | null } {
  let prompt = 0, completion = 0, nWithData = 0;
  let mostRecent: string | null = null;
  for (const t of agent.tools) {
    const s = costByTool[t];
    if (!s) continue;
    prompt += s.mean_prompt_tokens;
    completion += s.mean_completion_tokens;
    nWithData++;
    if (!mostRecent || s.updated_at > mostRecent) mostRecent = s.updated_at;
  }
  return { prompt, completion, nTools: agent.tools.length, nWithData, mostRecent };
}
```

For the line render: `~${formatK(prompt)} prompt / ${completion} completion · ${nWithData}/${nTools} tools · ${formatAgo(mostRecent)}`. Skip entirely when `nWithData === 0` to avoid showing "0k / 0 / 0 tools" for fresh-from-deploy state.

**`formatK` / `formatAgo`:** colocate at the bottom of `AgentPicker.tsx`. Don't extract to a util file in this batch — single use site.
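
The brief names `formatK` and `formatAgo` but does not define them. One possible shape, as a hedged TypeScript sketch: the rounding rules, the `now` parameter (added so the function is testable) and the `'no calls yet'` fallback are all assumptions, not something the brief specifies.

```typescript
// Assumed behaviour: values below 1000 print as integers, larger values
// print in thousands with one decimal and a "k" suffix.
function formatK(n: number): string {
  if (n < 1000) return String(Math.round(n));
  return `${(n / 1000).toFixed(1)}k`;
}

// Assumed behaviour: coarse relative time in minutes, hours or days.
// `now` is an illustrative extra parameter; it defaults to the current time.
function formatAgo(iso: string | null, now: Date = new Date()): string {
  if (iso === null) return 'no calls yet'; // hypothetical fallback text
  const elapsedMs = now.getTime() - new Date(iso).getTime();
  const minutes = Math.max(0, Math.floor(elapsedMs / 60000));
  if (minutes < 60) return `last call ${minutes}m ago`;
  const hours = Math.floor(minutes / 60);
  if (hours < 24) return `last call ${hours}h ago`;
  return `last call ${Math.floor(hours / 24)}d ago`;
}

// Assembling the cost line in the format given above, with sample numbers.
const sampleNow = new Date('2025-01-01T12:00:00.000Z');
const costLine = `~${formatK(5200)} prompt / ${280} completion · ${6}/${8} tools · ${formatAgo('2025-01-01T09:00:00.000Z', sampleNow)}`;
console.log(costLine);
```

Passing `now` explicitly keeps the helper deterministic in tests; the component would call it with the default.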

## What NOT to do

- **Don't add a new write site at `tool-phase.ts` or `finalizeCompletion`.** All source data is already there via existing UPDATEs.
- **Don't denormalize.** The view is sufficient and rollback-safe at BooCode's single-user scale.
- **Don't add per-tool cost to the message bubble.** Out of scope. AgentPicker tooltip only.
- **Don't fold per-call rows into a moving sum via triggers.** Aggregate on read; 100 rows × 30 tools is microseconds in Postgres.
- **Don't track `result_chars` (the size of `tool_results.output`).** Tempting as a second cost signal but out of scope here. Future batch if Sam wants it.
- **Don't add a session-scoped or chat-scoped filter to `tool_cost_stats`.** The rolling window is GLOBAL across all chats — the agent picker is a project-level decision aid. Per-chat surfacing is a future v1.14+ design.
- **Don't change the attribution model post-deployment** without dropping the view first. Mid-flight semantic changes give bogus historical means.
- **Don't "fix" the `ctx_used`/`tokens_used` naming inside this batch.** Non-obvious but pinned across 5 write sites. Renaming is its own batch.
- **Don't rely solely on `tool_calls IS NOT NULL` for sentinel exclusion.** It works today (sentinels are role='system' with tool_calls=NULL) but the explicit `status='complete'` + `metadata->>'kind'` filters are defense in depth and survive future schema drift.

## Backup before edits

```
cd /opt/boocode
cp apps/server/src/schema.sql{,.bak-$(date +%Y%m%d-%H%M%S)}
cp apps/web/src/components/AgentPicker.tsx{,.bak-$(date +%Y%m%d-%H%M%S)}
```

(No backup needed for new files in items 2, 3, 4.)

## Verify

```
pnpm -C apps/server test
```

Expected: all existing tests pass + 7 new in `tool_cost_stats.test.ts`. Total moves from 195 → 202.

```
cd /opt/boocode
docker compose exec boocode_db psql -U postgres -d boocode -c \
  "SELECT * FROM tool_cost_stats ORDER BY n_calls DESC LIMIT 10;"
```

Expected: in any live deployment with v1.13.7+ history, this returns real rows for `view_file`, `grep`, `list_dir`, etc. If empty: `messages.tokens_used`/`ctx_used` were NULL throughout the v1.13.1-A → v1.13.7 latent regression window, so the view only starts filling with v1.13.7+ traffic.

## Build + smoke

```
cd /opt/boocode
docker compose up --build -d boocode
docker compose logs --since=30s boocode | tail -20
```

Smoke A — view recompiles on schema apply:
```
docker compose logs boocode | grep -i "tool_cost_stats\|applySchema"
```
Expected: clean schema apply, view registered idempotently.

Smoke B — endpoint returns data:
```
curl -s http://localhost:3000/api/tools/cost_stats | jq '.stats | length, .stats[0]'
```
Expected: nonzero length if any v1.13.7+ tool calls exist; one stat object with all 5 fields populated.

Smoke C — UI:
1. Open browser to `boocode.indifferentketchup.com`.
2. Open AgentPicker dropdown on any session.
3. Each agent row shows a muted cost line below its description: `~5.2k prompt / 280 completion · 6/8 tools · last call 2h ago`.
4. Agents with no tool history show just description (no cost line).
5. Confirm cost line truncates with the existing text-muted-foreground / truncate pattern; doesn't break the layout at mobile widths (open Vivaldi devtools, set iPhone-13 viewport).

## Files expected to touch

- `apps/server/src/schema.sql` — ~35 LoC delta (view definition + filter comments)
- `apps/server/src/routes/tools.ts` — NEW, ~40 LoC
- `apps/server/src/index.ts` — 1 line (`registerToolsRoutes(app, sql)`)
- `apps/server/src/services/__tests__/tool_cost_stats.test.ts` — NEW, ~95 LoC
- `apps/web/src/api/types.ts` — ~7 LoC (interface)
- `apps/web/src/api/client.ts` — ~3 LoC (namespace + method)
- `apps/web/src/components/AgentPicker.tsx` — ~80 LoC delta (cost line + fetch hook + helpers)

Total ~260 LoC. Matches roadmap estimate.

## Workflow conventions

- Backups before destructive edits (above) on the two MODIFIED files. New files don't need backups.
- Sam reviews diffs. Never `git add` / `git commit` / `git push` / `git pull` on Sam's behalf.
- Build: `docker compose up --build -d boocode`. No `--no-cache` unless layer-cache trap surfaces.
- Tests authoritative: `pnpm -C apps/server test`.
- View definition lives in `schema.sql` (idempotent via `CREATE OR REPLACE VIEW`); no migration shim needed.

## Don't repeat past mistakes

- v1.13.7 stability bundle (`includeUsage:true`, trim guards, payload filter, `BUDGET_NO_AGENT=30`): all live. This batch depends on `includeUsage:true`. If it is unset, `tokens_used`/`ctx_used` stay NULL and `tool_cost_stats` returns no rows.
- v1.13.8 prefix instrumentation: untouched.
- v1.13.9 ratio-only `usable()`: untouched.
- v1.13.4 two-tier prune: untouched.
- v1.13.5 truncate.ts opaque-id pattern: untouched.
- v1.13.1-B `messages_with_parts` view: this view is the source. Don't reach past it to raw `messages`.
- v1.13.2 will DROP `messages.tool_calls`/`tool_results` columns. The `tool_cost_stats` view reads from `messages_with_parts` not `messages`, so it survives. Verify after v1.13.2 ships.

## Source files to read in project knowledge

- `boocode_roadmap.md` (v1.13.10 row at line 114; schema row at line 474)
- `boocode_code_review.md` (cost-tracking design background)
- `CLAUDE.md` (project conventions; messages_with_parts invariant at L80; v1.13.7 includeUsage invariant)
```
3
pnpm-lock.yaml
generated
@@ -157,6 +157,9 @@ importers:
       tw-animate-css:
         specifier: ^1.4.0
         version: 1.4.0
+      zod:
+        specifier: ^3.23.8
+        version: 3.25.76
     devDependencies:
       '@tailwindcss/postcss':
         specifier: ^4.3.0