Compare commits

...

7 Commits

Author SHA1 Message Date
bc376c878d v1.13.11-b: convert raw broker.publish call sites to typed publishFrame
Second half of the WebSocket-frame-typing batch. Phase A (8b568b3)
landed the schemas + frontend receive validation + publishFrame /
publishUserFrame wrappers. This commit converts the existing publish
call sites so every server-emitted WS frame now goes through Zod
validation at the broker boundary.

Conversion strategy: change once in the inference / skills adapters in
index.ts (so ctx.publish / ctx.publishUser propagate to publishFrame /
publishUserFrame for ALL ~50 inference + auto_name call sites in one
move), then bulk-replace the ~30 direct broker.publish* call sites in
the routes + compaction.

Files touched:
- index.ts: inference + skills route adapters now call publishFrame /
  publishUserFrame internally; raw broker.publishUser('default', ...)
  call in the stale-row sweeper also converted.
- routes/projects.ts (7 sites), routes/chats.ts (9 sites),
  routes/sessions.ts (8 sites): all broker.publishUser(...) → broker.
  publishUserFrame(...).
- services/compaction.ts (3 sites): 2 publishUser, 1 publish.

Real protocol drift surfaced by Zod, fixed in the same commit:

  services/compaction.ts:442 was publishing chat_status with status:
  'working' — the v1.12.1 chat_status widening (CLAUDE.md:55) dropped
  this enum value in favor of streaming|tool_running|waiting_for_input|
  idle|error. The compaction.ts site was missed during v1.12.1; the
  frame had been published with an unknown enum value ever since (the
  frontend useChatStatus quietly ignored it). Corrected to 'streaming'
  — compaction's LLM call has the same dot-state semantic as an
  inference turn. This is exactly the class of bug v1.13.11 exists to
  catch.

Schema relaxation: OpaqueObject (the bag type for nested entities like
Project / Chat / Session / WorkspacePane embedded in WS frames) was
z.object({}).passthrough(), which Zod outputs as {} & {[k:string]:
unknown}. The strict-typed entities don't have index signatures so
TypeScript rejected them at publishFrame call sites. Relaxed to
z.unknown() — runtime validation still accepts the value, dev-time
narrowing happens via the existing hand-maintained types. Trade-off:
frame-level drift detection stays sharp; nested-payload validation
goes to follow-up work as the brief intended.

Schema audit:
  grep -rn "broker\.publish(\|broker\.publishUser(" apps/server/src \
    --include="*.ts" | grep -v "broker.ts\|__tests__\|.bak"
  → 0 results. Every server publish goes through publishFrame /
  publishUserFrame. The remaining ctx.publish / ctx.publishUser sites
  in services/inference/* + services/auto_name.ts route through the
  index.ts adapter, which calls publishFrame internally.

Tests: 219/219 pass (unchanged from v1.13.11-a; the Phase B conversion
is mechanical and doesn't add test cases).

Smoke: clean container boot, no ws-frame-validation-failed entries
under normal traffic. Sidebar list refresh + agent picker open both
pass through useUserEvents without drops.

~70 LoC across 7 files. v1.13.11 closed.
2026-05-22 15:54:00 +00:00
8b568b36d3 v1.13.11-a: WS frame schemas + frontend receive validation
First half of the WebSocket-frame-typing batch (split per recon — total
scope was ~535 LoC, larger than the roadmap's ~300 estimate, so the
server-side publish-site conversion lands separately in v1.13.11-b).

Phase A scope:

(1) apps/server/src/types/ws-frames.ts (NEW) — Zod schemas for all 27
wire-format WS frame types. Discriminated union (WsFrameSchema) plus
KNOWN_FRAME_TYPES const for diagnostic lookup. UUIDs are z.string().
uuid(); model-emitted tool_call_id stays z.string().min(1) since OpenAI-
compatible APIs emit "call_<random>" not UUID. Per-kind payload narrowing
(tool args, message_parts payloads) intentionally stays z.unknown() —
frame-level drift detection is the goal; deep payload validation is
follow-up work.

(2) apps/web/src/api/ws-frames.ts (NEW) — byte-identical mirror of the
authoritative server file. No path alias from web→server in the existing
tsconfig setup; sync-by-hand was chosen over a new packages/shared/ dir.
A ws-frames.test.ts test asserts the two files match.

(3) apps/server/src/services/broker.ts — adds publishFrame() and
publishUserFrame() methods to the Broker interface. Both validate via
WsFrameSchema and fail-closed: log + drop on invalid. createBroker now
accepts an optional FastifyBaseLogger so validation failures land in
the pino stream (with console.error fallback for unit tests). The
existing publish() / publishUser() raw methods stay legal — they get
converted to the typed variants in v1.13.11-b.

(4) apps/web/src/hooks/useSessionStream.ts + useUserEvents.ts — wrap
ws.onmessage with WsFrameSchema.safeParse. Fail-closed: invalid frames
log + return without dispatching. Hand-maintained WsFrame and
SessionEvent types stay in place; one cast bridges Zod-typed → narrowed
shape (Zod uses OpaqueObject for nested Message[] / WorkspacePane[] etc.,
which are dev-time-narrowed via the existing hand-maintained types).

(5) apps/web/package.json — adds zod ^3.23.8 as a direct dep. Was a
transitive dep via ai-sdk / postgres; promotion makes the import legal.

(6) Tests: 15 new in ws-frames.test.ts covering happy-path per major
frame type, drift-catchers (unknown type, invalid enum, non-UUID, negative
tokens), parts-authoritative read variants, the mirror-file diff check,
and four broker fail-closed scenarios. 219/219 server tests pass (was
204; +15 new).

Two recon corrections to the dispatch brief, both flagged before
implementation:

- No 'parts_appended' frame exists. The brief assumed one; the codebase
  reads parts via the messages_with_parts view after message_complete
  triggers a refetch. MessagePartSchema is therefore unused this batch.
- No 'tool_running' frame exists. The brief listed it as standalone; it
  is in fact a 'chat_status' variant ({ status: 'tool_running' }), already
  covered by ChatStatusFrame.

Smoke: clean container boot, no validation errors in the server log. Real
production frames pass validation (the schemas were derived from the
existing hand-maintained types in api/types.ts and sessionEvents.ts).

v1.13.11-b will follow immediately: convert all ~85 raw broker.publish /
ctx.publish call sites across 11 server files to publishFrame /
publishUserFrame. Mechanical edit; the wiring done here means the diff
in -b is just the call-site swaps.

~310 LoC across 9 files (4 new + 5 modified).
2026-05-22 15:48:32 +00:00
34cbecf975 v1.13.15-tools: tiered tool loading via BOOCODE_TOOLS env var
Pattern lift from eyaltoledano/claude-task-master (MIT + Commons Clause
— pattern only, no code lift). Adds BOOCODE_TOOLS env var with three
tiers:

- core (4 tools): view_file, list_dir, grep, find_files. ~2k token
  schema cost.
- standard (15 tools): core + web_search, web_fetch, git_status, all
  8 codecontext_* tools. ~10k token schema cost.
- all (default; current behavior): every tool in ALL_TOOLS (20). ~21k
  token schema cost.

The env var is a CEILING — narrows agent whitelists, never expands.
Default behavior unchanged when var is unset. resolveToolTier is
case-insensitive and falls back to 'all' on unknown values.

CORE_TOOL_NAMES + STANDARD_TOOL_NAMES validated at module load against
TOOLS_BY_NAME via two top-level for-loops that throw on the first
missing name. Module fails to import if a tier references a tool that
doesn't exist in the registry — catches typos and stale tier
definitions at boot rather than silently filtering valid tools out of
agent whitelists.

Wiring: agents.ts parseAgentBlock now reads BOOCODE_TOOLS from
process.env per parse, intersects with the agent's declared frontmatter
tools (or DEFAULT_TOOLS when frontmatter omits the field). Per-parse
read is fine — agents are re-parsed on the existing 60s cache TTL.

Tests: tools.test.ts grows from 1 to 10 tests. Covers resolveToolTier
across tiers/case/unknown values + the CORE-subset-of-STANDARD invariant
+ TOOLS_BY_NAME existence for both tier sets. 204/204 pass (was 195;
+9 new).

Deviation from the brief: the codecontext tools in the actual registry
have NO codecontext_* prefix (the brief's STANDARD list assumed it).
Used the actual names (get_codebase_overview, search_symbols, etc.).
Module-load validation would have failed boot with the prefixed names.

Smoke: with BOOCODE_TOOLS unset, agents return their full 12-tool
whitelists. With BOOCODE_TOOLS=core in .env + container restart, the
same agents narrow to 4 tools (find_files, grep, list_dir, view_file)
— intersection of declared whitelist ∩ core tier. Reverted after
confirmation.

CLAUDE.md updated with BOOCODE_TOOLS in the Environment section's
Optional list. .env.example gained a commented BOOCODE_TOOLS=all line
with the per-tier token-cost table.

~110 LoC across 5 files (4 modified + 1 test expansion). Under the
brief's ~30 LoC estimate for code; the test suite expansion drove
most of the growth.
2026-05-22 14:59:01 +00:00
5a3f357ce9 v1.13.15-openspec: reformat batch docs to OpenSpec directory structure
Adopt Fission-AI/OpenSpec's openspec/changes/<change-name>/{proposal,
specs,design,tasks}.md shape for BooCode's own batch docs. Zero-dep
documentation reformat; replaces ad-hoc boocode_batchN.md /
handoff_vN.N.N.md convention.

Existing batch docs moved into openspec/changes/archived/ via git mv
(preserves history):
- boocode_batch10.md
- handoff_v1.13.8_prefix_verify.md
- handoff_v1.13.10_per_tool_cost.md

Pre-v1.13.15 docs were NOT split into proposal/tasks/design files. The
work was already shipped; the originals are preserved as archived
snapshots. New v1.13.15+ batches land directly in
openspec/changes/<slug>/proposal.md (+ tasks.md, + design.md when
applicable) per the convention documented in openspec/README.md.

CLAUDE.md gained a one-line pointer to the convention (workflow
section). File grew from 153 → 154 lines, 27,682 → 27,925 chars; both
remain well under the AgentLint hard caps.

specs/ directory is reserved for future OpenSpec CLI adoption (v1.14+).
No CLI dep added in this batch — directory structure only. If/when the
full OpenSpec lifecycle is adopted, that lands as a separate batch.
2026-05-22 14:54:17 +00:00
fc11e8dc91 v1.13.15-agentlint: instruction-file audit against AgentLint 31-check standard
Manual audit pass against 0xmariowu/AgentLint's evidence-backed checks
(MIT, drawn from 265 versions of Anthropic's internal Claude Code
system prompt).

Findings and fixes:
- Identity sections ("You are the assistant running inside ...") removed
  from BOOCHAT.md (line 3) and BOOCODER.md (line 5). The model already
  knows where it's running; the openers were emphatic decoration.
- CLAUDE.local.md added to .gitignore (.env was already covered).
  Claude Code's Glob tool ignores .gitignore by default, which means
  any local override file was otherwise readable by any agent walking
  the workspace.
- CLAUDE.md unchanged — already passes all 10 checks. Emphasis density
  0.58/1000 words (under Anthropic's 1.4/1000 endpoint); two IMPORTANT/
  MUST references are load-bearing (tsc-noEmit footgun, v1.13.7
  includeUsage invariant); zero identity sections; zero --no-verify
  references; 27,682 chars (under the 40,000-char silent-drop limit).
  Line count (153) is over the 60-120 target band, but the brief
  explicitly forbids structural rewrites in the audit pass.

Targets not in scope:
- /opt/boocode/AGENTS.md does not exist in this repo (removed in v1.12,
  per CLAUDE.md:152). The global agent registry lives at /data/AGENTS.md
  (bind-mounted from outside the repo); can't be touched by this batch.
- No .github/workflows/ directory — SHA-pin audit (step 8) skipped.

Cumulative effect: model spends fewer tokens parsing instruction-file
ceremony in BOOCHAT/BOOCODER and receives sharper priority signal per
Anthropic's measured-evolution data. Zero code changes.
2026-05-22 14:52:37 +00:00
9ce638c916 v1.13.10: per-tool token cost accounting (rolling 100-call view)
Surfaces per-tool prompt/completion-token rolling averages in
AgentPicker for at-a-glance agent-cost hints. Implementation is a
SQL view on top of messages_with_parts plus a read endpoint and
AgentPicker tooltip extension. No new write site; all source data
already lands via the existing tool-phase.ts:94-95 / error-handler.ts:
109-110 / sentinel-summaries.ts UPDATEs that v1.13.7's includeUsage:
true fix made non-NULL.

(1) schema.sql — new tool_cost_stats view. Window-functions over
messages_with_parts.tool_calls with LATERAL jsonb_array_elements.
Attribution: equal split — multi-tool turn divides tokens N-ways;
the 100-call rolling mean absorbs split noise. Filters: status=
'complete' + metadata.kind NOT IN ('cap_hit','doom_loop') exclude
failed turns and sentinels respectively; tool_calls IS NOT NULL is
defense-in-depth since sentinels are role='system' rows. CREATE OR
REPLACE means schema apply is idempotent.

(2) routes/tools.ts NEW + index.ts wire-in. GET /api/tools/cost_stats
returns { stats: ToolCostStat[] } with mean_prompt_tokens / mean_
completion_tokens computed at read time (sum / n_calls). Sorted by
tool_name ASC. No pagination — ≤30 tools.

(3) __tests__/tool_cost_stats.test.ts NEW — 7 integration tests
keyed off DATABASE_URL env var. Tests skip gracefully when unset
(no-DB default). beforeAll applies the schema via sql.unsafe(read
FileSync(schema.sql)) for self-contained runs. Helper insertAssistant
Turn shared across cases. Covers: empty state, single-tool attribution,
multi-tool equal split, 100-call FIFO window, NULL-tokens exclusion,
parts-authoritative read via messages_with_parts, failed/sentinel
exclusion.

(4) web/api/types.ts + client.ts — ToolCostStat interface + api.tools.
costStats() method binding.

(5) AgentPicker.tsx — fetch costStats on mount, compute per-agent
sum-of-means across whitelisted tools, render muted cost line below
description: "~5.2k prompt / 280 completion · 6/8 tools · last call
3h ago". Skips line entirely when no tool history; preserves existing
native title= for layout backward-compat. formatK/formatAgo colocated.

Tests: 202/202 pass (195 prior + 7 new view-integration). Server +
web tsc clean.

Smoke: schema applied cleanly; GET /api/tools/cost_stats returns
canonical JSON; view + endpoint agree. Single-row result expected
given the v1.13.1-A → v1.13.7 NULL latent regression window; new
traffic populates organically.

Roadmap row at boocode_roadmap.md:114 plus schema row at :474 both
match. View vs table decision documented in handoff_v1.13.10_per_
tool_cost.md (rollback-safe, microsecond-fast at BooCode scale).

~270 LoC across 8 files (5 modified + 3 new).
2026-05-22 14:42:09 +00:00
8126d78b34 docs: capture v1.13.7-v1.13.9 invariants in CLAUDE.md
Five additions surfacing session-discovered constraints future Claude
sessions need:
- AI SDK v6 includeUsage:true requirement (avoids re-introducing the
  v1.13.1-A→v1.13.7 NULL-tokens regression)
- \n text-delta trim guards in MessageList/MessageBubble + payload.ts
  failed/empty-assistant skip rules (avoid undoing v1.13.7)
- 0.85 × ctx_max overflow formula (v1.13.9) replacing the stale
  ctx_max - 20k line
- New services/system-prompt.ts bullet documenting the v1.13.8
  fingerprint instrumentation surface
- New services/inference/budget.ts bullet with current BUDGET_NO_AGENT=30
  and read-only-tools rationale
2026-05-22 14:07:11 +00:00
31 changed files with 2075 additions and 85 deletions

View File

@@ -10,3 +10,12 @@ POSTGRES_PASSWORD=CHANGE_ME
# Internal Tailscale address that bypasses Authelia. Override if you
# point BooCode at a different SearXNG instance.
SEARXNG_URL=http://100.114.205.53:8888
# v1.13.15-tools: BOOCODE_TOOLS narrows the tool whitelist sent to the LLM.
# Unset (default) → all tools (~21k schema). Useful primarily for single-purpose
# sessions where the model only needs read-only filesystem access.
#
# core → view_file, list_dir, grep, find_files (~2k)
# standard → core + web_*, git_status, all 8 codecontext_* tools (~10k)
# all → every tool in ALL_TOOLS (~21k)
# BOOCODE_TOOLS=all

1
.gitignore vendored
View File

@@ -1,6 +1,7 @@
node_modules
dist
.env
CLAUDE.local.md
*.log
.DS_Store
.vite

View File

@@ -1,7 +1,5 @@
# BooChat
You are the assistant running inside BooChat — a self-hosted developer chat app.
## Capabilities
- Read-only file tools: `view_file`, `list_dir`, `grep`, `find_files`

View File

@@ -2,8 +2,6 @@
> (Stub. v2.0 implementation pending. This file documents the intended contract.)
You are the assistant running inside BooCoder — the write-capable companion to BooChat.
## Capabilities
- Everything in `BOOCHAT.md`

View File

@@ -47,10 +47,12 @@ Tests: `pnpm -C apps/server test` runs the vitest suite. No test harness on `app
Key services:
- **`services/inference/`** — Public surface re-exported via `inference/index.ts`; callers import from `./services/inference/index.js` explicitly (NodeNext doesn't honor directory-index resolution). Layout: `turn.ts` (runAssistantTurn / runInference / createInferenceRunner; exports `InferenceFrame`, `InferenceContext`, `TurnArgs`, `StreamResult`), `stream-phase.ts` (streamCompletion as a v1.13.1-A AI SDK adapter + executeStreamPhase), `provider.ts` (`upstreamModel(baseURL, modelId)` wrapping `createOpenAICompatible` against llama-swap), `tool-phase.ts` (executeToolPhase; value back-edges into turn.ts for the runAssistantTurn recursion — cycle safe because deref at call time, not module top-level), `sentinel-summaries.ts` (runCapHitSummary + runDoomLoopSummary + their sentinel inserters), `error-handler.ts` (handleAbortOrError, finalizeCompletion), `payload.ts` (buildMessagesPayload, loadContext, maybeFlagForCompaction, `OpenAiMessage`), `sentinels.ts` (`detectDoomLoop`, `DOOM_LOOP_THRESHOLD`, sentinel predicates), `budget.ts` (resolveToolBudget), `xml-parser.ts` (qwen3.6 XML tool-call fallback — KEEP, AI SDK doesn't handle inline-XML tool calls), `parts.ts` (v1.13.0 dual-write helpers: `partsFromAssistantMessage`, `partsFromToolMessage`, `insertParts`), `prune.ts` (v1.13.4 two-tier compaction; `selectPruneTargets` is the pure decision helper), `types.ts` (`StreamPhaseState`, `DB_FLUSH_INTERVAL_MS`). **`TurnArgs`** is the per-turn state envelope threaded through the `executeToolPhase → runAssistantTurn` recursion; reset in `runInference` at user-message boundary. Add new per-turn state to `TurnArgs`, not module-level closures.
- **AI SDK v6 streamCompletion adapter** (v1.13.1-A; `services/inference/stream-phase.ts`). `streamText` is the underlying call; the BooCode layer above (executeStreamPhase, finalize, dual-write) is shape-preserved via an adapter. Three gotchas the LSP/test suite won't catch:
- **AI SDK v6 streamCompletion adapter** (v1.13.1-A; `services/inference/stream-phase.ts`). `streamText` is the underlying call; the BooCode layer above (executeStreamPhase, finalize, dual-write) is shape-preserved via an adapter. Five gotchas the LSP/test suite won't catch:
- **Abort signals are swallowed.** `streamText`'s `fullStream` iterator exits cleanly when `abortSignal` fires — no throw. Post-iteration `if (signal?.aborted) throw <AbortError>` is required; without it the row finalizes as `complete` instead of `cancelled`. Comment in stream-phase.ts pins this; don't refactor it away.
- **Usage lands only at stream end** via `await result.usage` (`inputTokens` / `outputTokens` v6 names → mapped to `promptTokens` / `completionTokens` for the existing onUsage callback). Mid-stream live tok/s is gone vs v1.12.2; ChatThroughput shows a single value at stream end.
- **Tools have NO `execute` field.** BooCode dispatches tools in tool-phase.ts, not the AI SDK loop. Only `description` + `inputSchema: jsonSchema(parameters)` — surfacing tool-call parts via `fullStream` and stopping is what we want.
- **`includeUsage: true` MUST be set on `createOpenAICompatible`** in `services/inference/provider.ts`. The adapter defaults it false, omitting `stream_options.include_usage` from the request body; llama-swap then never emits the usage block and `result.usage.inputTokens/outputTokens` resolve to `undefined`. Latent regression from v1.13.1-A through v1.13.7 — every assistant row in that window has `tokens_used`/`ctx_used` NULL. Don't remove this flag during refactor.
- **Tool-call-only turns may emit a leading `\n` text-delta** as the assistant content. `MessageList.flatten`'s `hasText` and `MessageBubble`'s `hasContent` both `.trim()` before the length check — otherwise whitespace-only content renders an empty bubble + ActionRow between every tool call (v1.13.7 fix). `payload.ts:buildMessagesPayload` also skips `status='failed'` AND complete-but-empty (no content, no tool_calls) assistant rows to avoid "Cannot have 2 or more assistant messages at the end of the list" upstream rejections after cap-hit + Continue.
- **AI SDK ModelMessage conversion** (`toModelMessages` in stream-phase.ts). Tool messages need a `toolName` for `ToolResultPart` — BooCode's OpenAI-shape history doesn't carry it, so a forward-scan builds a `tool_call_id → toolName` map from prior assistant `tool_calls`. Tool outputs wrapped as `{ type: 'json' | 'text', value }` matching the v6 `ToolResultOutput` union. Assistant messages with reasoning emit a `ReasoningPart` first in the content array (v1.13.1-C).
- **`experimental_repairToolCall`** (v1.13.3) wired into `streamText` to keep the stream alive when qwen3.6 emits malformed tool args. Pass-through implementation — logs the bad call and returns it unmodified; `executeToolPhase`'s existing zod-reject error path routes it to the model on the next turn.
- **`chat_status` frame shape** (published via `broker.publishUser`) — `status: 'streaming' | 'tool_running' | 'waiting_for_input' | 'idle' | 'error'` (widened from `working|idle|error` in v1.12.1). Frontend `useChatStatus` derives `idle_warm` (<30s since idle) vs `idle_cold`. `ChatThroughput` renders inline beside `StatusDot` only when streaming or tool_running, fed by 500ms-throttled `'usage'` WS frames (`completion_tokens` + `ctx_used` + `ctx_max`). The `POST /api/chats/:id/discard_stale` endpoint exists to mark a stuck-streaming row as `failed` when the frontend's 60s no-token-activity timer (`ChatPane` content-length watcher) gives up.
@@ -58,7 +60,9 @@ Key services:
- **Periodic 60s sweeper** in `apps/server/src/index.ts` (v1.13.3 + v1.13.5). Same `setInterval` runs `sweepStaleStreaming` (marks `messages.status='streaming'` older than 5 min as `failed`, publishes `chat_status='idle'` so the UI dot drops) and `cleanupTruncations` (TTL + orphan reap of tmpfs truncation files). `app.addHook('onClose')` clears the timer. No-op when nothing to reap.
- **`services/broker.ts`** — In-memory pub/sub with two channel types: per-session (message streaming) and per-user (sidebar updates). No persistence; clients reconnect on restart.
- **`services/tools.ts`** — Tool registry (`ALL_TOOLS`, `READ_ONLY_TOOL_NAMES`, `TOOLS_BY_NAME`). Filesystem tools (view_file/list_dir/grep/find_files) go through three guard layers: `path_guard.ts` (workspace scope), `secret_guard.ts` (filename deny list), `url_guard.ts` (SSRF/private-IP block for web_fetch). v1.11.8+ web tools (`web_search`, `web_fetch`) are opt-in per chat via `session.web_search_enabled` (resolved with `project.default_web_search_enabled` fallback) and filtered out of the LLM's tool schema when false. v1.13.5 truncation: when a tool slice cuts content, `services/truncate.ts` stashes the full text on tmpfs at `BOOCODE_TRUNCATION_DIR` (default `/tmp/boocode-truncations`, 0o700) keyed by an opaque `tr_<12 base32 chars>` id, and the `view_truncated_output(id)` tool retrieves it. 5MB cap (matches `view_file`'s `MAX_FILE_BYTES`), 7-day TTL, reaped by the periodic sweeper. Tmpfs path means container restart loses retrieval — acceptable, the model usually has moved on.
- **`services/compaction.ts`** + **`services/model-context.ts`** — v1.11.0 anchored rolling summary (single `summary=true` assistant row per chat, supersedes itself on each compaction). Triggered when `chats.needs_compaction` is set after an inference turn exceeds `usable(ctx_max) = ctx_max - 20k`. **`ctx_max` comes from `model-context.getModelContext()` which fetches `${LLAMA_SWAP_URL}/upstream/<model>/props`** — NOT from `parsed.timings.n_ctx` (the stream completion's `timings` doesn't carry n_ctx; that read was dead code until v1.11.3 ripped it out). v1.13.6: `buildHeadPayload` embeds `reasoning_parts` as a `<reasoning>...</reasoning>` prose prefix on the assistant `content` (OpenAI wire shape has no structured reasoning field; the summarizer reads text). Standalone tag when content is empty (tool-call-only turn). `buildHeadPayload` + `OpenAiMessage` exported for test access — keep them exported.
- **`services/compaction.ts`** + **`services/model-context.ts`** — v1.11.0 anchored rolling summary (single `summary=true` assistant row per chat, supersedes itself on each compaction). Triggered when `chats.needs_compaction` is set after an inference turn exceeds `usable(ctx_max) = floor(0.85 × ctx_max)` (v1.13.9 opencode-pattern early trigger; was `ctx_max - 20k` pre-v1.13.9, which gave only 7.6% headroom at 262k and 0 budget for ≤20k contexts). **`ctx_max` comes from `model-context.getModelContext()` which fetches `${LLAMA_SWAP_URL}/upstream/<model>/props`** — NOT from `parsed.timings.n_ctx` (the stream completion's `timings` doesn't carry n_ctx; that read was dead code until v1.11.3 ripped it out). First inferences after a boocode boot may have `ctx_max=NULL` if llama-swap hasn't loaded the model yet; negative cache TTL is 60s, recovers on next turn. v1.13.6: `buildHeadPayload` embeds `reasoning_parts` as a `<reasoning>...</reasoning>` prose prefix on the assistant `content` (OpenAI wire shape has no structured reasoning field; the summarizer reads text). Standalone tag when content is empty (tool-call-only turn). `buildHeadPayload` + `OpenAiMessage` exported for test access — keep them exported.
- **`services/system-prompt.ts`** — `buildSystemPrompt` is the string-returning shim; `buildSystemPromptWithFingerprint` is the canonical impl returning `{prompt, fingerprint, drift}`. v1.13.8 instrumentation: SHA-256 of the assembled prefix is logged per `buildMessagesPayload` call (msg `prefix-fingerprint`, level=info); a `Map<sessionId, lastHash>` observer fires `prefix-drift` (level=warn) on hash change with a field-level `changed_inputs` diff. Smoke proved the prefix is byte-stable across turns in steady-state — the originally-planned `system_prompt_cache` DB table was dropped as redundant against the v1.12.0 input-layer mtime caches (BOOCHAT.md here + AGENTS.md global+per-project in `agents.ts:safeStat`).
- **`services/inference/budget.ts`** — tool-call budgets: `BUDGET_READ_ONLY = 30`, `BUDGET_NON_READ_ONLY = 10` (forward-looking; no write tools yet), `BUDGET_NO_AGENT = 30` (v1.13.7; was 15 — every tool in `ALL_TOOLS` is read-only today, so no-agent mode shares the read-only-agent cap). Per-agent `max_tool_calls` from AGENTS.md frontmatter overrides.
- **`messages_with_parts` view** (v1.13.1-B; `schema.sql`). Read sites that need `tool_calls` / `tool_results` / `reasoning_parts` SELECT from this view, NOT `messages` directly. `COALESCE`s parts-table rows over the legacy JSON columns, so pre-v1.13.0 history still resolves. Writes still target `messages`; the v1.13.0 dual-write into `message_parts` keeps both halves in sync. New payload-assembly code must use the view — calling `messages.tool_calls` directly will miss anything written post-v1.13.1-B if the JSON column ever drifts (and dual-write makes that easy to miss). Shapes: `tool_calls jsonb[]`, `tool_results jsonb` single object, `reasoning_parts jsonb[]` of `{text}`.
- **`services/file_ops.ts`** — Shared file operation implementations used by both inference tools and HTTP routes.
- **`services/auto_name.ts`** — Non-streaming LLM call to generate 4-word session titles after first assistant reply.
@@ -108,11 +112,12 @@ Schema CHECK migration order when renaming allowed values: (1) `ALTER TABLE ...
## Environment
Required: `DATABASE_URL`, `LLAMA_SWAP_URL`. Optional: `PORT` (3000), `HOST` (0.0.0.0), `PROJECT_ROOT_WHITELIST` (/opt, read-only scope for add-existing path resolution), `BOOTSTRAP_ROOT` (/opt/projects, writable scope for create-new-project bootstrap mkdir target — host must `mkdir -p /opt/projects` before container start), `DEFAULT_MODEL`, `LOG_LEVEL`, `SEARXNG_URL` (default `http://100.114.205.53:8888` — internal Tailscale Fathom; the public `search.indifferentketchup.com` is behind Authelia and unusable from server context).
Required: `DATABASE_URL`, `LLAMA_SWAP_URL`. Optional: `PORT` (3000), `HOST` (0.0.0.0), `PROJECT_ROOT_WHITELIST` (/opt, read-only scope for add-existing path resolution), `BOOTSTRAP_ROOT` (/opt/projects, writable scope for create-new-project bootstrap mkdir target — host must `mkdir -p /opt/projects` before container start), `DEFAULT_MODEL`, `LOG_LEVEL`, `SEARXNG_URL` (default `http://100.114.205.53:8888` — internal Tailscale Fathom; the public `search.indifferentketchup.com` is behind Authelia and unusable from server context), `BOOCODE_TOOLS` (`core` | `standard` | `all`, default `all`; v1.13.15-tools tier filter — ceiling, never expands an agent's whitelist).
## Workflow
- Sam reviews all diffs and commits manually. Do not commit unless explicitly asked.
- Per-batch docs live under `openspec/changes/<slug>/{proposal,tasks,design}.md`. Already-shipped batches are snapshots in `openspec/changes/archived/`. New batches follow the proposal+tasks shape; see `openspec/README.md` for the convention.
- Deploy: `cd /opt/boocode && docker compose up --build -d` (or `docker compose build --no-cache boocode && docker compose up -d` if you suspect a layer-cache issue).
- Git push to Gitea: `GIT_SSH_COMMAND="ssh -i /opt/boocode/secrets/boocode_gitea -o IdentitiesOnly=yes" git push origin <branch>`. The default agent identity is rejected; the in-repo deploy key (`secrets/`, gitignored) is the working one. Transient `Connection reset by peer` retries cleanly after `sleep 5`.
- Don't accumulate `.bak-*` files. Clean them up in the same batch or immediately after merge.

View File

@@ -16,6 +16,7 @@ import { registerWebSocket } from './routes/ws.js';
import { registerModelRoutes } from './routes/models.js';
import { registerAgentRoutes } from './routes/agents.js';
import { registerSkillsRoutes } from './routes/skills.js';
import { registerToolsRoutes } from './routes/tools.js';
import { createInferenceRunner } from './services/inference/index.js';
import { createBroker } from './services/broker.js';
import { listSkills } from './services/skills.js';
@@ -74,7 +75,7 @@ async function main() {
return { status: dbOk ? 'ok' : 'degraded', db: dbOk };
});
const broker = createBroker();
const broker = createBroker(app.log);
registerProjectRoutes(app, sql, config, broker);
registerSessionRoutes(app, sql, config, broker);
@@ -83,6 +84,7 @@ async function main() {
registerAgentRoutes(app, sql);
registerSidebarRoutes(app, sql);
registerChatRoutes(app, sql, broker);
registerToolsRoutes(app, sql);
// Batch 9.6: warm the skills cache at boot and surface the count. Empty or
// missing /data/skills is non-fatal — the skill tools just return empty.
@@ -99,7 +101,9 @@ async function main() {
config,
log: app.log,
publish: (sessionId, frame) => {
broker.publish(sessionId, frame as unknown as Record<string, unknown> & { type: string });
// v1.13.11-b: route through the typed publishFrame so the broker's
// Zod gate validates every inference frame before delivery.
broker.publishFrame(sessionId, frame as unknown as import('./types/ws-frames.js').WsFrame);
},
// v1.11: broker handle for compaction.process to publish 'compacted'
// frames on the per-session channel. Inference's regular publish path
@@ -108,7 +112,7 @@ async function main() {
broker,
},
(user, frame) => {
broker.publishUser(user, frame as unknown as Record<string, unknown> & { type: string });
broker.publishUserFrame(user, frame as unknown as import('./types/ws-frames.js').WsFrame);
}
);
registerMessageRoutes(app, sql, {
@@ -127,33 +131,33 @@ async function main() {
},
hasActiveInference: (chatId) => inference.hasActive(chatId),
publishUserMessage: (sessionId, chatId, userMessageId, content) => {
broker.publish(sessionId, {
broker.publishFrame(sessionId, {
type: 'message_started',
message_id: userMessageId,
chat_id: chatId,
role: 'user',
});
broker.publish(sessionId, {
broker.publishFrame(sessionId, {
type: 'delta',
message_id: userMessageId,
chat_id: chatId,
content,
});
broker.publish(sessionId, {
broker.publishFrame(sessionId, {
type: 'message_complete',
message_id: userMessageId,
chat_id: chatId,
});
},
publishMessagesDeleted: (sessionId, chatId, messageIds) => {
broker.publish(sessionId, {
broker.publishFrame(sessionId, {
type: 'messages_deleted',
message_ids: messageIds,
chat_id: chatId,
});
},
publishSessionFrame: (sessionId, frame) => {
broker.publish(sessionId, frame);
broker.publishFrame(sessionId, frame as import('./types/ws-frames.js').WsFrame);
},
});
registerSkillsRoutes(app, sql, {
@@ -161,26 +165,26 @@ async function main() {
inference.enqueue(sessionId, chatId, assistantId, user);
},
publishUserMessage: (sessionId, chatId, userMessageId, content) => {
broker.publish(sessionId, {
broker.publishFrame(sessionId, {
type: 'message_started',
message_id: userMessageId,
chat_id: chatId,
role: 'user',
});
broker.publish(sessionId, {
broker.publishFrame(sessionId, {
type: 'delta',
message_id: userMessageId,
chat_id: chatId,
content,
});
broker.publish(sessionId, {
broker.publishFrame(sessionId, {
type: 'message_complete',
message_id: userMessageId,
chat_id: chatId,
});
},
publishSessionFrame: (sessionId, frame) => {
broker.publish(sessionId, frame);
broker.publishFrame(sessionId, frame as import('./types/ws-frames.js').WsFrame);
},
});
registerWebSocket(app, sql, broker);
@@ -228,7 +232,7 @@ async function main() {
for (const row of rows) {
if (seenChats.has(row.chat_id)) continue;
seenChats.add(row.chat_id);
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'chat_status',
chat_id: row.chat_id,
status: 'idle',

View File

@@ -102,7 +102,7 @@ export function registerChatRoutes(
VALUES (${req.params.id}, ${parsed.data.name ?? null}, 'open')
RETURNING id, session_id, name, status, created_at, updated_at
`;
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'chat_created',
chat: chat!,
session_id: req.params.id,
@@ -132,7 +132,7 @@ export function registerChatRoutes(
return { error: 'chat not found' };
}
const chat = rows[0]!;
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'chat_updated',
chat_id: chat.id,
session_id: chat.session_id,
@@ -162,7 +162,7 @@ export function registerChatRoutes(
`;
const ids = rows.map((r) => r.id);
for (const id of ids) {
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'chat_archived',
chat_id: id,
session_id: req.params.id,
@@ -203,7 +203,7 @@ export function registerChatRoutes(
return { error: 'chat not found or already archived' };
}
const row = rows[0]!;
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'chat_archived',
chat_id: row.id,
session_id: row.session_id,
@@ -226,7 +226,7 @@ export function registerChatRoutes(
return { error: 'chat not found or not archived' };
}
const chat = rows[0]!;
broker.publishUser('default', { type: 'chat_unarchived', chat });
broker.publishUserFrame('default', { type: 'chat_unarchived', chat });
return chat;
}
);
@@ -243,7 +243,7 @@ export function registerChatRoutes(
return { error: 'chat not found' };
}
const row = result[0]!;
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'chat_deleted',
chat_id: row.id,
session_id: row.session_id,
@@ -338,7 +338,7 @@ export function registerChatRoutes(
return chat!;
});
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'chat_created',
chat: newChat,
session_id: source.session_id,
@@ -400,13 +400,13 @@ export function registerChatRoutes(
reply.code(409);
return { error: 'message status changed mid-request' };
}
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'chat_status',
chat_id: msg.chat_id,
status: 'idle',
at: new Date().toISOString(),
});
broker.publish(msg.session_id, {
broker.publishFrame(msg.session_id, {
type: 'message_complete',
message_id: msg.id,
chat_id: msg.chat_id,

View File

@@ -129,7 +129,7 @@ export function registerProjectRoutes(
RETURNING id, name, path, added_at, last_session_id, status, gitea_remote,
default_system_prompt, default_web_search_enabled
`;
broker.publishUser('default', { type: 'project_created', project: row as unknown as Project });
broker.publishUserFrame('default', { type: 'project_created', project: row as unknown as Project });
reply.code(201);
return {
project: row,
@@ -186,11 +186,11 @@ export function registerProjectRoutes(
`;
if (existing.length === 0) {
broker.publishUser('default', { type: 'project_created', project: row as unknown as Project });
broker.publishUserFrame('default', { type: 'project_created', project: row as unknown as Project });
reply.code(201);
} else {
// existing.status was 'archived' — row has been restored.
broker.publishUser('default', { type: 'project_unarchived', project: row as unknown as Project });
broker.publishUserFrame('default', { type: 'project_unarchived', project: row as unknown as Project });
reply.code(200);
}
return row;
@@ -243,7 +243,7 @@ export function registerProjectRoutes(
// v1.9: the project_updated frame still only carries id + name. Clients
// that need the new fields refetch via api.projects.list() — keeps the
// frame payload lean, per the locked recon decision (d).
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'project_updated',
project_id: project.id,
name: project.name,
@@ -260,7 +260,7 @@ export function registerProjectRoutes(
reply.code(404);
return { error: 'not found or already archived' };
}
broker.publishUser('default', { type: 'project_archived', project_id: req.params.id });
broker.publishUserFrame('default', { type: 'project_archived', project_id: req.params.id });
reply.code(204);
return null;
});
@@ -277,7 +277,7 @@ export function registerProjectRoutes(
return { error: 'not found or not archived' };
}
const project = rows[0]!;
broker.publishUser('default', { type: 'project_unarchived', project });
broker.publishUserFrame('default', { type: 'project_unarchived', project });
return project;
});
@@ -288,7 +288,7 @@ export function registerProjectRoutes(
reply.code(404);
return { error: 'not found' };
}
broker.publishUser('default', { type: 'project_deleted', project_id: id });
broker.publishUserFrame('default', { type: 'project_deleted', project_id: id });
reply.code(204);
return null;
});

View File

@@ -112,7 +112,7 @@ export function registerSessionRoutes(
`;
return session!;
});
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'session_created',
session: row,
project_id: row.project_id,
@@ -178,7 +178,7 @@ export function registerSessionRoutes(
}
const session = rows[0]!;
if (name !== undefined && session.name !== priorName) {
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'session_renamed',
session_id: session.id,
name: session.name,
@@ -188,7 +188,7 @@ export function registerSessionRoutes(
// (notably the SettingsPane open in another tab) can refetch and pick
// up the new fields. Frame stays lean (decision d) — payload is just
// ids + name + updated_at, the client refetches via api.sessions.get.
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'session_updated',
session_id: session.id,
project_id: session.project_id,
@@ -220,7 +220,7 @@ export function registerSessionRoutes(
return { error: 'session not found' };
}
const session = rows[0]!;
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'session_workspace_updated',
session_id: session.id,
workspace_panes: session.workspace_panes,
@@ -248,7 +248,7 @@ export function registerSessionRoutes(
`;
const ids = rows.map((r) => r.id);
for (const id of ids) {
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'session_archived',
session_id: id,
project_id: req.params.id,
@@ -289,7 +289,7 @@ export function registerSessionRoutes(
reply.code(404);
return { error: 'session not found or already archived' };
}
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'session_archived',
session_id: rows[0]!.id,
project_id: rows[0]!.project_id,
@@ -312,7 +312,7 @@ export function registerSessionRoutes(
return { error: 'session not found or not archived' };
}
const session = rows[0]!;
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'session_created',
session: session,
project_id: session.project_id,
@@ -334,7 +334,7 @@ export function registerSessionRoutes(
return { error: 'not found' };
}
const project_id = deleted[0]!.project_id;
broker.publishUser('default', { type: 'session_deleted', session_id: id, project_id });
broker.publishUserFrame('default', { type: 'session_deleted', session_id: id, project_id });
reply.code(204);
return null;
}

View File

@@ -0,0 +1,40 @@
import type { FastifyInstance } from 'fastify';
import type { Sql } from '../db.js';
export interface ToolCostStat {
tool_name: string;
mean_prompt_tokens: number;
mean_completion_tokens: number;
n_calls: number;
updated_at: string;
}
// v1.13.10: per-tool token cost rolling window read endpoint. Backed by the
// tool_cost_stats view in schema.sql (last 100 calls per tool, equal-split
// attribution across multi-tool turns, sentinel/failed-turn excluded).
// Consumed by AgentPicker for at-a-glance per-agent cost hints.
export function registerToolsRoutes(app: FastifyInstance, sql: Sql): void {
app.get('/api/tools/cost_stats', async () => {
const rows = await sql<
{
tool_name: string;
prompt_tokens_sum: number;
completion_tokens_sum: number;
n_calls: number;
updated_at: string;
}[]
>`
SELECT tool_name, prompt_tokens_sum, completion_tokens_sum, n_calls, updated_at
FROM tool_cost_stats
ORDER BY tool_name ASC
`;
const stats: ToolCostStat[] = rows.map((r) => ({
tool_name: r.tool_name,
mean_prompt_tokens: Math.round(r.prompt_tokens_sum / r.n_calls),
mean_completion_tokens: Math.round(r.completion_tokens_sum / r.n_calls),
n_calls: r.n_calls,
updated_at: r.updated_at,
}));
return { stats };
});
}

View File

@@ -119,6 +119,68 @@ SELECT
WHERE p.message_id = m.id AND p.kind = 'reasoning' AND p.hidden_at IS NULL) AS reasoning_parts
FROM messages m;
-- v1.13.10: per-tool token cost rolling window. Derives from
-- messages_with_parts (the v1.13.1-B view that COALESCEs message_parts over
-- the legacy JSON column) so this works whether the chat predates v1.13.0
-- or postdates v1.13.2 (column drop). No new write site — all source data
-- already lands via the existing tool-phase.ts:94-95 UPDATE.
--
-- Attribution model: equal split. A turn emitting N tool calls divides its
-- prompt/completion tokens by N before attribution. See v1.13.10 dispatch
-- brief for rationale + rejected alternatives.
--
-- Column mapping: messages.ctx_used = prompt (input), messages.tokens_used
-- = completion (output). Non-obvious naming; pinned via canonical writes at
-- tool-phase.ts:94-95 et al.
--
-- Filtering rationale:
-- status='complete' — exclude failed/cancelled (defense in
-- depth; failed-path doesn't write
-- tokens_used so they're filtered
-- indirectly too).
-- metadata->>'kind' exclusions — exclude cap_hit / doom_loop sentinels
-- (defense in depth; sentinels are
-- role='system' with tool_calls=NULL
-- so they're filtered indirectly too).
-- experimental_repairToolCall — no special handling; retries flow
-- as normal next-turn tool_result
-- errors and count naturally.
--
-- Rolling window: last 100 calls per tool_name, ordered by created_at DESC.
-- Aggregate-on-read is microseconds at BooCode scale (single user, ~30
-- tools, < 100 calls each). DROP VIEW + recreate to change window size.
CREATE OR REPLACE VIEW tool_cost_stats AS
WITH per_call AS (
SELECT
(tc->>'name')::text AS tool_name,
(m.ctx_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS prompt_tokens,
(m.tokens_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS completion_tokens,
m.created_at,
ROW_NUMBER() OVER (
PARTITION BY (tc->>'name')::text
ORDER BY m.created_at DESC
) AS rn
FROM messages_with_parts m,
LATERAL jsonb_array_elements(m.tool_calls) AS tc
WHERE m.tool_calls IS NOT NULL
AND jsonb_array_length(m.tool_calls) > 0
AND m.tokens_used IS NOT NULL
AND m.ctx_used IS NOT NULL
AND m.status = 'complete'
AND (m.metadata IS NULL
OR m.metadata->>'kind' IS NULL
OR m.metadata->>'kind' NOT IN ('cap_hit', 'doom_loop'))
)
SELECT
tool_name,
ROUND(SUM(prompt_tokens))::int AS prompt_tokens_sum,
ROUND(SUM(completion_tokens))::int AS completion_tokens_sum,
COUNT(*)::int AS n_calls,
MAX(created_at) AS updated_at
FROM per_call
WHERE rn <= 100
GROUP BY tool_name;
ALTER TABLE messages ADD COLUMN IF NOT EXISTS tokens_used INTEGER;
ALTER TABLE messages ADD COLUMN IF NOT EXISTS ctx_used INTEGER;
ALTER TABLE messages ADD COLUMN IF NOT EXISTS ctx_max INTEGER;

View File

@@ -0,0 +1,228 @@
import { describe, it, expect, beforeAll, afterAll } from 'vitest';
import postgres from 'postgres';
import { readFileSync } from 'node:fs';
import { resolve } from 'node:path';
import { fileURLToPath } from 'node:url';
// v1.13.10: integration tests for the tool_cost_stats view. Skipped unless
// DATABASE_URL is set so they don't break `pnpm test` on a fresh checkout.
// Run with:
// DATABASE_URL=postgres://boocode:<pw>@localhost:5500/boocode pnpm -C apps/server test
//
// Isolation: each test uses a unique tool_name suffix derived from a per-test
// counter. The view aggregates globally across all chats, so without unique
// tool names parallel test runs would interfere. Cleanup deletes by tool_name
// suffix in afterAll.
const DB_URL = process.env.DATABASE_URL;
const describeFn = DB_URL ? describe : describe.skip;
const TEST_RUN_ID = `v13_10_${Date.now()}`;
const tname = (suffix: string) => `${TEST_RUN_ID}_${suffix}`;
describeFn('tool_cost_stats view (v1.13.10)', () => {
let sql: ReturnType<typeof postgres>;
let projectId: string;
let sessionId: string;
let chatId: string;
beforeAll(async () => {
if (!DB_URL) return;
sql = postgres(DB_URL, { max: 2, idle_timeout: 5, connect_timeout: 5, onnotice: () => {} });
// Apply the schema before fixtures so the view exists. Idempotent via
// CREATE OR REPLACE VIEW + CREATE TABLE IF NOT EXISTS; safe to run on a
// pre-populated DB. Mirrors apps/server/src/db.ts:applySchema.
const here = fileURLToPath(import.meta.url);
const schemaPath = resolve(here, '../../../schema.sql');
const ddl = readFileSync(schemaPath, 'utf8');
await sql.unsafe(ddl);
// Fixture project + session + chat for all inserts in this file.
const proj = await sql<{ id: string }[]>`
INSERT INTO projects (name, path)
VALUES (${`tool_cost_stats_test_${TEST_RUN_ID}`}, ${`/tmp/${TEST_RUN_ID}`})
RETURNING id
`;
projectId = proj[0]!.id;
const sess = await sql<{ id: string }[]>`
INSERT INTO sessions (project_id, name, model)
VALUES (${projectId}, ${'test'}, ${'test-model'})
RETURNING id
`;
sessionId = sess[0]!.id;
const chat = await sql<{ id: string }[]>`
INSERT INTO chats (session_id, name) VALUES (${sessionId}, ${'test'}) RETURNING id
`;
chatId = chat[0]!.id;
});
afterAll(async () => {
if (!DB_URL) return;
// Project FK CASCADE cleans sessions/chats/messages/parts in one shot.
await sql`DELETE FROM projects WHERE id = ${projectId}`;
await sql.end({ timeout: 5 });
});
async function insertAssistantTurn(opts: {
toolNames: string[];
tokensUsed: number | null;
ctxUsed: number | null;
status?: 'streaming' | 'complete' | 'failed' | 'cancelled';
metadata?: { kind: string } | null;
createdAt?: Date;
}): Promise<string> {
const toolCalls = opts.toolNames.map((name, i) => ({
id: `call_${TEST_RUN_ID}_${name}_${i}`,
name,
args: {},
}));
const created = opts.createdAt ?? new Date();
const rows = await sql<{ id: string }[]>`
INSERT INTO messages (
session_id, chat_id, role, content, kind, status,
tool_calls, tokens_used, ctx_used,
metadata, created_at
)
VALUES (
${sessionId}, ${chatId}, 'assistant', '', 'message',
${opts.status ?? 'complete'},
${sql.json(toolCalls as never)},
${opts.tokensUsed},
${opts.ctxUsed},
${opts.metadata ? sql.json(opts.metadata as never) : null},
${created}
)
RETURNING id
`;
return rows[0]!.id;
}
it('returns empty when no tool calls exist for a tool name', async () => {
const t = tname('absent');
const stats = await sql<{ tool_name: string }[]>`
SELECT * FROM tool_cost_stats WHERE tool_name = ${t}
`;
expect(stats).toEqual([]);
});
it('attributes single-tool turn fully to that tool', async () => {
const t = tname('single');
await insertAssistantTurn({ toolNames: [t], tokensUsed: 300, ctxUsed: 15000 });
const stats = await sql<{
tool_name: string;
prompt_tokens_sum: number;
completion_tokens_sum: number;
n_calls: number;
}[]>`SELECT * FROM tool_cost_stats WHERE tool_name = ${t}`;
expect(stats[0]).toMatchObject({
tool_name: t,
prompt_tokens_sum: 15000,
completion_tokens_sum: 300,
n_calls: 1,
});
});
it('splits multi-tool turn equally across tools', async () => {
const a = tname('multi_a');
const b = tname('multi_b');
const c = tname('multi_c');
// 3 tools, 300 completion / 15000 prompt → each gets 100 / 5000
await insertAssistantTurn({ toolNames: [a, b, c], tokensUsed: 300, ctxUsed: 15000 });
const stats = await sql<{
tool_name: string;
prompt_tokens_sum: number;
completion_tokens_sum: number;
n_calls: number;
}[]>`
SELECT * FROM tool_cost_stats
WHERE tool_name IN (${a}, ${b}, ${c})
ORDER BY tool_name
`;
expect(stats).toHaveLength(3);
for (const s of stats) {
expect(s.completion_tokens_sum).toBe(100);
expect(s.prompt_tokens_sum).toBe(5000);
expect(s.n_calls).toBe(1);
}
});
it('limits to last 100 calls per tool (FIFO window)', async () => {
const t = tname('window');
// Insert 110 turns with monotonically-increasing created_at and tokensUsed.
// Expect view to keep only the most recent 100.
const base = Date.now() + 1_000_000; // distant future to avoid colliding with other tests
for (let i = 1; i <= 110; i++) {
await insertAssistantTurn({
toolNames: [t],
tokensUsed: i, // 1..110
ctxUsed: i * 10,
createdAt: new Date(base + i),
});
}
const [stat] = await sql<{
n_calls: number;
completion_tokens_sum: number;
}[]>`SELECT n_calls, completion_tokens_sum FROM tool_cost_stats WHERE tool_name = ${t}`;
expect(stat!.n_calls).toBe(100);
// Last 100 are tokensUsed=11..110, sum = (11+110)*100/2 = 6050.
expect(stat!.completion_tokens_sum).toBe(6050);
});
it('excludes turns with NULL tokens_used (pre-v1.13.7 latent regression)', async () => {
const t = tname('null_tokens');
await insertAssistantTurn({ toolNames: [t], tokensUsed: null, ctxUsed: 1000 });
await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: null });
const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name = ${t}`;
expect(stats).toEqual([]);
});
it('excludes failed/cancelled turns and cap_hit/doom_loop sentinel rows', async () => {
const t = tname('filtered');
// A: status='failed' — excluded
// B: status='cancelled' — excluded
// C: status='complete', metadata={kind:'cap_hit'} — excluded
// D: status='complete', metadata={kind:'doom_loop'} — excluded
// E: status='complete', metadata=null — included
await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, status: 'failed' });
await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, status: 'cancelled' });
await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, metadata: { kind: 'cap_hit' } });
await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, metadata: { kind: 'doom_loop' } });
await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, metadata: null });
const [stat] = await sql<{ n_calls: number }[]>`
SELECT n_calls FROM tool_cost_stats WHERE tool_name = ${t}
`;
expect(stat!.n_calls).toBe(1);
});
it('reads tool_calls via messages_with_parts (parts-authoritative)', async () => {
const t = tname('parts');
// Insert an assistant row with messages.tool_calls=NULL but a
// message_parts row carrying the tool_call. The view reads via
// messages_with_parts, which COALESCEs the parts table over the legacy
// column — so this row should still aggregate.
const rows = await sql<{ id: string }[]>`
INSERT INTO messages (
session_id, chat_id, role, content, kind, status,
tool_calls, tokens_used, ctx_used
)
VALUES (
${sessionId}, ${chatId}, 'assistant', '', 'message', 'complete',
NULL, 200, 5000
)
RETURNING id
`;
const messageId = rows[0]!.id;
await sql`
INSERT INTO message_parts (message_id, sequence, kind, payload)
VALUES (
${messageId}, 0, 'tool_call',
${sql.json({ id: `tc_parts_${TEST_RUN_ID}`, name: t, args: {} } as never)}
)
`;
const [stat] = await sql<{ n_calls: number }[]>`
SELECT n_calls FROM tool_cost_stats WHERE tool_name = ${t}
`;
expect(stat!.n_calls).toBe(1);
});
});

View File

@@ -1,5 +1,11 @@
import { describe, it, expect } from 'vitest';
import { ALL_TOOLS } from '../tools.js';
import {
ALL_TOOLS,
CORE_TOOL_NAMES,
STANDARD_TOOL_NAMES,
TOOLS_BY_NAME,
resolveToolTier,
} from '../tools.js';
describe('ALL_TOOLS registry', () => {
// v1.13.3: tools must be alpha-sorted at module load. llama.cpp's prompt
@@ -12,3 +18,59 @@ describe('ALL_TOOLS registry', () => {
expect(names).toEqual([...names].sort((a, b) => a.localeCompare(b)));
});
});
describe('resolveToolTier (v1.13.15-tools)', () => {
it('returns CORE tools for tier=core', () => {
expect(resolveToolTier('core')).toEqual(CORE_TOOL_NAMES);
});
it('returns STANDARD tools for tier=standard', () => {
const result = resolveToolTier('standard');
expect(result.length).toBe(STANDARD_TOOL_NAMES.length);
expect(result.length).toBeGreaterThan(CORE_TOOL_NAMES.length);
// STANDARD is a strict superset of CORE.
expect(result).toEqual(expect.arrayContaining([...CORE_TOOL_NAMES]));
});
it('returns ALL tool names for tier=all', () => {
expect(resolveToolTier('all').length).toBe(ALL_TOOLS.length);
});
it('defaults to all when env var is undefined', () => {
expect(resolveToolTier(undefined).length).toBe(ALL_TOOLS.length);
});
it('is case-insensitive', () => {
expect(resolveToolTier('CORE')).toEqual(CORE_TOOL_NAMES);
expect(resolveToolTier('Standard').length).toBe(STANDARD_TOOL_NAMES.length);
});
it('falls back to all for unknown tier strings', () => {
expect(resolveToolTier('bogus').length).toBe(ALL_TOOLS.length);
});
});
describe('CORE_TOOL_NAMES + STANDARD_TOOL_NAMES validation', () => {
// The module-load validation in tools.ts throws if a tier references a
// tool that doesn't exist in TOOLS_BY_NAME. These tests double-check that
// invariant from the consumer side so a future tier-list edit can't smuggle
// in a typo without a test failure.
it('every CORE name exists in TOOLS_BY_NAME', () => {
for (const name of CORE_TOOL_NAMES) {
expect(TOOLS_BY_NAME[name], `CORE references unknown tool '${name}'`).toBeDefined();
}
});
it('every STANDARD name exists in TOOLS_BY_NAME', () => {
for (const name of STANDARD_TOOL_NAMES) {
expect(TOOLS_BY_NAME[name], `STANDARD references unknown tool '${name}'`).toBeDefined();
}
});
it('CORE is a subset of STANDARD', () => {
const standardSet = new Set<string>(STANDARD_TOOL_NAMES);
for (const name of CORE_TOOL_NAMES) {
expect(standardSet.has(name), `'${name}' is in CORE but not STANDARD`).toBe(true);
}
});
});

View File

@@ -0,0 +1,218 @@
import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest';
import { readFileSync } from 'node:fs';
import { resolve } from 'node:path';
import { fileURLToPath } from 'node:url';
import {
WsFrameSchema,
KNOWN_FRAME_TYPES,
type WsFrame,
} from '../../types/ws-frames.js';
import { createBroker } from '../broker.js';
const VALID_UUID_A = '00000000-0000-0000-0000-000000000001';
const VALID_UUID_B = '00000000-0000-0000-0000-000000000002';
const VALID_UUID_C = '00000000-0000-0000-0000-000000000003';
const VALID_TIMESTAMP = '2026-05-22T14:30:00.000Z';
describe('WsFrameSchema (v1.13.11-a)', () => {
it('accepts a well-formed chat_status frame', () => {
const result = WsFrameSchema.safeParse({
type: 'chat_status',
chat_id: VALID_UUID_A,
status: 'streaming',
at: VALID_TIMESTAMP,
});
expect(result.success).toBe(true);
});
it('rejects an unknown frame type', () => {
const result = WsFrameSchema.safeParse({
type: 'cosmic_ray_strike',
chat_id: VALID_UUID_A,
});
expect(result.success).toBe(false);
});
it('rejects a chat_status frame with invalid status enum', () => {
// v1.12.1 dropped the legacy 'working' status. Any frame still emitting it
// should fail validation — that's a drift catcher.
const result = WsFrameSchema.safeParse({
type: 'chat_status',
chat_id: VALID_UUID_A,
status: 'working',
at: VALID_TIMESTAMP,
});
expect(result.success).toBe(false);
});
it('rejects a UUID field with a non-UUID string', () => {
const result = WsFrameSchema.safeParse({
type: 'chat_status',
chat_id: 'not-a-uuid',
status: 'idle',
at: VALID_TIMESTAMP,
});
expect(result.success).toBe(false);
});
it('rejects negative token counts in usage frame', () => {
const result = WsFrameSchema.safeParse({
type: 'usage',
message_id: VALID_UUID_A,
chat_id: VALID_UUID_B,
completion_tokens: -1,
ctx_used: 100,
ctx_max: 1000,
});
expect(result.success).toBe(false);
});
it('accepts a usage frame with nullable token counts (pre-v1.13.7 history)', () => {
const result = WsFrameSchema.safeParse({
type: 'usage',
message_id: VALID_UUID_A,
chat_id: VALID_UUID_B,
completion_tokens: null,
ctx_used: null,
ctx_max: null,
});
expect(result.success).toBe(true);
});
it('accepts a tool_result frame with non-UUID tool_call_id (model-emitted)', () => {
// Model-emitted tool_call_ids look like "call_abc123", not UUIDs.
const result = WsFrameSchema.safeParse({
type: 'tool_result',
tool_message_id: VALID_UUID_A,
chat_id: VALID_UUID_B,
tool_call_id: 'call_abc123',
output: { whatever: true },
truncated: false,
});
expect(result.success).toBe(true);
});
it('accepts a compacted frame', () => {
const result = WsFrameSchema.safeParse({
type: 'compacted',
session_id: VALID_UUID_A,
chat_id: VALID_UUID_B,
summary_message_id: VALID_UUID_C,
});
expect(result.success).toBe(true);
});
it('accepts a session_workspace_updated frame', () => {
const result = WsFrameSchema.safeParse({
type: 'session_workspace_updated',
session_id: VALID_UUID_A,
workspace_panes: [{ id: 'p1', kind: 'chat', chatIds: [], activeChatIdx: 0 }],
});
expect(result.success).toBe(true);
});
it('every KNOWN_FRAME_TYPES entry has a discriminated branch', () => {
// Probe each known type by attempting a minimal valid construction.
// Failure here means the union and the KNOWN_FRAME_TYPES list drifted.
for (const type of KNOWN_FRAME_TYPES) {
const probe = WsFrameSchema.safeParse({ type, __dummy__: true });
// We expect FAILURE on every type because we're missing required fields,
// but the failure must be ABOUT the missing fields, not about an unknown
// type. A "Invalid discriminator value" error means the type isn't in
// the union — that's a drift.
if (probe.success) continue;
const issues = probe.error.issues;
const hasInvalidDiscriminator = issues.some(
(i) => i.code === 'invalid_union_discriminator',
);
expect(hasInvalidDiscriminator, `frame type '${type}' is missing from the discriminated union`).toBe(false);
}
});
});
describe('ws-frames.ts file mirror parity', () => {
it('apps/server and apps/web copies are byte-identical', () => {
const here = fileURLToPath(import.meta.url);
const serverPath = resolve(here, '../../../types/ws-frames.ts');
const webPath = resolve(here, '../../../../../web/src/api/ws-frames.ts');
const serverContent = readFileSync(serverPath, 'utf8');
const webContent = readFileSync(webPath, 'utf8');
expect(webContent, 'apps/web/src/api/ws-frames.ts must be byte-identical to apps/server/src/types/ws-frames.ts').toBe(serverContent);
});
});
describe('broker.publishFrame / publishUserFrame fail-closed behavior', () => {
let logErrors: Array<{ obj: unknown; msg: string }>;
let mockLog: Parameters<typeof createBroker>[0];
beforeEach(() => {
logErrors = [];
mockLog = {
error: (obj: unknown, msg: string) => {
logErrors.push({ obj, msg });
},
info: () => {},
warn: () => {},
debug: () => {},
trace: () => {},
fatal: () => {},
child: () => mockLog as never,
level: 'info',
silent: () => {},
} as unknown as Parameters<typeof createBroker>[0];
});
afterEach(() => {
vi.restoreAllMocks();
});
it('publishFrame delivers a valid frame to subscribers', () => {
const broker = createBroker(mockLog);
const received: WsFrame[] = [];
broker.subscribe('sess-1', (f) => received.push(f as WsFrame));
broker.publishFrame('sess-1', {
type: 'delta',
message_id: VALID_UUID_A,
chat_id: VALID_UUID_B,
content: 'hello',
});
expect(received).toHaveLength(1);
expect((received[0] as { type: string }).type).toBe('delta');
expect(logErrors).toHaveLength(0);
});
it('publishFrame drops + logs an invalid frame instead of delivering it', () => {
const broker = createBroker(mockLog);
const received: WsFrame[] = [];
broker.subscribe('sess-1', (f) => received.push(f as WsFrame));
broker.publishFrame('sess-1', {
type: 'delta',
message_id: 'not-a-uuid',
content: 'hello',
} as never);
expect(received).toHaveLength(0);
expect(logErrors).toHaveLength(1);
expect(logErrors[0]!.msg).toMatch(/ws-frame-validation-failed/);
});
it('publishUserFrame drops + logs an invalid user-channel frame', () => {
const broker = createBroker(mockLog);
const received: WsFrame[] = [];
broker.subscribeUser('default', (f) => received.push(f as WsFrame));
broker.publishUserFrame('default', {
type: 'chat_status',
chat_id: VALID_UUID_A,
status: 'working', // v1.12.1 dropped this enum value
at: VALID_TIMESTAMP,
} as never);
expect(received).toHaveLength(0);
expect(logErrors).toHaveLength(1);
});
it('publishFrame validation failure does not throw (no cascade into stream-phase)', () => {
const broker = createBroker(mockLog);
expect(() =>
broker.publishFrame('sess-1', { type: 'unknown_type' } as never),
).not.toThrow();
});
});

View File

@@ -1,7 +1,7 @@
import { promises as fs } from 'node:fs';
import { join } from 'node:path';
import type { Agent, AgentsResponse, AgentParseError } from '../types/api.js';
import { ALL_TOOLS } from './tools.js';
import { ALL_TOOLS, resolveToolTier } from './tools.js';
// v1.8.1: global agents live at /data/AGENTS.md inside the container
// (./data:/data:ro mount on the host). Per-project AGENTS.md at the project
@@ -186,11 +186,14 @@ function parseAgentSection(section: RawSection): Omit<Agent, 'source'> {
throw new Error(fmErrors.join('; '));
}
// v1.13.15-tools: intersect with BOOCODE_TOOLS tier (ceiling, not expansion).
// Unset → resolveToolTier returns ALL tool names → no narrowing.
const tierAllowed = new Set(resolveToolTier(process.env.BOOCODE_TOOLS));
const filteredTools = Array.isArray(fm.tools)
? fm.tools.filter((t): t is string =>
(ALL_TOOL_NAMES as readonly string[]).includes(t),
(ALL_TOOL_NAMES as readonly string[]).includes(t) && tierAllowed.has(t),
)
: DEFAULT_TOOLS;
: DEFAULT_TOOLS.filter((t) => tierAllowed.has(t));
return {
id: slugify(section.name),

View File

@@ -1,3 +1,6 @@
import type { FastifyBaseLogger } from 'fastify';
import { WsFrameSchema, type WsFrame } from '../types/ws-frames.js';
export type Frame = Record<string, unknown> & { type: string };
export type Listener = (frame: Frame) => void;
@@ -6,9 +9,15 @@ export interface Broker {
subscribe(sessionId: string, listener: Listener): () => void;
publishUser(user: string, frame: Frame): void;
subscribeUser(user: string, listener: Listener): () => void;
// v1.13.11-a: typed publish wrappers. Validate against WsFrameSchema and
// delegate to publish / publishUser on success; log + drop on failure
// (fail-closed). Existing publish / publishUser callers stay legal — they
// get converted to the typed variant in v1.13.11-b.
publishFrame(sessionId: string, frame: WsFrame): void;
publishUserFrame(user: string, frame: WsFrame): void;
}
export function createBroker(): Broker {
export function createBroker(log?: FastifyBaseLogger): Broker {
const topics = new Map<string, Set<Listener>>();
const userTopics = new Map<string, Set<Listener>>();
@@ -39,6 +48,28 @@ export function createBroker(): Broker {
};
}
// v1.13.11-a: shared validation guard. Returns the parsed/typed frame on
// success, or null on failure (after logging). Brief mandates fail-closed
// semantics: invalid frames don't reach subscribers; throwing here could
// cascade into stream-phase aborts which v1.13.7 already had to defend
// against, so log + drop is the right shape.
function validate(channel: 'session' | 'user', key: string, frame: WsFrame): WsFrame | null {
const parsed = WsFrameSchema.safeParse(frame);
if (parsed.success) return parsed.data;
const frameType = (frame as { type?: unknown })?.type;
const errors = parsed.error.flatten();
if (log) {
log.error(
{ channel, key, frame_type: frameType, errors },
'ws-frame-validation-failed: dropping invalid frame',
);
} else {
// Fallback for callers that didn't pass a logger (e.g. unit tests).
console.error('ws-frame-validation-failed', { channel, key, frame_type: frameType, errors });
}
return null;
}
return {
publish(sessionId, frame) {
publishTo(topics, sessionId, frame);
@@ -52,5 +83,15 @@ export function createBroker(): Broker {
subscribeUser(user, listener) {
return subscribeTo(userTopics, user, listener);
},
publishFrame(sessionId, frame) {
const valid = validate('session', sessionId, frame);
if (!valid) return;
publishTo(topics, sessionId, valid as Frame);
},
publishUserFrame(user, frame) {
const valid = validate('user', user, frame);
if (!valid) return;
publishTo(userTopics, user, valid as Frame);
},
};
}

View File

@@ -431,15 +431,16 @@ export async function process(input: ProcessInput): Promise<void> {
'compaction: invoking model',
);
// 6a. Flip the chat dot amber for the duration of the LLM call + DB writes.
// Same { type: 'chat_status', status: 'working', at } shape inference.ts
// emits at runner enqueue. publishUser → broadcasts on the per-user channel
// (all devices / tabs see it) since chat_status is a user-channel frame in
// BooCode (see useChatStatus.ts, which is the consumer).
broker.publishUser('default', {
// 6a. Flip the chat dot for the duration of the LLM call + DB writes.
// v1.13.11-b: publish status='streaming' (the v1.12.1-widened replacement
// for the dropped 'working' value). Compaction's LLM call has the same
// semantic as an inference turn for dot-state purposes. The v1.12.1
// chat_status widening missed this site; v1.13.11's WsFrame Zod schema
// surfaced the drift via the unknown-enum-value check.
broker.publishUserFrame('default', {
type: 'chat_status',
chat_id: chatId,
status: 'working',
status: 'streaming',
at: new Date().toISOString(),
});
@@ -508,7 +509,7 @@ export async function process(input: ProcessInput): Promise<void> {
// Always restore the dot. Status='idle' (not 'error') even on failure —
// the caller logs/re-surfaces the error separately; the dot doesn't
// need to stay red across reloads for a transient compaction blip.
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'chat_status',
chat_id: chatId,
status: 'idle',
@@ -522,7 +523,7 @@ export async function process(input: ProcessInput): Promise<void> {
// toast. Order matters: idle must precede 'compacted' so the dot is
// already green by the time the refetch toast appears.
if (succeeded) {
broker.publish(sessionId, {
broker.publishFrame(sessionId, {
type: 'compacted',
session_id: sessionId,
chat_id: chatId,

View File

@@ -700,6 +700,64 @@ export const TOOLS_BY_NAME: Record<string, ToolDef<unknown>> = Object.fromEntrie
ALL_TOOLS.map((t) => [t.name, t])
);
// v1.13.15-tools: tiered tool loading. BOOCODE_TOOLS env var (`core` |
// `standard` | `all`) filters the agent's tool whitelist before LLM dispatch.
// Daily-driver token win on qwen3.6-35b-a3b — the 35B-A3B MoE benefits from
// any prompt-cache stability win (fewer tools = shorter, more stable tool
// schemas in the system prompt). Pattern lift from eyaltoledano/claude-task-
// master (MIT + Commons Clause — pattern only, no code lift).
//
// The env var is a CEILING. It only narrows; never expands an agent's
// declared whitelist. Default behavior (var unset) is unchanged: all tools.
export const CORE_TOOL_NAMES = [
'view_file',
'list_dir',
'grep',
'find_files',
] as const;
export const STANDARD_TOOL_NAMES = [
...CORE_TOOL_NAMES,
'web_search',
'web_fetch',
'git_status',
'get_codebase_overview',
'get_file_analysis',
'get_symbol_info',
'search_symbols',
'get_dependencies',
'watch_changes',
'get_semantic_neighborhoods',
'get_framework_analysis',
] as const;
// Module-load validation: every name in CORE / STANDARD must exist in
// TOOLS_BY_NAME. Catches typos and stale tier definitions before they reach
// production; server boot fails loudly rather than silently filtering valid
// tools out of agent whitelists.
for (const name of CORE_TOOL_NAMES) {
if (!TOOLS_BY_NAME[name]) {
throw new Error(`CORE_TOOL_NAMES references unknown tool: '${name}'`);
}
}
for (const name of STANDARD_TOOL_NAMES) {
if (!TOOLS_BY_NAME[name]) {
throw new Error(`STANDARD_TOOL_NAMES references unknown tool: '${name}'`);
}
}
export function resolveToolTier(tier: string | undefined): readonly string[] {
switch ((tier ?? 'all').toLowerCase()) {
case 'core':
return CORE_TOOL_NAMES;
case 'standard':
return STANDARD_TOOL_NAMES;
case 'all':
default:
return ALL_TOOLS.map((t) => t.name);
}
}
export function toolJsonSchemas(): ToolJsonSchema[] {
return ALL_TOOLS.map((t) => t.jsonSchema);
}

View File

@@ -0,0 +1,318 @@
// v1.13.11-a: Zod schemas for every WebSocket frame published by the server.
// Validation runs both on send (broker.publishFrame / publishUserFrame) and
// on receive (apps/web/src/hooks/useSessionStream + useUserEvents). Catches
// silent protocol drift between publisher and consumer.
//
// IMPORTANT: This file is duplicated byte-identical at
// apps/web/src/api/ws-frames.ts. The two apps have separate tsconfigs and
// no path alias; the duplication is sync-by-hand. A test asserts the two
// files match. If you change one, change the other.
//
// Per-kind payload schemas (tool_call args, message_parts payloads, etc.)
// stay z.unknown() in v1.13.11. Frame-level drift detection is the goal;
// deep payload validation is follow-up work.
import { z } from 'zod';
// ---- shared primitives -----------------------------------------------------
const Uuid = z.string().uuid();
// Tool call IDs are model-emitted (e.g. "call_abc123") — not UUIDs.
const ToolCallId = z.string().min(1);
const IsoTimestamp = z.string().min(1);
const ChatStatusValue = z.enum([
'streaming',
'tool_running',
'waiting_for_input',
'idle',
'error',
]);
const ErrorReasonValue = z.enum([
'llm_provider_error',
'doom_loop',
'doom_loop_summary_failed',
'cap_hit',
'cap_hit_summary_failed',
]);
const MessageRoleValue = z.enum(['user', 'assistant', 'system', 'tool']);
const ToolCallShape = z.object({
id: ToolCallId,
name: z.string().min(1),
args: z.record(z.string(), z.unknown()),
});
// Free-form bags: opaque to the frame schema; deep validation is out of
// scope for v1.13.11 (frame-level drift detection is the goal; per-kind
// payload narrowing is follow-up work). z.unknown() means the consumer
// must narrow before reading — TypeScript-side this is fine because every
// consumer already operates on the hand-maintained Project / Chat / Session
// / WorkspacePane types (the brief's "Don't strip existing types yet"
// rule), and the Zod-typed shape is only used at the publishFrame boundary.
const OpaqueObject = z.unknown();
// ---- per-session channel frames --------------------------------------------
export const SnapshotFrame = z.object({
type: z.literal('snapshot'),
messages: z.array(OpaqueObject),
});
export const MessageStartedFrame = z.object({
type: z.literal('message_started'),
message_id: Uuid,
chat_id: Uuid.optional(),
role: MessageRoleValue,
});
export const DeltaFrame = z.object({
type: z.literal('delta'),
message_id: Uuid,
chat_id: Uuid.optional(),
content: z.string(),
});
export const ToolCallFrame = z.object({
type: z.literal('tool_call'),
message_id: Uuid,
chat_id: Uuid.optional(),
tool_call: ToolCallShape,
});
export const ToolResultFrame = z.object({
type: z.literal('tool_result'),
tool_message_id: Uuid,
chat_id: Uuid.optional(),
tool_call_id: ToolCallId,
output: z.unknown(),
truncated: z.boolean(),
error: z.string().optional(),
});
export const MessageCompleteFrame = z.object({
type: z.literal('message_complete'),
message_id: Uuid,
chat_id: Uuid.optional(),
tokens_used: z.number().int().nonnegative().nullable().optional(),
ctx_used: z.number().int().nonnegative().nullable().optional(),
ctx_max: z.number().int().positive().nullable().optional(),
started_at: IsoTimestamp.nullable().optional(),
finished_at: IsoTimestamp.nullable().optional(),
model: z.string().optional(),
metadata: OpaqueObject.nullable().optional(),
});
export const UsageFrame = z.object({
type: z.literal('usage'),
message_id: Uuid,
chat_id: Uuid.optional(),
completion_tokens: z.number().int().nonnegative().nullable(),
ctx_used: z.number().int().nonnegative().nullable(),
ctx_max: z.number().int().positive().nullable(),
});
export const MessagesDeletedFrame = z.object({
type: z.literal('messages_deleted'),
message_ids: z.array(Uuid),
chat_id: Uuid.optional(),
});
export const ChatRenamedFrame = z.object({
type: z.literal('chat_renamed'),
chat_id: Uuid,
name: z.string(),
});
export const CompactedFrame = z.object({
type: z.literal('compacted'),
session_id: Uuid,
chat_id: Uuid,
summary_message_id: Uuid,
});
export const ErrorFrame = z.object({
type: z.literal('error'),
message_id: Uuid.optional(),
chat_id: Uuid.optional(),
error: z.string(),
reason: ErrorReasonValue.optional(),
});
// ---- per-user channel frames (sidebar refresh) -----------------------------
export const ChatStatusFrame = z.object({
type: z.literal('chat_status'),
chat_id: Uuid,
status: ChatStatusValue,
at: IsoTimestamp,
reason: ErrorReasonValue.optional(),
});
export const SessionUpdatedFrame = z.object({
type: z.literal('session_updated'),
session_id: Uuid,
project_id: Uuid,
name: z.string(),
updated_at: IsoTimestamp,
});
export const SessionRenamedFrame = z.object({
type: z.literal('session_renamed'),
session_id: Uuid,
name: z.string(),
});
export const SessionCreatedFrame = z.object({
type: z.literal('session_created'),
session: OpaqueObject,
project_id: Uuid,
});
export const SessionArchivedFrame = z.object({
type: z.literal('session_archived'),
session_id: Uuid,
project_id: Uuid,
});
export const SessionDeletedFrame = z.object({
type: z.literal('session_deleted'),
session_id: Uuid,
project_id: Uuid,
});
export const SessionWorkspaceUpdatedFrame = z.object({
type: z.literal('session_workspace_updated'),
session_id: Uuid,
workspace_panes: z.array(OpaqueObject),
});
export const ChatCreatedFrame = z.object({
type: z.literal('chat_created'),
chat: OpaqueObject,
session_id: Uuid,
});
export const ChatUpdatedFrame = z.object({
type: z.literal('chat_updated'),
chat_id: Uuid,
session_id: Uuid,
name: z.string().nullable(),
updated_at: IsoTimestamp,
});
export const ChatArchivedFrame = z.object({
type: z.literal('chat_archived'),
chat_id: Uuid,
session_id: Uuid,
});
export const ChatUnarchivedFrame = z.object({
type: z.literal('chat_unarchived'),
chat: OpaqueObject,
});
export const ChatDeletedFrame = z.object({
type: z.literal('chat_deleted'),
chat_id: Uuid,
session_id: Uuid,
});
export const ProjectCreatedFrame = z.object({
type: z.literal('project_created'),
project: OpaqueObject,
});
export const ProjectArchivedFrame = z.object({
type: z.literal('project_archived'),
project_id: Uuid,
});
export const ProjectUnarchivedFrame = z.object({
type: z.literal('project_unarchived'),
project: OpaqueObject,
});
export const ProjectUpdatedFrame = z.object({
type: z.literal('project_updated'),
project_id: Uuid,
name: z.string(),
});
export const ProjectDeletedFrame = z.object({
type: z.literal('project_deleted'),
project_id: Uuid,
});
// ---- discriminated union ---------------------------------------------------
export const WsFrameSchema = z.discriminatedUnion('type', [
// per-session
SnapshotFrame,
MessageStartedFrame,
DeltaFrame,
ToolCallFrame,
ToolResultFrame,
MessageCompleteFrame,
UsageFrame,
MessagesDeletedFrame,
ChatRenamedFrame,
CompactedFrame,
ErrorFrame,
// per-user
ChatStatusFrame,
SessionUpdatedFrame,
SessionRenamedFrame,
SessionCreatedFrame,
SessionArchivedFrame,
SessionDeletedFrame,
SessionWorkspaceUpdatedFrame,
ChatCreatedFrame,
ChatUpdatedFrame,
ChatArchivedFrame,
ChatUnarchivedFrame,
ChatDeletedFrame,
ProjectCreatedFrame,
ProjectArchivedFrame,
ProjectUnarchivedFrame,
ProjectUpdatedFrame,
ProjectDeletedFrame,
]);
export type WsFrame = z.infer<typeof WsFrameSchema>;
// Convenience: the set of known frame types. Useful for the publishFrame
// helper to log the offending type name when validation fails. Kept in sync
// by hand with the discriminated union above.
export const KNOWN_FRAME_TYPES: readonly WsFrame['type'][] = [
'snapshot',
'message_started',
'delta',
'tool_call',
'tool_result',
'message_complete',
'usage',
'messages_deleted',
'chat_renamed',
'compacted',
'error',
'chat_status',
'session_updated',
'session_renamed',
'session_created',
'session_archived',
'session_deleted',
'session_workspace_updated',
'chat_created',
'chat_updated',
'chat_archived',
'chat_unarchived',
'chat_deleted',
'project_created',
'project_archived',
'project_unarchived',
'project_updated',
'project_deleted',
] as const;

View File

@@ -31,7 +31,8 @@
"shiki": "^1.29.2",
"sonner": "^2.0.7",
"tailwind-merge": "^3.6.0",
"tw-animate-css": "^1.4.0"
"tw-animate-css": "^1.4.0",
"zod": "^3.23.8"
},
"devDependencies": {
"@tailwindcss/postcss": "^4.3.0",

View File

@@ -12,6 +12,7 @@ import type {
GitMeta,
Skill,
AskUserAnswer,
ToolCostStat,
} from './types';
export class ApiError extends Error {
@@ -262,6 +263,14 @@ export const api = {
list: () => request<{ skills: Skill[] }>('/api/skills'),
},
// v1.13.10: per-tool cost rolling-window stats (last 100 calls per tool,
// equal-split attribution across multi-tool turns). Read endpoint backed by
// the tool_cost_stats view. AgentPicker consumes this for per-agent cost
// hints.
tools: {
costStats: () => request<{ stats: ToolCostStat[] }>('/api/tools/cost_stats'),
},
settings: {
get: () => request<Record<string, unknown>>('/api/settings'),
patch: (body: Record<string, unknown>) =>

View File

@@ -1,6 +1,18 @@
export const PROJECT_STATUSES = ['open', 'archived'] as const;
export type ProjectStatus = typeof PROJECT_STATUSES[number];
// v1.13.10: per-tool cost rolling-window stat. Returned by
// GET /api/tools/cost_stats — one entry per tool with mean prompt/completion
// tokens over the last 100 invocations. AgentPicker sums across an agent's
// whitelisted tools for per-agent cost hints.
export interface ToolCostStat {
tool_name: string;
mean_prompt_tokens: number;
mean_completion_tokens: number;
n_calls: number;
updated_at: string;
}
export interface Project {
id: string;
name: string;

View File

@@ -0,0 +1,318 @@
// v1.13.11-a: Zod schemas for every WebSocket frame published by the server.
// Validation runs both on send (broker.publishFrame / publishUserFrame) and
// on receive (apps/web/src/hooks/useSessionStream + useUserEvents). Catches
// silent protocol drift between publisher and consumer.
//
// IMPORTANT: This file is duplicated byte-identical at
// apps/web/src/api/ws-frames.ts. The two apps have separate tsconfigs and
// no path alias; the duplication is sync-by-hand. A test asserts the two
// files match. If you change one, change the other.
//
// Per-kind payload schemas (tool_call args, message_parts payloads, etc.)
// stay z.unknown() in v1.13.11. Frame-level drift detection is the goal;
// deep payload validation is follow-up work.
import { z } from 'zod';
// ---- shared primitives -----------------------------------------------------
const Uuid = z.string().uuid();
// Tool call IDs are model-emitted (e.g. "call_abc123") — not UUIDs.
const ToolCallId = z.string().min(1);
const IsoTimestamp = z.string().min(1);
const ChatStatusValue = z.enum([
'streaming',
'tool_running',
'waiting_for_input',
'idle',
'error',
]);
const ErrorReasonValue = z.enum([
'llm_provider_error',
'doom_loop',
'doom_loop_summary_failed',
'cap_hit',
'cap_hit_summary_failed',
]);
const MessageRoleValue = z.enum(['user', 'assistant', 'system', 'tool']);
const ToolCallShape = z.object({
id: ToolCallId,
name: z.string().min(1),
args: z.record(z.string(), z.unknown()),
});
// Free-form bags: opaque to the frame schema; deep validation is out of
// scope for v1.13.11 (frame-level drift detection is the goal; per-kind
// payload narrowing is follow-up work). z.unknown() means the consumer
// must narrow before reading — TypeScript-side this is fine because every
// consumer already operates on the hand-maintained Project / Chat / Session
// / WorkspacePane types (the brief's "Don't strip existing types yet"
// rule), and the Zod-typed shape is only used at the publishFrame boundary.
const OpaqueObject = z.unknown();
// ---- per-session channel frames --------------------------------------------
export const SnapshotFrame = z.object({
type: z.literal('snapshot'),
messages: z.array(OpaqueObject),
});
export const MessageStartedFrame = z.object({
type: z.literal('message_started'),
message_id: Uuid,
chat_id: Uuid.optional(),
role: MessageRoleValue,
});
export const DeltaFrame = z.object({
type: z.literal('delta'),
message_id: Uuid,
chat_id: Uuid.optional(),
content: z.string(),
});
export const ToolCallFrame = z.object({
type: z.literal('tool_call'),
message_id: Uuid,
chat_id: Uuid.optional(),
tool_call: ToolCallShape,
});
export const ToolResultFrame = z.object({
type: z.literal('tool_result'),
tool_message_id: Uuid,
chat_id: Uuid.optional(),
tool_call_id: ToolCallId,
output: z.unknown(),
truncated: z.boolean(),
error: z.string().optional(),
});
export const MessageCompleteFrame = z.object({
type: z.literal('message_complete'),
message_id: Uuid,
chat_id: Uuid.optional(),
tokens_used: z.number().int().nonnegative().nullable().optional(),
ctx_used: z.number().int().nonnegative().nullable().optional(),
ctx_max: z.number().int().positive().nullable().optional(),
started_at: IsoTimestamp.nullable().optional(),
finished_at: IsoTimestamp.nullable().optional(),
model: z.string().optional(),
metadata: OpaqueObject.nullable().optional(),
});
export const UsageFrame = z.object({
type: z.literal('usage'),
message_id: Uuid,
chat_id: Uuid.optional(),
completion_tokens: z.number().int().nonnegative().nullable(),
ctx_used: z.number().int().nonnegative().nullable(),
ctx_max: z.number().int().positive().nullable(),
});
export const MessagesDeletedFrame = z.object({
type: z.literal('messages_deleted'),
message_ids: z.array(Uuid),
chat_id: Uuid.optional(),
});
export const ChatRenamedFrame = z.object({
type: z.literal('chat_renamed'),
chat_id: Uuid,
name: z.string(),
});
export const CompactedFrame = z.object({
type: z.literal('compacted'),
session_id: Uuid,
chat_id: Uuid,
summary_message_id: Uuid,
});
export const ErrorFrame = z.object({
type: z.literal('error'),
message_id: Uuid.optional(),
chat_id: Uuid.optional(),
error: z.string(),
reason: ErrorReasonValue.optional(),
});
// ---- per-user channel frames (sidebar refresh) -----------------------------
export const ChatStatusFrame = z.object({
type: z.literal('chat_status'),
chat_id: Uuid,
status: ChatStatusValue,
at: IsoTimestamp,
reason: ErrorReasonValue.optional(),
});
export const SessionUpdatedFrame = z.object({
type: z.literal('session_updated'),
session_id: Uuid,
project_id: Uuid,
name: z.string(),
updated_at: IsoTimestamp,
});
export const SessionRenamedFrame = z.object({
type: z.literal('session_renamed'),
session_id: Uuid,
name: z.string(),
});
export const SessionCreatedFrame = z.object({
type: z.literal('session_created'),
session: OpaqueObject,
project_id: Uuid,
});
export const SessionArchivedFrame = z.object({
type: z.literal('session_archived'),
session_id: Uuid,
project_id: Uuid,
});
export const SessionDeletedFrame = z.object({
type: z.literal('session_deleted'),
session_id: Uuid,
project_id: Uuid,
});
export const SessionWorkspaceUpdatedFrame = z.object({
type: z.literal('session_workspace_updated'),
session_id: Uuid,
workspace_panes: z.array(OpaqueObject),
});
export const ChatCreatedFrame = z.object({
type: z.literal('chat_created'),
chat: OpaqueObject,
session_id: Uuid,
});
export const ChatUpdatedFrame = z.object({
type: z.literal('chat_updated'),
chat_id: Uuid,
session_id: Uuid,
name: z.string().nullable(),
updated_at: IsoTimestamp,
});
export const ChatArchivedFrame = z.object({
type: z.literal('chat_archived'),
chat_id: Uuid,
session_id: Uuid,
});
export const ChatUnarchivedFrame = z.object({
type: z.literal('chat_unarchived'),
chat: OpaqueObject,
});
export const ChatDeletedFrame = z.object({
type: z.literal('chat_deleted'),
chat_id: Uuid,
session_id: Uuid,
});
export const ProjectCreatedFrame = z.object({
type: z.literal('project_created'),
project: OpaqueObject,
});
export const ProjectArchivedFrame = z.object({
type: z.literal('project_archived'),
project_id: Uuid,
});
export const ProjectUnarchivedFrame = z.object({
type: z.literal('project_unarchived'),
project: OpaqueObject,
});
export const ProjectUpdatedFrame = z.object({
type: z.literal('project_updated'),
project_id: Uuid,
name: z.string(),
});
export const ProjectDeletedFrame = z.object({
type: z.literal('project_deleted'),
project_id: Uuid,
});
// ---- discriminated union ---------------------------------------------------
export const WsFrameSchema = z.discriminatedUnion('type', [
// per-session
SnapshotFrame,
MessageStartedFrame,
DeltaFrame,
ToolCallFrame,
ToolResultFrame,
MessageCompleteFrame,
UsageFrame,
MessagesDeletedFrame,
ChatRenamedFrame,
CompactedFrame,
ErrorFrame,
// per-user
ChatStatusFrame,
SessionUpdatedFrame,
SessionRenamedFrame,
SessionCreatedFrame,
SessionArchivedFrame,
SessionDeletedFrame,
SessionWorkspaceUpdatedFrame,
ChatCreatedFrame,
ChatUpdatedFrame,
ChatArchivedFrame,
ChatUnarchivedFrame,
ChatDeletedFrame,
ProjectCreatedFrame,
ProjectArchivedFrame,
ProjectUnarchivedFrame,
ProjectUpdatedFrame,
ProjectDeletedFrame,
]);
export type WsFrame = z.infer<typeof WsFrameSchema>;
// Convenience: the set of known frame types. Useful for the publishFrame
// helper to log the offending type name when validation fails. Kept in sync
// by hand with the discriminated union above.
export const KNOWN_FRAME_TYPES: readonly WsFrame['type'][] = [
'snapshot',
'message_started',
'delta',
'tool_call',
'tool_result',
'message_complete',
'usage',
'messages_deleted',
'chat_renamed',
'compacted',
'error',
'chat_status',
'session_updated',
'session_renamed',
'session_created',
'session_archived',
'session_deleted',
'session_workspace_updated',
'chat_created',
'chat_updated',
'chat_archived',
'chat_unarchived',
'chat_deleted',
'project_created',
'project_archived',
'project_unarchived',
'project_updated',
'project_deleted',
] as const;

View File

@@ -1,8 +1,8 @@
import { useEffect, useState } from 'react';
import { useEffect, useMemo, useState } from 'react';
import { Check, ChevronDown } from 'lucide-react';
import { toast } from 'sonner';
import { api } from '@/api/client';
import type { Agent, AgentParseError } from '@/api/types';
import type { Agent, AgentParseError, ToolCostStat } from '@/api/types';
import {
DropdownMenu,
DropdownMenuContent,
@@ -22,6 +22,10 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
const [parseErrors, setParseErrors] = useState<AgentParseError[]>([]);
const [error, setError] = useState<string | null>(null);
const [open, setOpen] = useState(false);
// v1.13.10: per-tool cost rolling window. Fetched once on mount; would
// refresh on remount or page reload. Acceptable for a decision aid — the
// 100-call rolling mean doesn't shift fast.
const [costStats, setCostStats] = useState<ToolCostStat[]>([]);
// v1.8.1: per-agent parse errors are non-blocking. Silent if any agents
// loaded successfully; a gray warning toast fires only when EVERY agent
@@ -52,6 +56,29 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
};
}, [projectId]);
// v1.13.10: cost stats are project-independent — the 100-call rolling
// window is global across all chats. Fetch once per mount; tolerate failure
// silently (cost line hides).
useEffect(() => {
let cancelled = false;
api.tools
.costStats()
.then((r) => {
if (!cancelled) setCostStats(r.stats);
})
.catch(() => {
if (!cancelled) setCostStats([]);
});
return () => {
cancelled = true;
};
}, []);
const costByTool = useMemo(
() => Object.fromEntries(costStats.map((s) => [s.tool_name, s])),
[costStats],
);
const selectedAgent = agents?.find((a) => a.id === value) ?? null;
const triggerLabel = value === null
? 'No agent'
@@ -86,25 +113,33 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
<span className="font-medium">No agent</span>
</DropdownMenuItem>
{agents.length > 0 && <DropdownMenuSeparator />}
{agents.map((a) => (
<DropdownMenuItem
key={a.id}
onSelect={() => void onChange(a.id)}
className="text-xs flex-col items-start gap-0.5"
>
<div className="flex items-center gap-1.5">
<Check
className={`size-3 ${a.id === value ? 'opacity-100' : 'opacity-0'}`}
/>
<span className="font-medium">{a.name}</span>
</div>
{a.description && (
<span className="text-muted-foreground pl-[18px] truncate w-full">
{a.description}
</span>
)}
</DropdownMenuItem>
))}
{agents.map((a) => {
const cost = agentCost(a, costByTool);
return (
<DropdownMenuItem
key={a.id}
onSelect={() => void onChange(a.id)}
className="text-xs flex-col items-start gap-0.5"
>
<div className="flex items-center gap-1.5">
<Check
className={`size-3 ${a.id === value ? 'opacity-100' : 'opacity-0'}`}
/>
<span className="font-medium">{a.name}</span>
</div>
{a.description && (
<span className="text-muted-foreground pl-[18px] truncate w-full">
{a.description}
</span>
)}
{cost.nWithData > 0 && (
<span className="text-muted-foreground/70 pl-[18px] truncate w-full">
~{formatK(cost.prompt)} prompt / {cost.completion} completion · {cost.nWithData}/{cost.nTools} tools{cost.mostRecent ? ` · last call ${formatAgo(cost.mostRecent)}` : ''}
</span>
)}
</DropdownMenuItem>
);
})}
{parseErrors.length > 0 && (
<div
className="px-2 py-1.5 mt-1 text-xs text-amber-500 border-t border-border"
@@ -119,3 +154,49 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
</DropdownMenu>
);
}
// v1.13.10: sum the per-tool means across an agent's whitelisted tools.
// Sum-of-means, not mean-of-sums — we're combining independent rolling
// averages. nWithData reflects how many of the agent's tools have any
// history yet; the line hides entirely when zero so a fresh deploy doesn't
// render "0k / 0 / 0 tools".
function agentCost(
agent: Agent,
costByTool: Record<string, ToolCostStat>,
): {
prompt: number;
completion: number;
nTools: number;
nWithData: number;
mostRecent: string | null;
} {
let prompt = 0;
let completion = 0;
let nWithData = 0;
let mostRecent: string | null = null;
for (const t of agent.tools) {
const s = costByTool[t];
if (!s) continue;
prompt += s.mean_prompt_tokens;
completion += s.mean_completion_tokens;
nWithData++;
if (!mostRecent || s.updated_at > mostRecent) mostRecent = s.updated_at;
}
return { prompt, completion, nTools: agent.tools.length, nWithData, mostRecent };
}
function formatK(n: number): string {
if (n < 1000) return String(n);
if (n < 10_000) return `${(n / 1000).toFixed(1)}k`;
return `${Math.round(n / 1000)}k`;
}
function formatAgo(iso: string): string {
const then = new Date(iso).getTime();
if (Number.isNaN(then)) return '—';
const diff = Date.now() - then;
if (diff < 60_000) return 'just now';
if (diff < 3_600_000) return `${Math.round(diff / 60_000)}m ago`;
if (diff < 86_400_000) return `${Math.round(diff / 3_600_000)}h ago`;
return `${Math.round(diff / 86_400_000)}d ago`;
}

View File

@@ -1,6 +1,7 @@
import { useEffect, useRef, useState } from 'react';
import { toast } from 'sonner';
import type { Message, WsFrame } from '@/api/types';
import { WsFrameSchema } from '@/api/ws-frames';
import { api } from '@/api/client';
import { sessionEvents } from './sessionEvents';
import { recordUsage } from './useChatThroughput';
@@ -216,8 +217,28 @@ export function useSessionStream(sessionId: string | undefined) {
setState((s) => ({ ...s, connected: true, error: null }));
};
ws.onmessage = (ev) => {
// v1.13.11-a: Zod-validate every inbound frame. Fail-closed — invalid
// frames are logged and dropped. WsFrameSchema is the runtime guard;
// the hand-maintained WsFrame type stays as the narrowed dev-time
// shape (Zod uses OpaqueObject for nested types like Message[]). One
// cast bridges the two.
let raw: unknown;
try {
const frame = JSON.parse(typeof ev.data === 'string' ? ev.data : '') as WsFrame;
raw = JSON.parse(typeof ev.data === 'string' ? ev.data : '');
} catch (err) {
console.warn('bad ws frame (parse)', err);
return;
}
const validated = WsFrameSchema.safeParse(raw);
if (!validated.success) {
console.error('ws-frame-validation-failed (session channel)', {
frame_type: (raw as { type?: unknown })?.type,
errors: validated.error.flatten(),
});
return;
}
try {
const frame = validated.data as unknown as WsFrame;
// v1.11: on a compaction completion, re-fetch the message list so
// the new summary row + the cohort of compacted_at-stamped older
// rows render correctly. We dispatch the fresh list as a synthetic

View File

@@ -1,4 +1,5 @@
import { useEffect } from 'react';
import { WsFrameSchema } from '@/api/ws-frames';
import { sessionEvents } from './sessionEvents';
import { createWsReconnectToast } from './wsReconnectToast';
@@ -38,14 +39,33 @@ export function useUserEvents(): void {
};
ws.onmessage = (ev) => {
// v1.13.11-a: Zod-validate every inbound frame. Fail-closed — invalid
// frames are logged and dropped instead of dispatched onto the
// sessionEvents bus where a stale or wrong shape would silently
// corrupt sidebar / chat state.
let raw: unknown;
try {
const parsed: unknown = JSON.parse(ev.data);
if (parsed && typeof (parsed as { type?: unknown }).type === 'string') {
sessionEvents.emit(parsed as import('./sessionEvents').SessionEvent);
}
raw = JSON.parse(ev.data);
} catch (err) {
console.warn('useUserEvents: failed to parse frame', err);
return;
}
const validated = WsFrameSchema.safeParse(raw);
if (!validated.success) {
console.error('ws-frame-validation-failed (user channel)', {
frame_type: (raw as { type?: unknown })?.type,
errors: validated.error.flatten(),
});
return;
}
// Bridge cast: Zod's union is broader than SessionEvent (it includes
// per-session-channel frames too, which never arrive on the user
// channel). sessionEvents.emit only dispatches frames whose type
// appears in SessionEvent; the narrowing happens via the existing
// useSidebar.ts applyEvent switch.
sessionEvents.emit(
validated.data as unknown as import('./sessionEvents').SessionEvent,
);
};
ws.onclose = () => {

38
openspec/README.md Normal file
View File

@@ -0,0 +1,38 @@
# openspec
Per-batch documentation convention adopted v1.13.15-openspec.
Lift source: Fission-AI/OpenSpec directory layout. **No CLI dependency** — just
the folder shape. Full OpenSpec lifecycle adoption is a future v1.14+ batch.
## Layout
```
openspec/
changes/
<slug>/ # one folder per shipped or planned batch
proposal.md # Why + scope summary
tasks.md # implementation step list
design.md # architecture / data-model decisions (optional)
specs/ # reserved for future OpenSpec CLI adoption
archived/ # snapshots of pre-v1.13.15 batch docs
<original-filename>.md
specs/ # global specs, future v1.14+ use
```
## Conventions
- Slugs are lowercase-hyphenated derived from the batch title
(e.g. `v1-13-10-per-tool-cost`, `file-attachments-v3-5`).
- Already-shipped pre-v1.13.15 batches live in `changes/archived/` as
single-file snapshots. They were not split into proposal/tasks because
the work was already complete; archiving preserves git history.
- New v1.13.15+ batches should land directly in
`changes/<slug>/proposal.md` (+ tasks.md, + design.md when applicable).
- `proposal.md` carries the "Why" and scope. `tasks.md` is the action list
(numbered or checkbox). `design.md` is for non-trivial architectural
decisions worth recording separately.
- A canonical dispatch brief (matching the v1.13.9 / v1.13.10 format)
is most naturally split as proposal.md (Where we are, Why this matters,
rationale sections) + tasks.md (Scope items, Build + smoke) + design.md
(Attribution model, Filtering, Canonical mapping).

View File

@@ -0,0 +1,441 @@
```
#careful #boocode #nofluff
v1.13.10 — per-tool token cost accounting (rolling 100-call window)
Goal: surface per-tool prompt/completion-token rolling averages in AgentPicker for at-a-glance agent-cost hints. Implementation is a SQL view on top of `messages_with_parts` (no new table, no new write site) + a read endpoint + AgentPicker tooltip extension. Estimated ~240 LoC, mostly UI.
## Where we are
- Last tag: v1.13.9 (compaction overflow trigger — `floor(0.85 × ctx_max)` early-trigger). Branch clean.
- v1.13.x cleanup line ✅ through v1.13.9. Queued: v1.13.10 (this) → v1.13.11 (WS Zod) → v1.13.12 (skills audit) → v1.13.2 (column drop, last).
- Dependency (satisfied since v1.13.7 commit `ff29b48`): `includeUsage: true` on `createOpenAICompatible` in `apps/server/src/services/inference/provider.ts`. Without it, `messages.tokens_used`/`ctx_used` were NULL for v1.13.1-A → v1.13.7 (latent regression). Now populated.
## Why this matters
Today: AgentPicker lists agents by name + description. No cost signal. Users pick the architect agent (full tool whitelist, 21k of tool schema) for one-liner questions a refactorer (3 tools, 4k schema) could answer.
Tomorrow: each agent listing shows its mean prompt + completion cost per tool, derived from the last 100 invocations across all chats. Decision aid, not a hard gate.
Why a SQL view instead of a denormalized stats table:
- All the source data already lands in `messages` (tool_calls JSON + tokens_used + ctx_used) and `message_parts` (read via the `messages_with_parts` view). Zero new write sites.
- Rolling 100-call window is a `ROW_NUMBER() OVER (PARTITION BY tool_name ORDER BY created_at DESC) <= 100` — natural fit for a view.
- View is rollback-safe. If the math is wrong, `DROP VIEW` and re-deploy; no orphan rows, no backfill.
- At BooCode scale (single user, ~30 tools, ~100 calls/tool), aggregate-on-read is microseconds. Premature to denormalize.
The roadmap schema row (`tool_cost_stats (tool_name, prompt_tokens_sum, completion_tokens_sum, n_calls, updated_at)`) matches both a table and a view. View is the lighter implementation.
## Canonical column mapping (pinned)
The `messages` columns are named non-obviously. Pinned mapping, confirmed across 5 write sites + 1 read site:
| Column | Semantic meaning | AI SDK v6 source name |
|-----------------|--------------------|-----------------------|
| `ctx_used` | prompt / input tokens | `usage.inputTokens` |
| `tokens_used` | completion / output tokens | `usage.outputTokens` |
Write sites confirmed: `tool-phase.ts:94-95`, `error-handler.ts:109-110`, `sentinel-summaries.ts:130-131`, `sentinel-summaries.ts:387-388`, `stream-phase.ts:319-320`. Canonical read at `payload.ts:190-191` reverses: `const promptTokens = updated.ctx_used; const completionTokens = updated.tokens_used`.
`tokens_used` reads like "total" but is completion only. Project convention since the columns predate v1.13.x. Do not "fix" the naming inside this batch — out of scope; downstream consumers depend on the current mapping.
## Attribution model
A single assistant turn can emit N tool calls in parallel. llama-swap returns ONE (prompt_tokens, completion_tokens) per turn, not per tool. Attribution requires a split.
**Chosen approach: equal split.** For an assistant turn that emits N tool calls with prompt P and completion C, each tool is attributed P/N prompt + C/N completion. The 100-call rolling mean smooths split noise. Implementation: `tokens_used::float / jsonb_array_length(tool_calls)` at the unnest site.
**Alternatives rejected:**
- "Full turn cost to every tool" (no division). Over-states; a 5-tool turn would 5×-count every tool's cost.
- "Result-size only" (`length(JSON.stringify(output)) / 4`). Loses the LLM's actual usage signal; doesn't capture how expensive a tool's output is to the next prompt.
- "Consuming-turn delta" (next turn prompt_tokens this turn prompt_tokens, attribute to the tool that emitted the result). Most accurate but requires bubble-back math through the `executeToolPhase → runAssistantTurn` recursion. Over-engineered for the rolling-average use case.
**If Sam wants a different split, change one line in the view definition (the divisor).**
## Filtering — sentinel, failure, repair-call semantics
The view excludes rows that aren't real tool-cost signal:
- **Failed and cancelled turns** (`status != 'complete'`). The `error-handler.ts` failed/cancelled paths don't write `tokens_used`/`ctx_used`, so the existing `tokens_used IS NOT NULL` clause already filters these. Adding `status='complete'` is defense in depth and makes intent explicit.
- **Cap-hit and doom-loop sentinel rows** (`metadata->>'kind' IN ('cap_hit', 'doom_loop')`). Sentinels are `role='system'` rows with `tool_calls=NULL`, so the existing `tool_calls IS NOT NULL` clause already filters them. The explicit metadata filter is defense in depth — it survives future schema drift where someone might INSERT a sentinel with a non-null tool_calls.
- **`experimental_repairToolCall` retries.** No special handling needed. Our impl (per `CLAUDE.md`) is pass-through — malformed calls flow to zod-reject → tool_result error → next normal turn handles. No separate rows; the next turn's tokens count naturally.
## Recon (already done; paste for reference)
```
cd /opt/boocode
grep -n "tokens_used\|ctx_used\|inputTokens\|outputTokens" apps/server/src/services/inference/*.ts | head -30
grep -n "metadata\|cap_hit\|doom_loop" apps/server/src/services/inference/sentinels.ts apps/server/src/schema.sql | head -10
psql -h localhost -p 5432 -U postgres -d boocode -c "\d messages_with_parts" | head -30
```
Expected: confirms the canonical mapping in the table above; confirms `messages.metadata jsonb` exists at `schema.sql:259`; confirms `messages_with_parts` exposes `m.metadata` at `schema.sql:92`.
## Scope
### 1. schema.sql — `tool_cost_stats` view (~35 LoC)
Append after the `messages_with_parts` view (after line 120):
```sql
-- v1.13.10: per-tool token cost rolling window. Derives from
-- messages_with_parts (the v1.13.1-B view that COALESCEs message_parts over
-- the legacy JSON column) so this works whether the chat predates v1.13.0
-- or postdates v1.13.2 (column drop). No new write site — all source data
-- already lands via the existing tool-phase.ts:94-95 UPDATE.
--
-- Attribution model: equal split. A turn emitting N tool calls divides its
-- prompt/completion tokens by N before attribution. See v1.13.10 dispatch
-- brief for rationale + rejected alternatives.
--
-- Column mapping: messages.ctx_used = prompt (input), messages.tokens_used
-- = completion (output). Non-obvious naming; pinned via canonical writes at
-- tool-phase.ts:94-95 et al.
--
-- Filtering rationale:
-- status='complete' — exclude failed/cancelled (defense in
-- depth; failed-path doesn't write
-- tokens_used so they're also filtered
-- indirectly).
-- metadata->>'kind' exclusions — exclude cap_hit / doom_loop sentinels
-- (defense in depth; sentinels are
-- role='system' with tool_calls=NULL
-- so they're filtered indirectly too).
-- experimental_repairToolCall — no special handling; retries flow
-- as normal next-turn tool_result
-- errors and count naturally.
--
-- Rolling window: last 100 calls per tool_name, ordered by created_at DESC.
-- Aggregate-on-read is microseconds at BooCode scale (single user, ~30
-- tools, < 100 calls each). DROP VIEW + recreate to change window size.
CREATE OR REPLACE VIEW tool_cost_stats AS
WITH per_call AS (
SELECT
(tc->>'name')::text AS tool_name,
(m.ctx_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS prompt_tokens,
(m.tokens_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS completion_tokens,
m.created_at,
ROW_NUMBER() OVER (
PARTITION BY (tc->>'name')::text
ORDER BY m.created_at DESC
) AS rn
FROM messages_with_parts m,
LATERAL jsonb_array_elements(m.tool_calls) AS tc
WHERE m.tool_calls IS NOT NULL
AND jsonb_array_length(m.tool_calls) > 0
AND m.tokens_used IS NOT NULL
AND m.ctx_used IS NOT NULL
AND m.status = 'complete'
AND (m.metadata IS NULL
OR m.metadata->>'kind' IS NULL
OR m.metadata->>'kind' NOT IN ('cap_hit', 'doom_loop'))
)
SELECT
tool_name,
ROUND(SUM(prompt_tokens))::int AS prompt_tokens_sum,
ROUND(SUM(completion_tokens))::int AS completion_tokens_sum,
COUNT(*)::int AS n_calls,
MAX(created_at) AS updated_at
FROM per_call
WHERE rn <= 100
GROUP BY tool_name;
```
Notes:
- `NULLIF(..., 0)` guards against div-by-zero on `jsonb_array_length=0` (should never happen given the WHERE clause, but defensive).
- `ROUND(SUM(...))::int` — frontend doesn't want decimals; sum-then-round is more accurate than per-row round-then-sum.
- View is read from `messages_with_parts` not `messages`, so legacy pre-v1.13.0 rows and post-v1.13.2 rows both resolve.
- No index needed; the underlying `idx_messages_chat` covers the JOIN; the LATERAL unnest is bounded by the 100-row partition.
### 2. apps/server/src/routes/tools.ts (NEW, ~40 LoC)
New route file. Register in `apps/server/src/index.ts` next to the other `register*Routes(app, sql, ...)` calls.
```ts
import type { FastifyInstance } from 'fastify';
import type { Sql } from '../db.js';
export interface ToolCostStat {
tool_name: string;
mean_prompt_tokens: number;
mean_completion_tokens: number;
n_calls: number;
updated_at: string;
}
export function registerToolsRoutes(app: FastifyInstance, sql: Sql) {
app.get('/api/tools/cost_stats', async () => {
const rows = await sql<{
tool_name: string;
prompt_tokens_sum: number;
completion_tokens_sum: number;
n_calls: number;
updated_at: string;
}[]>`
SELECT tool_name, prompt_tokens_sum, completion_tokens_sum, n_calls, updated_at
FROM tool_cost_stats
ORDER BY tool_name ASC
`;
const stats: ToolCostStat[] = rows.map(r => ({
tool_name: r.tool_name,
mean_prompt_tokens: Math.round(r.prompt_tokens_sum / r.n_calls),
mean_completion_tokens: Math.round(r.completion_tokens_sum / r.n_calls),
n_calls: r.n_calls,
updated_at: r.updated_at,
}));
return { stats };
});
}
```
Route is bodyless, idempotent, cheap. No pagination (≤30 tools).
### 3. apps/server/src/services/__tests__/tool_cost_stats.test.ts (NEW, ~95 LoC)
Integration test against real Postgres (matches `inference.test.ts` pattern). Fixtures:
```ts
import { describe, it, expect, beforeEach } from 'vitest';
import { connect } from '../../db.js';
describe('tool_cost_stats view (v1.13.10)', () => {
// ... session + chat + project setup helpers ...
it('returns empty when no tool calls exist', async () => {
// fresh chat, only user/assistant text turns
const stats = await sql`SELECT * FROM tool_cost_stats`;
expect(stats).toEqual([]);
});
it('attributes single-tool turn fully to that tool', async () => {
// insert one assistant message with tool_calls=[{name: 'view_file', ...}],
// tokens_used=300, ctx_used=15000, status='complete'
const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
expect(stats[0]).toMatchObject({
tool_name: 'view_file',
prompt_tokens_sum: 15000,
completion_tokens_sum: 300,
n_calls: 1,
});
});
it('splits multi-tool turn equally across tools', async () => {
// insert one assistant turn with 3 tool calls (view_file, grep, list_dir),
// tokens_used=300, ctx_used=15000 → each tool gets 100 completion, 5000 prompt
const stats = await sql`SELECT * FROM tool_cost_stats ORDER BY tool_name`;
expect(stats).toHaveLength(3);
for (const s of stats) {
expect(s.completion_tokens_sum).toBe(100);
expect(s.prompt_tokens_sum).toBe(5000);
expect(s.n_calls).toBe(1);
}
});
it('limits to last 100 calls per tool (FIFO window)', async () => {
// insert 150 turns each calling view_file once with monotonically
// increasing tokens_used; expect only the most recent 100 to count
const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
expect(stats[0]!.n_calls).toBe(100);
// mean should reflect the latter half (51..150), not 1..150
});
it('excludes turns with NULL tokens_used (pre-v1.13.7 latent regression)', async () => {
// insert a turn with tool_calls but tokens_used=NULL → must not appear
const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
expect(stats).toEqual([]);
});
it('excludes failed and cancelled turns + sentinel metadata rows', async () => {
// insert four rows for tool_name='view_file', all with tokens_used+ctx_used
// populated:
// row A: status='failed' — excluded
// row B: status='cancelled' — excluded
// row C: status='complete', metadata={kind:'cap_hit'} — excluded
// row D: status='complete', metadata={kind:'doom_loop'} — excluded
// row E: status='complete', metadata=null — included
// Expect n_calls=1, attributable to row E only.
const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
expect(stats[0]!.n_calls).toBe(1);
});
it('reads tool_calls via messages_with_parts (parts-authoritative)', async () => {
// insert a v1.13.0+ row with messages.tool_calls=NULL but
// message_parts rows containing the tool_call → must still aggregate
const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='grep'`;
expect(stats[0]!.n_calls).toBe(1);
});
});
```
Pattern: each test resets the messages table for the fixture chat (TRUNCATE not DELETE — Postgres `messages` has FK CASCADE) and inserts hand-crafted rows. The view is recomputed on every SELECT.
### 4. apps/web/src/api/types.ts + client.ts (~10 LoC)
Add to `types.ts`:
```ts
export interface ToolCostStat {
tool_name: string;
mean_prompt_tokens: number;
mean_completion_tokens: number;
n_calls: number;
updated_at: string;
}
```
Add to `client.ts` under the existing `api.*` namespace structure:
```ts
tools: {
costStats: () => fetch<{ stats: ToolCostStat[] }>('GET', '/api/tools/cost_stats'),
},
```
Match the casing convention of the existing namespaces (`api.agents.list`, `api.chats.archive`, etc.).
### 5. apps/web/src/components/AgentPicker.tsx — tooltip extension (~80 LoC delta)
Currently (line 67): `title={selectedAgent?.description}` — native HTML title attribute on the trigger button.
Replacement: dropdown items get a per-agent cost line in muted text below the description. Format:
```
[Agent name]
[Agent description]
~5.2k prompt / 280 completion · 6 tools · last call 3h ago
```
Implementation steps:
1. Fetch `api.tools.costStats()` once on mount (alongside the existing `api.agents.list()`). Cache result for the lifetime of the picker open state. Re-fetch only on `useEffect` dep change.
2. Compute per-agent aggregate: for each agent, sum the means of its whitelisted tools. Sum-of-means, not mean-of-sums — we're combining independent rolling averages.
3. Render below description (one line, muted, truncated). Show "—" if no calls recorded yet for any of the agent's tools.
4. Don't break the existing native `title=` for backward compat; layer the cost line additively.
```tsx
const [costStats, setCostStats] = useState<ToolCostStat[]>([]);
useEffect(() => {
api.tools.costStats().then(r => setCostStats(r.stats)).catch(() => setCostStats([]));
}, []);
const costByTool = useMemo(
() => Object.fromEntries(costStats.map(s => [s.tool_name, s])),
[costStats],
);
function agentCost(agent: Agent): { prompt: number; completion: number; nTools: number; nWithData: number; mostRecent: string | null } {
let prompt = 0, completion = 0, nWithData = 0;
let mostRecent: string | null = null;
for (const t of agent.tools) {
const s = costByTool[t];
if (!s) continue;
prompt += s.mean_prompt_tokens;
completion += s.mean_completion_tokens;
nWithData++;
if (!mostRecent || s.updated_at > mostRecent) mostRecent = s.updated_at;
}
return { prompt, completion, nTools: agent.tools.length, nWithData, mostRecent };
}
```
For the line render: `~${formatK(prompt)} prompt / ${completion} completion · ${nWithData}/${nTools} tools · ${formatAgo(mostRecent)}`. Skip entirely when `nWithData === 0` to avoid showing "0k / 0 / 0 tools" for fresh-from-deploy state.
**`formatK` / `formatAgo`:** colocate at the bottom of `AgentPicker.tsx`. Don't extract to a util file in this batch — single use site.
## What NOT to do
- **Don't add a new write site at `tool-phase.ts` or `finalizeCompletion`.** All source data is already there via existing UPDATEs.
- **Don't denormalize.** The view is sufficient and rollback-safe at BooCode's single-user scale.
- **Don't add per-tool cost to the message bubble.** Out of scope. AgentPicker tooltip only.
- **Don't fold per-call rows into a moving sum via triggers.** Aggregate on read; 100 rows × 30 tools is microseconds in Postgres.
- **Don't track `result_chars` (the size of `tool_results.output`).** Tempting as a second cost signal but out of scope here. Future batch if Sam wants it.
- **Don't add a session-scoped or chat-scoped filter to `tool_cost_stats`.** The rolling window is GLOBAL across all chats — the agent picker is a project-level decision aid. Per-chat surfacing is a future v1.14+ design.
- **Don't change the attribution model post-deployment** without dropping the view first. Mid-flight semantic changes give bogus historical means.
- **Don't "fix" the `ctx_used`/`tokens_used` naming inside this batch.** Non-obvious but pinned across 5 write sites. Renaming is its own batch.
- **Don't rely solely on `tool_calls IS NOT NULL` for sentinel exclusion.** It works today (sentinels are role='system' with tool_calls=NULL) but the explicit `status='complete'` + `metadata->>'kind'` filters are defense in depth and survive future schema drift.
## Backup before edits
```
cd /opt/boocode
cp apps/server/src/schema.sql{,.bak-$(date +%Y%m%d-%H%M%S)}
cp apps/web/src/components/AgentPicker.tsx{,.bak-$(date +%Y%m%d-%H%M%S)}
```
(No backup needed for new files in items 2, 3, 4.)
## Verify
```
pnpm -C apps/server test
```
Expected: all existing tests pass + 7 new in `tool_cost_stats.test.ts`. Total moves from 195 → 202.
```
cd /opt/boocode
docker compose exec boocode_db psql -U postgres -d boocode -c \
"SELECT * FROM tool_cost_stats ORDER BY n_calls DESC LIMIT 10;"
```
Expected: in any live deployment with v1.13.7+ history, this returns real rows for `view_file`, `grep`, `list_dir`, etc. If empty: `messages.tool_calls` was NULL for the v1.13.1-A → v1.13.7 latent regression window and recovery only begins with v1.13.7+ traffic.
## Build + smoke
```
cd /opt/boocode
docker compose up --build -d boocode
docker compose logs --since=30s boocode | tail -20
```
Smoke A — view recompiles on schema apply:
```
docker compose logs boocode | grep -i "tool_cost_stats\|applySchema"
```
Expected: clean schema apply, view registered idempotently.
Smoke B — endpoint returns data:
```
curl -s http://localhost:3000/api/tools/cost_stats | jq '.stats | length, .stats[0]'
```
Expected: nonzero length if any v1.13.7+ tool calls exist; one stat object with all 5 fields populated.
Smoke C — UI:
1. Open browser to `boocode.indifferentketchup.com`.
2. Open AgentPicker dropdown on any session.
3. Each agent row shows a muted cost line below its description: `~5.2k prompt / 280 completion · 6/8 tools · last call 2h ago`.
4. Agents with no tool history show just description (no cost line).
5. Confirm cost line truncates with the existing text-muted-foreground / truncate pattern; doesn't break the layout at mobile widths (open Vivaldi devtools, set iPhone-13 viewport).
## Files expected to touch
- `apps/server/src/schema.sql` — ~35 LoC delta (view definition + filter comments)
- `apps/server/src/routes/tools.ts` — NEW, ~40 LoC
- `apps/server/src/index.ts` — 1 line (`registerToolsRoutes(app, sql)`)
- `apps/server/src/services/__tests__/tool_cost_stats.test.ts` — NEW, ~95 LoC
- `apps/web/src/api/types.ts` — ~7 LoC (interface)
- `apps/web/src/api/client.ts` — ~3 LoC (namespace + method)
- `apps/web/src/components/AgentPicker.tsx` — ~80 LoC delta (cost line + fetch hook + helpers)
Total ~260 LoC. Matches roadmap estimate.
## Workflow conventions
- Backups before destructive edits (above) on the two MODIFIED files. New files don't need backups.
- Sam reviews diffs. Never `git add` / `git commit` / `git push` / `git pull` on Sam's behalf.
- Build: `docker compose up --build -d boocode`. No `--no-cache` unless layer-cache trap surfaces.
- Tests authoritative: `pnpm -C apps/server test`.
- View definition lives in `schema.sql` (idempotent via `CREATE OR REPLACE VIEW`); no migration shim needed.
## Don't repeat past mistakes
- v1.13.7 stability bundle (`includeUsage:true`, trim guards, payload filter, `BUDGET_NO_AGENT=30`): all live. This batch depends on `includeUsage:true`. If unset, `tool_cost_stats` returns empty rows.
- v1.13.8 prefix instrumentation: untouched.
- v1.13.9 ratio-only `usable()`: untouched.
- v1.13.4 two-tier prune: untouched.
- v1.13.5 truncate.ts opaque-id pattern: untouched.
- v1.13.1-B `messages_with_parts` view: this view is the source. Don't reach past it to raw `messages`.
- v1.13.2 will DROP `messages.tool_calls`/`tool_results` columns. The `tool_cost_stats` view reads from `messages_with_parts` not `messages`, so it survives. Verify after v1.13.2 ships.
## Source files to read in project knowledge
- `boocode_roadmap.md` (v1.13.10 row at line 114; schema row at line 474)
- `boocode_code_review.md` (cost-tracking design background)
- `CLAUDE.md` (project conventions; messages_with_parts invariant at L80; v1.13.7 includeUsage invariant)
```

3
pnpm-lock.yaml generated
View File

@@ -157,6 +157,9 @@ importers:
tw-animate-css:
specifier: ^1.4.0
version: 1.4.0
zod:
specifier: ^3.23.8
version: 3.25.76
devDependencies:
'@tailwindcss/postcss':
specifier: ^4.3.0