Commit Graph

28 Commits

Author SHA1 Message Date
2e1a81de72 v1.13.16-xml-parser: Anthropic <invoke> support + unknown-tool recovery hints
Two-part fix for the model-emitted XML drift the v1.13.15-codecontext-synth
investigation surfaced (1 raw <invoke> leak observed out of 190 qwen3.6
turns — qwen3.6-35b-a3b-mxfp4 drifts to the Anthropic format when prompted
as an Architect-style agent because Claude Code documentation in its
pre-training corpus uses that shape).

## Parser extension

xml-parser.ts now recognizes BOTH XML tool-call flavors:

  - Qwen/Hermes:   <tool_call><function=NAME>...<parameter=K>V</parameter>...</function></tool_call>
  - Anthropic:     <invoke name="NAME"><parameter name="K">V</parameter></invoke>

Both route through the same synthetic-id xml_call_${idx} ToolCall path.
extractToolCallBlocks() and partialXmlOpenerStart() handle both openers
(<tool_call> and <invoke...) so partial buffers don't get prematurely
flushed during streaming.

The existing Qwen parser was tightened to tolerate whitespace around `=`
(<function = name>, <parameter = key>...) so a stray space doesn't get
absorbed into the function name. Name capture is non-whitespace,
non-`>`.

## Unknown-tool recovery hint

New tool-suggestions.ts exports levenshtein() + suggestToolName() +
formatUnknownToolError(). When tool-phase.ts:executeToolCall receives a
toolCall.name that isn't in TOOLS_BY_NAME, the error returned to the
model now includes a "Did you mean: X?" hint based on Levenshtein
distance ≤3 or substring match against Object.keys(TOOLS_BY_NAME).
Targets the qwen3.6 drift to read_file → suggest view_file. Applies to
all unknown tool names, not just <invoke>-derived ones — at the
dispatch layer we no longer know which format produced the call, and
the extra signal is harmless for Qwen-derived calls.

## Test coverage

xml-parser.test.ts: 46 tests, all green. Covers both parsers
(well-formed, malformed, multi-parameter, nested-content), the
partial-opener detector for both flavors, the unified extraction
helper, and the unknown-tool error formatter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 20:59:25 +00:00
8b568b36d3 v1.13.11-a: WS frame schemas + frontend receive validation
First half of the WebSocket-frame-typing batch (split per recon — total
scope was ~535 LoC, larger than the roadmap's ~300 estimate, so the
server-side publish-site conversion lands separately in v1.13.11-b).

Phase A scope:

(1) apps/server/src/types/ws-frames.ts (NEW) — Zod schemas for all 27
wire-format WS frame types. Discriminated union (WsFrameSchema) plus
KNOWN_FRAME_TYPES const for diagnostic lookup. UUIDs are z.string().
uuid(); model-emitted tool_call_id stays z.string().min(1) since OpenAI-
compatible APIs emit "call_<random>" not UUID. Per-kind payload narrowing
(tool args, message_parts payloads) intentionally stays z.unknown() —
frame-level drift detection is the goal; deep payload validation is
follow-up work.

(2) apps/web/src/api/ws-frames.ts (NEW) — byte-identical mirror of the
authoritative server file. No path alias from web→server in the existing
tsconfig setup; sync-by-hand was chosen over a new packages/shared/ dir.
A ws-frames.test.ts test asserts the two files match.

(3) apps/server/src/services/broker.ts — adds publishFrame() and
publishUserFrame() methods to the Broker interface. Both validate via
WsFrameSchema and fail-closed: log + drop on invalid. createBroker now
accepts an optional FastifyBaseLogger so validation failures land in
the pino stream (with console.error fallback for unit tests). The
existing publish() / publishUser() raw methods stay legal — they get
converted to the typed variants in v1.13.11-b.

(4) apps/web/src/hooks/useSessionStream.ts + useUserEvents.ts — wrap
ws.onmessage with WsFrameSchema.safeParse. Fail-closed: invalid frames
log + return without dispatching. Hand-maintained WsFrame and
SessionEvent types stay in place; one cast bridges Zod-typed → narrowed
shape (Zod uses OpaqueObject for nested Message[] / WorkspacePane[] etc.,
which are dev-time-narrowed via the existing hand-maintained types).

(5) apps/web/package.json — adds zod ^3.23.8 as a direct dep. Was a
transitive dep via ai-sdk / postgres; promotion makes the import legal.

(6) Tests: 15 new in ws-frames.test.ts covering happy-path per major
frame type, drift-catchers (unknown type, invalid enum, non-UUID, negative
tokens), parts-authoritative read variants, the mirror-file diff check,
and four broker fail-closed scenarios. 219/219 server tests pass (was
204; +15 new).

Two recon corrections to the dispatch brief, both flagged before
implementation:

- No 'parts_appended' frame exists. The brief assumed one; the codebase
  reads parts via the messages_with_parts view after message_complete
  triggers a refetch. MessagePartSchema is therefore unused this batch.
- No 'tool_running' frame exists. The brief listed it as standalone; it
  is in fact a 'chat_status' variant ({ status: 'tool_running' }), already
  covered by ChatStatusFrame.

Smoke: clean container boot, no validation errors in the server log. Real
production frames pass validation (the schemas were derived from the
existing hand-maintained types in api/types.ts and sessionEvents.ts).

v1.13.11-b will follow immediately: convert all ~85 raw broker.publish /
ctx.publish call sites across 11 server files to publishFrame /
publishUserFrame. Mechanical edit; the wiring done here means the diff
in -b is just the call-site swaps.

~310 LoC across 9 files (4 new + 5 modified).
2026-05-22 15:48:32 +00:00
34cbecf975 v1.13.15-tools: tiered tool loading via BOOCODE_TOOLS env var
Pattern lift from eyaltoledano/claude-task-master (MIT + Commons Clause
— pattern only, no code lift). Adds BOOCODE_TOOLS env var with three
tiers:

- core (4 tools): view_file, list_dir, grep, find_files. ~2k token
  schema cost.
- standard (15 tools): core + web_search, web_fetch, git_status, all
  8 codecontext_* tools. ~10k token schema cost.
- all (default; current behavior): every tool in ALL_TOOLS (20). ~21k
  token schema cost.

The env var is a CEILING — narrows agent whitelists, never expands.
Default behavior unchanged when var is unset. resolveToolTier is
case-insensitive and falls back to 'all' on unknown values.

CORE_TOOL_NAMES + STANDARD_TOOL_NAMES validated at module load against
TOOLS_BY_NAME via two top-level for-loops that throw on the first
missing name. Module fails to import if a tier references a tool that
doesn't exist in the registry — catches typos and stale tier
definitions at boot rather than silently filtering valid tools out of
agent whitelists.

Wiring: agents.ts parseAgentBlock now reads BOOCODE_TOOLS from
process.env per parse, intersects with the agent's declared frontmatter
tools (or DEFAULT_TOOLS when frontmatter omits the field). Per-parse
read is fine — agents are re-parsed on the existing 60s cache TTL.

Tests: tools.test.ts grows from 1 to 10 tests. Covers resolveToolTier
across tiers/case/unknown values + the CORE-subset-of-STANDARD invariant
+ TOOLS_BY_NAME existence for both tier sets. 204/204 pass (was 195;
+9 new).

Deviation from the brief: the codecontext tools in the actual registry
have NO codecontext_* prefix (the brief's STANDARD list assumed it).
Used the actual names (get_codebase_overview, search_symbols, etc.).
Module-load validation would have failed boot with the prefixed names.

Smoke: with BOOCODE_TOOLS unset, agents return their full 12-tool
whitelists. With BOOCODE_TOOLS=core in .env + container restart, the
same agents narrow to 4 tools (find_files, grep, list_dir, view_file)
— intersection of declared whitelist ∩ core tier. Reverted after
confirmation.

CLAUDE.md updated with BOOCODE_TOOLS in the Environment section's
Optional list. .env.example gained a commented BOOCODE_TOOLS=all line
with the per-tier token-cost table.

~110 LoC across 5 files (4 modified + 1 test expansion). Under the
brief's ~30 LoC estimate for code; the test suite expansion drove
most of the growth.
2026-05-22 14:59:01 +00:00
9ce638c916 v1.13.10: per-tool token cost accounting (rolling 100-call view)
Surfaces per-tool prompt/completion-token rolling averages in
AgentPicker for at-a-glance agent-cost hints. Implementation is a
SQL view on top of messages_with_parts plus a read endpoint and
AgentPicker tooltip extension. No new write site; all source data
already lands via the existing tool-phase.ts:94-95 / error-handler.ts:
109-110 / sentinel-summaries.ts UPDATEs that v1.13.7's includeUsage:
true fix made non-NULL.

(1) schema.sql — new tool_cost_stats view. Window-functions over
messages_with_parts.tool_calls with LATERAL jsonb_array_elements.
Attribution: equal split — multi-tool turn divides tokens N-ways;
the 100-call rolling mean absorbs split noise. Filters: status=
'complete' + metadata.kind NOT IN ('cap_hit','doom_loop') exclude
failed turns and sentinels respectively; tool_calls IS NOT NULL is
defense-in-depth since sentinels are role='system' rows. CREATE OR
REPLACE means schema apply is idempotent.

(2) routes/tools.ts NEW + index.ts wire-in. GET /api/tools/cost_stats
returns { stats: ToolCostStat[] } with mean_prompt_tokens / mean_
completion_tokens computed at read time (sum / n_calls). Sorted by
tool_name ASC. No pagination — ≤30 tools.

(3) __tests__/tool_cost_stats.test.ts NEW — 7 integration tests
keyed off DATABASE_URL env var. Tests skip gracefully when unset
(no-DB default). beforeAll applies the schema via sql.unsafe(read
FileSync(schema.sql)) for self-contained runs. Helper insertAssistant
Turn shared across cases. Covers: empty state, single-tool attribution,
multi-tool equal split, 100-call FIFO window, NULL-tokens exclusion,
parts-authoritative read via messages_with_parts, failed/sentinel
exclusion.

(4) web/api/types.ts + client.ts — ToolCostStat interface + api.tools.
costStats() method binding.

(5) AgentPicker.tsx — fetch costStats on mount, compute per-agent
sum-of-means across whitelisted tools, render muted cost line below
description: "~5.2k prompt / 280 completion · 6/8 tools · last call
3h ago". Skips line entirely when no tool history; preserves existing
native title= for layout backward-compat. formatK/formatAgo colocated.

Tests: 202/202 pass (195 prior + 7 new view-integration). Server +
web tsc clean.

Smoke: schema applied cleanly; GET /api/tools/cost_stats returns
canonical JSON; view + endpoint agree. Single-row result expected
given the v1.13.1-A → v1.13.7 NULL latent regression window; new
traffic populates organically.

Roadmap row at boocode_roadmap.md:114 plus schema row at :474 both
match. View vs table decision documented in handoff_v1.13.10_per_
tool_cost.md (rollback-safe, microsecond-fast at BooCode scale).

~270 LoC across 8 files (5 modified + 3 new).
2026-05-22 14:42:09 +00:00
b06a4a8e55 v1.13.9: compaction overflow trigger — 0.85 × ctx_max early trigger
Opencode pattern (session/overflow.ts): fire compaction at 85% of
ctx_max, replacing the v1.11.0-era `ctx_max - 20_000` formula.

Old formula: usable = ctx_max - 20_000
  - ctx=262144 → trigger at 242144 (92.4%) — only 7.6% headroom
  - ctx=100000 → trigger at  80000 (80.0%)
  - ctx= 32000 → trigger at  12000 (37.5%) — over-eager
  - ctx<=20000 → trigger at      0 — never fires

New formula: usable = floor(0.85 * ctx_max)
  - ctx=262144 → trigger at 222822 (85.0%) — 15% headroom for summarizer
  - ctx=100000 → trigger at  85000 (85.0%)
  - ctx= 32000 → trigger at  27200 (85.0%)
  - ctx=  8192 → trigger at   6963 (85.0%)

Ratio gives consistent headroom at any context scale. The qwen3.6
daily driver gets ~19k tokens more breathing room before overflow;
small-ctx models no longer degenerate to never-triggering.

usable() is the only consumer of COMPACTION_BUFFER → constant deleted.
New EARLY_TRIGGER_RATIO constant takes its place.

isOverflow() and the maybeFlagForCompaction() call site at
payload.ts:184 are unchanged — formula swap is internal to compaction.ts.
payload.ts comment touched only to drop the stale COMPACTION_BUFFER
reference (PRUNE_TRIGGER_TOKENS stays at 20k as the prune-freed
threshold; independent of the overflow formula).

Tests: 4 new usable() corner cases (262k/100k/8k/zero+negative), plus
5 isOverflow() numbers shifted to match the 85k budget at ctx=100k.
195/195 server tests pass (was 194).

Smoke: ratio math verified by unit tests at all four corners. Live
cap-hit verification deferred — requires accumulating >222k tokens
in a session under qwen3.6-35b-a3b-mxfp4 (was >242k pre-fix); will
surface organically in extended use.
2026-05-22 13:59:14 +00:00
a0c8d212cb v1.13.8: system-prompt prefix stability verify-and-measure
Recon during planning disproved the original v1.13.7 (DB-cache) premise:
buildSystemPrompt already runs over inputs mtime-cached at the file layer
(BOOCHAT.md in system-prompt.ts:25, AGENTS.md global+per-project in
agents.ts:245), and DB scalars are byte-stable until edited. The output
is microsecond pure-string concat with no I/O. Skills aren't in the
prefix; tools live in a separate request body field alpha-sorted by
v1.13.3.

This batch closes the verification gap with instrumentation, not
implementation:

- system-prompt.ts: buildSystemPromptWithFingerprint canonical impl
  computes SHA-256 over the assembled prefix, runs a per-session
  Map<sessionId, lastHash> observer, emits PrefixFingerprint per call
  and PrefixDrift (with field-level changed_inputs) on hash change.
  buildSystemPrompt is now a thin shim returning .prompt.
- agents.ts: getAgentsMtimes accessor — cache-read only, no I/O.
- payload.ts: buildMessagesPayload takes optional log argument; when
  passed, emits prefix-fingerprint (info) + prefix-drift (warn).
- turn.ts + sentinel-summaries.ts: pass ctx.log at 3 production call
  sites; sentinel summaries log too so any drift across cap-hit /
  doom-loop paths surfaces.
- system-prompt.test.ts: 4 new tests (byte-identical, no-drift-on-
  stable, drift-fires-with-changed-inputs, cross-session-no-drift).

194/194 tests pass (was 190).

Smoke: 5 messages in a fresh session produced 7 prefix-fingerprint
logs (extras from buildMessagesPayload being called from sentinel
summary paths), all with identical prefix_hash and prefix_length=2907,
zero prefix-drift. Prefix is byte-stable in steady-state.

Decision: original system_prompt_cache DB table from the roadmap is
permanently dropped. The v1.12.0 mtime caches at the input layer plus
alpha tool ordering at the request body (v1.13.3) already address the
load-bearing cache-stability surfaces. Instrumentation stays so the
claim can be re-verified at any time.
2026-05-22 13:42:18 +00:00
81d837c04e v1.13.6: compaction head-assembly audit + reasoning fix
Audit traced compaction's summary path post-v1.13.1-B read flip:
- Q1: reads from messages_with_parts (view) — clean
- Q2: parts shape correctly threaded through buildHeadPayload — clean
- Q3: reasoning omitted from summary input — FIX NEEDED

v1.13.1-C wired reasoning end-to-end into inference/payload.ts but
missed this read site. Summarizer model couldn't see the reasoning
trail for tool-bearing turns, quietly degrading summary quality for
reasoning-channel models (qwen3.6).

Fix:
- CompactionMessage extended with reasoning_parts field
- SELECT pulls reasoning_parts from messages_with_parts
- buildHeadPayload (now exported for tests) prefixes assistant content
  with <reasoning>...</reasoning>\n\n<content>... when reasoning is
  present; standalone <reasoning>...</reasoning> for tool-call-only
  turns; omits the tag when reasoning is null or empty

4 new render branch tests (190 total).

Smoke deferred: forcing real compaction requires either threshold
pollution or building up a >40k-token chat with reasoning_parts.
Render branches are unit-covered; integration would only re-prove
structural correctness.
2026-05-22 08:18:47 +00:00
f8fc5db929 v1.13.5: opencode truncate.ts port — full tool output retrievable via opaque id
- New services/truncate.ts. Tmpfs storage at /tmp/boocode-truncations/
  (BOOCODE_TRUNCATION_DIR env var overrides for tests). 12-char base32
  opaque ids (~60 bits entropy, "tr_<id>"). Three exports: storeTruncation,
  readTruncation, truncateIfNeeded (wrap-or-passthrough helper).
  cleanupTruncations does TTL-pass (7 days) + orphan-reap (parts query on
  payload->'output'->>'outputPath') in one shot.
- Wired four tools through truncateIfNeeded: view_file (raw full file),
  list_dir (full filtered+secret-filtered entries serialized one-per-line),
  web_fetch (textRaw pre-slice), codecontext_client (body.result pre-slice).
  Each returns the existing sliced view plus an optional outputPath field
  when truncation fires.
- New view_truncated_output ToolDef. Resolves opaque id → on-disk content
  internally; model never sees the truncation dir. Same start_line /
  end_line slicing semantics as view_file. Registered in ALL_TOOLS (alpha
  sort places it after view_file automatically) and READ_ONLY_TOOL_NAMES.
- cleanupTruncations piggybacks on the v1.13.3 stuck-row sweeper's 60s
  setInterval. No-op when truncation dir is empty.

Not wired (TODO follow-up): grep and find_files. file_ops returns post-cap
results to the tool execute path, so the "full content" isn't recoverable
without a refactor of fileOps.grep / fileOps.findFiles to expose the
uncapped result. web_search is silent-slice (no truncated flag); outside
scope. Five sites of seven covered; the remaining two are the only ones
needing a file_ops change.

Tests: 7 new in truncate.test.ts (roundtrip, unknown id, malformed id,
truncateIfNeeded false/true/over-cap/storage-failure paths). 186 total
(was 179). cleanupTruncations file-system half implicitly via TTL pass;
orphan-reap branch covered by the live container smoke.

Smoke verified end-to-end against the live container:
- view_file with start_line=1, end_line=3 on CLAUDE.md → tool_result part
  carried outputPath "tr_cdpn1o04k6ma" + truncated=true.
- /tmp/boocode-truncations/tr_cdpn1o04k6ma exists, 15876 bytes, mode 0o600,
  parent dir mode 0o700.
- Follow-up view_truncated_output(id, start_line=50, end_line=55) returned
  the actual lines 50-55 of CLAUDE.md (the 808notes/BooCode bullets).
- ALL_TOOLS count=20 (was 19); alpha sort places view_truncated_output
  between view_file and watch_changes.

Closes a v1.12 catalog row that was scoped but deferred. The v1.13 parts
table made outputPath ride on the existing tool_result payload with no
schema change beyond the storage helper itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 07:55:55 +00:00
ec8593cf77 v1.13.4: two-tier compaction prune — opencode pattern half-shipped in v1.11.0
- message_parts.hidden_at timestamptz column (NULL by default) with a
  partial index on (message_id) WHERE hidden_at IS NULL for the common
  visible-parts filter.
- messages_with_parts view changed from COALESCE(parts, legacy) to
  CASE WHEN EXISTS(any parts of kind) THEN visible-parts ELSE legacy.
  COALESCE would have leaked hidden parts back via the legacy fallback
  when every part was pruned (smoke caught it pre-commit). The CASE
  distinguishes "no parts at all → fall back to legacy column for
  pre-v1.13.0 history" from "all parts hidden → return null/empty so
  the row drops out of the model payload" exactly.
- prune.ts: scans tool_result parts newest-first, protects the last 40k
  tokens (PROTECTED_TOKENS), marks older candidates hidden when their
  combined estimate clears 20k (PRUNE_TRIGGER_TOKENS — equal to
  COMPACTION_BUFFER from v1.11.0, so a successful prune is exactly the
  budget the summary path would have freed). Stops at chats.tail_start_id
  so it doesn't double-erase across the last summary boundary. Pure
  decision helper selectPruneTargets exported separately for unit tests.
- Wired into maybeFlagForCompaction: prune runs synchronously when
  overflow is detected; if it freed >= PRUNE_TRIGGER_TOKENS, the
  needs_compaction flag is NOT set and the (expensive) summary inference
  call is skipped this turn. The next turn's overflow check re-evaluates
  from scratch.
- 6 new unit tests in prune.test.ts cover: empty input, protection-only
  (no candidates), candidates below trigger, candidates above trigger,
  candidates straddling a summary boundary, exactly-protection-tokens.
  179 tests total (was 173).

Smoke verified post-rebuild:
- \\d message_parts shows hidden_at + partial index.
- View definition shows AND p.hidden_at IS NULL filters on all three
  subselects.
- Synthetic hide-then-restore confirmed the view drops the tool_result
  jsonb to null when its only part is hidden, and restores when un-hidden.
- EXPLAIN ANALYZE on the 42-message stress chat: 0.325ms (faster than
  v1.13.1-B's 1.018ms — EXISTS short-circuits cleanly for the common
  no-parts case).
- Normal turn (plain text prompt) completes unaffected.

Closes a v1.11.0 design item that was scoped but never implemented. With
v1.13's parts table the prune is dramatically cheaper to write — pre-parts
it would have meant editing JSON blobs in-place; now it's a hidden_at
flag and a view subselect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 07:02:17 +00:00
a08d809b73 v1.13.3: cleanup bundle — statement timeout + alpha ordering + stuck-row sweeper + repairToolCall
Four independent items, all owed from prior dispatches.

- statement_timeout at the database level via:
    ALTER DATABASE boocode SET statement_timeout = '30s';
  Applied operationally; documented as a comment at the top of schema.sql
  (ALTER DATABASE can't run inside a DO block, so it's not idempotent
  inside applySchema). Re-apply after a volume reset.

- Tool registry alpha-sorted at module load. llama.cpp's prompt cache
  hits on byte-identical prefixes; any reordering of the tool list near
  the top of the system prompt would invalidate every cached turn.
  Single-source sort at the ALL_TOOLS export so toolJsonSchemas() and
  TOOLS_BY_NAME inherit the order automatically. New tools.test.ts
  asserts the invariant; total tests 173 (was 172).

- Periodic in-process stuck-row sweeper. Runs every 60s, marks
  'streaming' rows older than 5 minutes as 'failed', and publishes
  chat_status='idle' on the user channel so the UI dot drops without a
  refresh. Closes the mid-session crash UX gap; the v1.12.1 boot sweep
  only fires once at startup, so sessions used to stay stuck until next
  reboot. setInterval cleaned up via app.addHook('onClose'). Mirrors
  handleAbortOrError's publish pattern.

- experimental_repairToolCall wired through AI SDK v6 streamText. Pass-
  through implementation: log + return the original toolCall so the
  stream keeps going. executeToolPhase's existing error paths (unknown
  tool name → 'unknown tool: X' result; zod-reject → 'tool X rejected
  — field: required') already surface bad calls to the model; the value
  here is preventing the AI SDK from THROWING on parse errors and
  killing the whole stream. Owed since v1.13.1-A.

Smoke verified:
- statement_timeout = '30s' confirmed via SHOW.
- Tool path normal flow intact (list_dir prompt → tool_call → result
  → final assistant). No malformed tool calls in the test run; repair
  log will surface them when qwen3.6 actually emits one.
- Alpha order verified at runtime via the dist bundle: match: true.
- Sweeper logic not traffic-tested (no stuck rows to find), but the
  SQL UPDATE + broker.publishUser pattern is identical to handleAbort
  and the boot sweep — synthesis-only verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:46:03 +00:00
ac1a71f583 v1.13.1-C: port ask_user_input correlation to parts + wire reasoning_parts end-to-end
Pass 1 — ask_user_input correlation port (messages.ts:478, :549):

- The two correlation queries that backed the elicitation flow used to scan
  messages.tool_calls and messages.tool_results JSON columns directly. They
  now JOIN message_parts on payload->>'id' (for the caller assistant) and
  payload->>'tool_call_id' (for the pending tool row). Semantics preserved:
  ORDER BY m.created_at DESC LIMIT 1 still picks the latest issuance, the
  already-answered 409 guard now reads payload.output, and the UPDATE +
  parts replace inside sql.begin is unchanged from v1.13.0.
- Pre-v1.13.0 history has no parts rows and is unreachable to this lookup
  path (404). Acceptable per dispatch decision — no pending elicitation
  from before v1.13.0 will still be open. JSON-column fallback can land as
  a hotfix if it ever surfaces.

Pass 2 — reasoning_parts wired end-to-end:

- types.ts/StreamResult gains `reasoning: string`. stream-phase.ts accumulates
  reasoning-delta text per stream (replacing the v1.13.1-A counter-only
  diagnostic) and returns it on the result.
- parts.ts/partsFromAssistantMessage gains an optional `reasoning` param.
  When present it emits a kind='reasoning' part at sequence 0, ahead of
  the text and tool_call parts.
- error-handler.ts/finalizeCompletion and tool-phase.ts/executeToolPhase
  both thread result.reasoning into the dual-write call so reasoning-channel
  models (qwen3.6) get persistent reasoning rows.
- payload.ts: loadContext SELECT pulls reasoning_parts from the v1.13.1-B
  view; OpenAiMessage gains an optional `reasoning` field; buildMessagesPayload
  collapses reasoning_parts into a single string per assistant message.
- stream-phase.ts/toModelMessages converts assistant messages with reasoning
  into an AI SDK ModelMessage content array starting with a ReasoningPart,
  matching the @ai-sdk/provider-utils AssistantContent union. Reasoning
  models can now replay prior reasoning context across tool-call boundaries.
- types/api.ts and apps/web/src/api/types.ts Message interface gain
  reasoning_parts (optional, nullable). Frontend doesn't render this yet —
  field reserved for a v1.14 UI surface.

Tests: 2 new in parts.test.ts cover reasoning-at-sequence-0 with and
without text content. 172 tests pass (170 prior + 2 new).

Smoke verified against the live container:
- A reasoning-prompt ("walk through 17 × 23 step by step") produced one
  message with kind='reasoning' (361 chars) at sequence 0 and kind='text'
  (429 chars) at sequence 1. Adapter log confirmed reasoning capture.
- The new correlation SQL was validated against existing tool_call /
  tool_result parts: returns the expected message_id + payload shape with
  pending state correctly identified via payload.output IS NULL.
- ask_user_input end-to-end through the UI is Sam's smoke — the Prompt
  Builder agent does not always trigger ask_user_input for these prompts,
  so synthetic verification via SQL substituted for traffic-driven cover.

Annotation: the v1.13.1-A abort-throw site in stream-phase.ts got a
one-liner comment ("AI SDK v6 fullStream returns normally on abort; check
signal explicitly.") to prevent a future refactor removing it.

v1.13.2 drops the dual-write + the JSON columns + collapses the view.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:34:10 +00:00
1cb6eee24c v1.13.0: message_parts table + dual-write at every tool_calls/tool_results site
Adds a granular message_parts table (one row per text/tool_call/tool_result
chunk) without changing any read path. Old messages.content / tool_calls /
tool_results columns remain authoritative for v1.13.0; this dispatch is
write-only mirroring so the AI SDK migration in v1.13.1 can flip read
authority without a backfill window.

Schema:
  CREATE TABLE message_parts (id, message_id FK ON DELETE CASCADE,
    sequence int, kind text CHECK (text|tool_call|tool_result|reasoning|step_start),
    payload jsonb, created_at, UNIQUE (message_id, sequence))

New module services/inference/parts.ts with two pure derive helpers
(partsFromAssistantMessage, partsFromToolMessage) and insertParts that
fan-outs a multi-row INSERT via postgres-js.

Wired dual-write at every site that writes tool_calls or tool_results:
- tool-phase.ts: assistant finalize UPDATE, executed-tool UPDATE,
  ask_user_input sentinel UPDATE
- messages.ts answer flow: DELETE pending tool_result part + INSERT
  answered one inside the existing sql.begin
- skills.ts: synthetic assistant + tool INSERTs both inside existing tx
- chats.ts fork: CTE clones parts via ROW_NUMBER pairing (source→dest
  message id mapping in one statement, no N+1)
- error-handler.ts finalizeCompletion: text part for plain text-only
  assistant turns

Deviation: tool-phase.ts finalize UPDATEs and finalizeCompletion text-part
write are not wrapped in fresh sql.begin transactions. Safe in v1.13.0
because JSON columns are authoritative for reads. v1.13.1 must wrap these
sites before flipping read authority — TODO comments added at each
unwrapped site referencing v1.13.1.

Tests: 8 new unit tests for the derive helpers in
services/__tests__/parts.test.ts. Existing 162 tests untouched. 170 total.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:46:29 +00:00
9ef00c0268 v1.12.4: complete inference.ts split into services/inference/
- sentinel-summaries.ts: runCapHitSummary, insertCapHitSentinel,
  runDoomLoopSummary, insertDoomLoopSentinel
- inference.ts → inference/turn.ts: residue is runAssistantTurn,
  runInference, createInferenceRunner orchestration only
- inference/index.ts: re-export shim preserves the public surface
  (createInferenceRunner, runInference, runAssistantTurn,
  detectDoomLoop, DOOM_LOOP_THRESHOLD, buildMessagesPayload, plus
  type-side InferenceContext/InferenceFrame/StreamResult/TurnArgs/
  FramePublisher)
- src/index.ts + auto_name.ts + the two vitest test files updated to
  import from ./services/inference/index.js explicitly (NodeNext ESM
  doesn't honor directory-index resolution)

Final tally: 11 files under services/inference/, the largest being
sentinel-summaries.ts at 523 LoC (two near-clone summary paths kept
side-by-side until a third sentinel justifies factoring out a shared
runWrapUpSummary). turn.ts is now 326 LoC, the next-largest is
stream-phase.ts at 380. Public import surface unchanged.

tool-phase.ts → turn.ts back-edge for runAssistantTurn remains
(cycle is safe; resolved at call time).

Prepares the file structure for v1.13 AI SDK migration — streamText
swap targets stream-phase.ts only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 22:36:35 +00:00
fce8c06932 Merge v1.11.10 + doc refinements onto v1.12.0 main
# Conflicts:
#	CLAUDE.md
2026-05-21 15:22:46 +00:00
16c69a38a1 Merge v1.12 track B: codecontext sidecar
# Conflicts:
#	apps/web/src/components/ToolCallLine.tsx
#	docker-compose.yml
2026-05-21 15:12:30 +00:00
a2e2481ef9 v1.12 track A: container guidance + skills 2026-05-21 15:11:04 +00:00
136e9538aa v1.12 track B.2: codecontext tool wrappers + tests 2026-05-21 13:35:44 +00:00
3e1e17ecf6 v1.11.10: stream-cap response body at 5MB, abort on overflow 2026-05-21 02:27:31 +00:00
ab01e04d77 v1.11.9: manual redirect handling — re-run URL guard on each hop 2026-05-21 00:37:35 +00:00
4e67a265ac v1.11.8: address review — inject fetcher, byte-count limit, redirect TODO 2026-05-20 21:40:11 +00:00
2fdbb05477 v1.11.8: web_search + web_fetch tools via SearXNG
Adds two new tools registered through the existing ALL_TOOLS registry:
  - web_search hits SearXNG's JSON API (Fathom, internal Tailscale URL,
    no auth) and returns top results
  - web_fetch retrieves a URL's text content, gated by isPublicUrl
    (url_guard.ts) which blocks loopback / RFC1918 / Tailscale CGNAT /
    link-local / .local / .internal / non-http schemes

Both tools are opt-in via the existing session.web_search_enabled flag
(plumbed in v1.9, activated here). Default off. UI labels updated to
"Enable web search and fetch" / "Web search and fetch" since fetch joins
the same store. Counts against the v1.8.2 per-turn budget; covered by
the v1.11.6 doom-loop guard.

Native Node 20 fetch — no new prod dep. HTML stripping via regex (script
and style content elided wholesale). 5MB body cap, 15s fetch timeout,
8000-char default output, 32000-char cap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 21:38:02 +00:00
863452ae07 v1.11.7: secret-file deny list for codebase tools
Ports continue.dev's DEFAULT_SECURITY_IGNORE_FILETYPES + ignored-dir lists
into apps/server/src/services/secret_guard.ts plus a small BooCode
additions block (id_rsa*, *credentials*, .netrc, *.kdbx). Tiny glob-to-
regex matcher; no new prod dep.

view_file hard-refuses via SecretBlockedError. list_dir / grep /
find_files filter their results and surface a pathguard_note string
field with the hidden count — never list the offending paths back.

Named secret_guard.ts (not safety/pathGuard.ts) to avoid collision with
the existing path_guard.ts which already exports a pathGuard() function.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 20:55:50 +00:00
f92b0810c3 v1.11.6: doom-loop guard (3 identical tool calls aborts recursion) 2026-05-20 20:28:45 +00:00
89dcfb95dc v1.11.3: fix ctx_max capture via /props endpoint
- llama-server does not emit n_ctx in timings (confirmed empirically);
  dead code at inference.ts:479 and compaction.ts:300 never fired
- New model-context.ts: cached fetch of /upstream/<model>/props
  with positive-cache (no TTL) and 60s negative-cache
- Wired into all 4 ctx_max write sites: 3 in inference.ts
  (executeToolPhase, finalizeCompletion, runCapHitSummary) and
  1 in compaction.ts (summary row INSERT)
- AbortController 3s timeout, lenient parsing with sensible defaults
- 12 new vitest cases for the cache module (59 total)
- 7 historical assistant rows backfilled manually (see notes)
2026-05-20 19:29:26 +00:00
dc43dd44f9 v1.11: opencode-style compaction port
- compaction.ts: usable/isOverflow/estimate/turns/select/buildPrompt/process
- compaction-prompt.ts: SUMMARY_TEMPLATE verbatim from opencode
- schema: messages.{compacted_at,summary,tail_start_id} + chats.needs_compaction
- inference: auto-trigger on overflow, pre-fetch compaction before next turn
- /compact slash command rewired to new path
- WS: chat_status working/idle around compaction + compacted frame
- frontend: SummaryCard + sonner toast on compacted
- 24 unit tests for pure functions
2026-05-20 19:05:35 +00:00
09aecc4ee9 v1.9: settings pane + per-project defaults + bulk archive + themes lift
Adds a singleton, ephemeral 'settings' pane kind to the workspace.
Opened via a new bottom-pinned button in ProjectSidebar (emits an
open_settings_pane event when a session is mounted; navigates to
/settings otherwise). Pane has three sections — Session, Project,
Theme — and a maximize toggle that hides sibling pane columns via
display:none on desktop only. Settings panes don't count toward
MAX_PANES and are filtered out of the localStorage persistence layer
so reload always restores a clean workspace.

Schema (additive):
- projects.default_system_prompt TEXT NOT NULL DEFAULT ''
- projects.default_web_search_enabled BOOLEAN NOT NULL DEFAULT false
- sessions.web_search_enabled BOOLEAN  (nullable; null = inherit)

Inference resolves user_prompt = session.system_prompt.trim() ||
project.default_system_prompt.trim() — empty/whitespace at either
layer means "no override". Keeps the columns NOT NULL and matches
the existing inherit semantics.

Server routes:
- GET /api/projects/:id (new; settings pane refetches on
  project_updated)
- PATCH /api/projects/:id accepts default_system_prompt,
  default_web_search_enabled
- PATCH /api/sessions/:id accepts web_search_enabled (tri-state)
- POST /api/projects/:id/sessions/archive-all + GET
  /api/projects/:id/sessions/open-count
- POST /api/sessions/:id/chats/archive-all + GET
  /api/sessions/:id/chats/open-count
- PATCH /api/sessions/:id now broadcasts session_updated on every
  successful PATCH (was rename-only). Lets SettingsPane open in
  another tab pick up edits without a refetch.

Bulk-archive publishes one session_archived / chat_archived frame
per affected id so useSidebar's existing reducer cases handle them
incrementally — no new frame type, no payload widening.

ModelPicker refactored: shared ModelList inside a responsive shell.
Desktop = labeled trigger + DropdownMenu, mobile = icon-only Cpu
button + BottomSheet. Header in Session.tsx drops the pill wrap on
mobile since the new trigger is the visual.

ChatInput gains an icon-only '+' DropdownMenu next to AgentPicker
when sessionId + webSearchEnabled props are provided. One item for
now — Web search — with a checkmark reflecting the stored value
(true), not the effective one. Click PATCHes the override; to
restore inherit-from-project the user opens SettingsPane.

ThemePicker lifted out of pages/Settings.tsx into a reusable
component. The standalone /settings route is now a thin wrapper
that mounts <ThemePicker /> with a Back button on top
(navigate(-1) with fallback to '/'); the SettingsPane Theme tab
renders the same picker bare.

Project section delete-flow removed (button + confirm dialog +
handler). Replaced with "Archive all sessions" using the same
two-step count → confirm → fire pattern as "Archive all chats" in
the Session section. api.projects.remove() stays in the client
because useProjects.ts still uses it.

Hand-rolled Switch primitive in SettingsPane (no shadcn switch in
the project; spec said no new deps). Section nav is plain buttons
(no shadcn Tabs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 17:37:29 +00:00
5c61cc7281 v1.8.2: tool loop cap-hit summary + tool call UI compaction
Old hardcoded MAX_TOOL_LOOP_DEPTH=15 replaced by per-agent
max_tool_calls (1-100, AGENTS.md frontmatter) with defaults: 30 for
read-only-only agents, 10 for agents that include any non-read-only
tool, 15 for raw chat. When the loop hits cap, fire one final summary
call with tools disabled, stream the wrap-up into the in-flight
assistant message, then insert a system sentinel with
metadata.kind='cap_hit'. The sentinel renders an amber bubble with a
Continue button (latest sentinel only) that POSTs to a new
/api/chats/:id/continue route to extend. Hard ceiling: 3 cap-hits per
chat (2 continues max) — third sentinel reports can_continue=false.

Error frames carry a machine-readable reason code alongside human
error text. Failed messages persist the reason via
metadata.kind='error' so the bubble renders specifics on reload (WS
error frame is one-shot).

Tool call UI rewired: ToolCallLine renders inline (↳ name args
spinner/check/✗, expand-on-tap for args+result); ToolCallGroup
collapses 3+ consecutive same-tool runs into a compact card.
MessageList owns a three-pass pre-render (flatten + fold tool
results onto matching runs by id + group same-tool runs + number
sentinels). MessageBubble drops tool rendering and adds the
sentinel / error-reason branches. ToolCallCard deleted.

Roadmap follow-up logged: add explicit max_tool_calls: 30 to the 6
agents in /data/AGENTS.md and /opt/boocode/AGENTS.md post-ship for
discoverability (defaults handle behavior identically).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 10:31:32 +00:00
1ecb79476e test: vitest harness + unit tests for security-critical pure functions
Adds vitest 3.x (pinned to ^3 because vitest 4 requires Vite 6, while the
web app pins Vite 5). Tests live under src/**/__tests__/**.

Three target functions:
- sanitizeFolderName (project_bootstrap.ts): 8 cases covering happy path,
  path-traversal stripping, empty-after-sanitize, control chars, truncation
  at 64, null bytes, leading/trailing dot/slash stripping.
- resolveProjectPath (projects.ts): 7 cases including symlink-escape via
  realpath, outside-whitelist rejection, nonexistent path, AND a flagged
  BEHAVIOR GAP: passing the whitelist path itself currently returns success
  rather than erroring out (function early-exits the scope check when
  real === whitelistReal). Test asserts current behavior with explicit
  comment flagging the spec violation — function NOT silently patched.
  Function made exportable for testing (single keyword change).
- buildMessagesPayload (inference.ts): 8 cases for compact-marker logic
  (no marker, marker present, multiple compacts, tool-message position).

tsconfig.json excludes __tests__ + *.test.ts from emit so dist/ stays clean.

pnpm -C apps/server test => 23 passed in ~340ms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 04:35:31 +00:00