Converts the ad-hoc executeToolPhase → runAssistantTurn recursion into an
explicit while (stepNumber < effectiveCap) loop. A step is one stream-and-
tool-execute iteration; the loop terminates on non-tool finish, step-cap hit,
doom-loop, budget exhaustion, abort, or synthesis success.
MAX_STEPS = 200 hard ceiling (4x old effective limit from budget). Per-agent
steps: field in AGENTS.md frontmatter sets tighter caps (Refactorer: 5,
Architect: 20, others: unset = bounded only by MAX_STEPS). Resolution:
effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS).
executeToolPhase no longer recurses — returns ToolPhaseResult struct
(action: 'continue' | 'paused' | 'synthesis_done') so the caller decides
whether to continue or break. steps: 0 handled as "no tool calls allowed"
via runTextOnlyTurn (one text-only stream phase, tool calls ignored with
warn log).
Step-cap hits produce a sentinel summary (reuses cap_hit kind so
CapHitSentinel.tsx renders without frontend changes; text distinguishes
"Step limit reached" from "Tool budget exhausted"). Doom-loop check migrated
to top of loop body — same predicate, same threshold (3), break instead of
return.
step_start parts are in the schema CHECK but not emitted as message_parts —
writing before the stream phase creates a sequence-0 collision with
partsFromAssistantMessage. Structured log line emitted instead. Adversarial
review caught the collision pre-deploy.
332/332 server tests passing. No frontend changes. No schema changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pattern lift from eyaltoledano/claude-task-master (MIT + Commons Clause
— pattern only, no code lift). Adds BOOCODE_TOOLS env var with three
tiers:
- core (4 tools): view_file, list_dir, grep, find_files. ~2k token
schema cost.
- standard (15 tools): core + web_search, web_fetch, git_status, all
8 codecontext_* tools. ~10k token schema cost.
- all (default; current behavior): every tool in ALL_TOOLS (20). ~21k
token schema cost.
The env var is a CEILING — narrows agent whitelists, never expands.
Default behavior unchanged when var is unset. resolveToolTier is
case-insensitive and falls back to 'all' on unknown values.
CORE_TOOL_NAMES + STANDARD_TOOL_NAMES validated at module load against
TOOLS_BY_NAME via two top-level for-loops that throw on the first
missing name. Module fails to import if a tier references a tool that
doesn't exist in the registry — catches typos and stale tier
definitions at boot rather than silently filtering valid tools out of
agent whitelists.
Wiring: agents.ts parseAgentBlock now reads BOOCODE_TOOLS from
process.env per parse, intersects with the agent's declared frontmatter
tools (or DEFAULT_TOOLS when frontmatter omits the field). Per-parse
read is fine — agents are re-parsed on the existing 60s cache TTL.
Tests: tools.test.ts grows from 1 to 10 tests. Covers resolveToolTier
across tiers/case/unknown values + the CORE-subset-of-STANDARD invariant
+ TOOLS_BY_NAME existence for both tier sets. 204/204 pass (was 195;
+9 new).
Deviation from the brief: the codecontext tools in the actual registry
have NO codecontext_* prefix (the brief's STANDARD list assumed it).
Used the actual names (get_codebase_overview, search_symbols, etc.).
Module-load validation would have failed boot with the prefixed names.
Smoke: with BOOCODE_TOOLS unset, agents return their full 12-tool
whitelists. With BOOCODE_TOOLS=core in .env + container restart, the
same agents narrow to 4 tools (find_files, grep, list_dir, view_file)
— intersection of declared whitelist ∩ core tier. Reverted after
confirmation.
CLAUDE.md updated with BOOCODE_TOOLS in the Environment section's
Optional list. .env.example gained a commented BOOCODE_TOOLS=all line
with the per-tier token-cost table.
~110 LoC across 5 files (4 modified + 1 test expansion). Under the
brief's ~30 LoC estimate for code; the test suite expansion drove
most of the growth.
Recon during planning disproved the original v1.13.7 (DB-cache) premise:
buildSystemPrompt already runs over inputs mtime-cached at the file layer
(BOOCHAT.md in system-prompt.ts:25, AGENTS.md global+per-project in
agents.ts:245), and DB scalars are byte-stable until edited. The output
is microsecond pure-string concat with no I/O. Skills aren't in the
prefix; tools live in a separate request body field alpha-sorted by
v1.13.3.
This batch closes the verification gap with instrumentation, not
implementation:
- system-prompt.ts: buildSystemPromptWithFingerprint canonical impl
computes SHA-256 over the assembled prefix, runs a per-session
Map<sessionId, lastHash> observer, emits PrefixFingerprint per call
and PrefixDrift (with field-level changed_inputs) on hash change.
buildSystemPrompt is now a thin shim returning .prompt.
- agents.ts: getAgentsMtimes accessor — cache-read only, no I/O.
- payload.ts: buildMessagesPayload takes optional log argument; when
passed, emits prefix-fingerprint (info) + prefix-drift (warn).
- turn.ts + sentinel-summaries.ts: pass ctx.log at 3 production call
sites; sentinel summaries log too so any drift across cap-hit /
doom-loop paths surfaces.
- system-prompt.test.ts: 4 new tests (byte-identical, no-drift-on-
stable, drift-fires-with-changed-inputs, cross-session-no-drift).
194/194 tests pass (was 190).
Smoke: 5 messages in a fresh session produced 7 prefix-fingerprint
logs (extras from buildMessagesPayload being called from sentinel
summary paths), all with identical prefix_hash and prefix_length=2907,
zero prefix-drift. Prefix is byte-stable in steady-state.
Decision: original system_prompt_cache DB table from the roadmap is
permanently dropped. The v1.12.0 mtime caches at the input layer plus
alpha tool ordering at the request body (v1.13.3) already address the
load-bearing cache-stability surfaces. Instrumentation stays so the
claim can be re-verified at any time.
Removed /opt/boocode/AGENTS.md (per-project override) — the project's
agents now resolve from the global /data/AGENTS.md only. Eliminates the
two-files-must-stay-in-sync footgun that surfaced during B.3
verification.
Fix: agents.ts ALL_TOOL_NAMES was a hardcoded 9-item whitelist that
silently filtered any unknown tool name from agent.tools arrays. This
caused web_search/web_fetch (v1.11.8) and the 8 codecontext tools to be
dropped at parse time. Replaced with ALL_TOOLS.map(t => t.name) for
single source of truth. Pre-existing exposure was dormant since no
builtin agent listed web_search; surfaced by adding codecontext.
- /data/skills mount (host: /opt/skills)
- skill_find, skill_use, skill_resource added to default read-only
tool set; opt-in for agents with explicit tools: whitelist
- AGENTS.md builtin agents drop explicit tools: arrays to inherit
the new default (now includes skill tools)
- POST /api/chats/:id/skill_invoke for slash-command flow
- 19 SKILL.md files seeded at /opt/skills/ across 6 source groups
Old hardcoded MAX_TOOL_LOOP_DEPTH=15 replaced by per-agent
max_tool_calls (1-100, AGENTS.md frontmatter) with defaults: 30 for
read-only-only agents, 10 for agents that include any non-read-only
tool, 15 for raw chat. When the loop hits cap, fire one final summary
call with tools disabled, stream the wrap-up into the in-flight
assistant message, then insert a system sentinel with
metadata.kind='cap_hit'. The sentinel renders an amber bubble with a
Continue button (latest sentinel only) that POSTs to a new
/api/chats/:id/continue route to extend. Hard ceiling: 3 cap-hits per
chat (2 continues max) — third sentinel reports can_continue=false.
Error frames carry a machine-readable reason code alongside human
error text. Failed messages persist the reason via
metadata.kind='error' so the bubble renders specifics on reload (WS
error frame is one-shot).
Tool call UI rewired: ToolCallLine renders inline (↳ name args
spinner/check/✗, expand-on-tap for args+result); ToolCallGroup
collapses 3+ consecutive same-tool runs into a compact card.
MessageList owns a three-pass pre-render (flatten + fold tool
results onto matching runs by id + group same-tool runs + number
sentinels). MessageBubble drops tool rendering and adds the
sentinel / error-reason branches. ToolCallCard deleted.
Roadmap follow-up logged: add explicit max_tool_calls: 30 to the 6
agents in /data/AGENTS.md and /opt/boocode/AGENTS.md post-ship for
discoverability (defaults handle behavior identically).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Builtins move out of code into /data/AGENTS.md (always-on, mounted ro
into the container); per-project AGENTS.md is now an optional override.
agents.ts merges global + project entries with project-wins-by-name and
caches per-source mtimes (60s TTL). Parser switches to per-block
try/catch and returns AgentsResponse { agents, errors[] } so one
malformed block no longer fails the file. AgentPicker shows a
non-blocking amber chip listing skipped blocks and only fires a gray
toast when zero agents loaded.
WS reconnect UX (useUserEvents + useSessionStream) now silent on the
first disconnect; createWsReconnectToast escalates to gray after 3
failures or 15 s, then to red with a Retry Now action after 60 s.
useSessionStream also gained the exponential-backoff reconnect it was
missing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Six builtin defaults (Code Reviewer, Debugger, Refactorer, Architect,
Security Auditor, Prompt Builder) with no model field so session.model
wins. Project root AGENTS.md parsed on demand with mtime cache; when
present, only its agents are shown. sessions.agent_id resolves per turn
into effective system prompt, temperature, and a tool whitelist applied
in inference. AgentPicker mounts in the ChatInput toolbar; SettingsDrawer
agent surface deferred to Batch 7.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>