Checkpoint of in-progress backend work present in the tree, not authored this session: auto_name, inference tool-phase/turn, secret_guard, provider-registry, plus a new agent-allowlist test (7 tests, passing).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Batch 3c: when an agent has llama_extra_args in AGENTS.md, provider.ts
routes inference through LLAMA_SIDECAR_URL instead of LLAMA_SWAP_URL.
X-Agent-Flags header built from the agent's flags. Boot-time guard
refuses to start if any agent has llama_extra_args but LLAMA_SIDECAR_URL
is unset. PrefixFingerprint gains a route field (swap/sidecar) for
per-turn visibility. 9 provider tests.
AGENTS.md tool gap: all agents (except Prompt Builder) were missing 8
tools that were added after the original tool lists were written:
request_read_access, view_truncated_output, ask_user_input, git_status,
get_blast_radius, get_hot_files, get_middleware, get_routes. The missing
request_read_access caused silent "permission denied" when reading files
outside the project root.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reject qwen3.6 spurious <invoke> tails with path "..." or empty args before
they enter toolCalls, preventing duplicate assistant answers. Dropped blocks
append to flushed text; four new xml-parser tests. DEFERRED-WORK §6 for
console.debug → pino cleanup.
Co-authored-by: Cursor <cursoragent@cursor.com>
Generalizes the v1.14.1 single-server Context7 PoC into a multi-server MCP
client registry with per-server graceful degradation. JSON config at
/data/mcp.json (bind-mounted alongside AGENTS.md) matches opencode's
mcpServers schema shape. Config file missing = no MCP (opt-in by presence).
Two transports: Streamable HTTP (remote servers like Context7) and stdio
(local subprocess servers like codecontext). Stdio spawns a persistent child
via the SDK's StdioClientTransport; shutdown hook closes all transports.
Tool prefix generalized from context7_<name> to <serverName>_<toolName> with
a toolToServer reverse map for dispatch routing. AGENTS.md tools: field now
supports glob patterns (context7_*, !web_*) via matchToolGlob — last-match-
wins with ! deny prefix. Replaces exact-match .includes() in stream-phase.ts.
refreshToolNames() in agents.ts rebuilds the DEFAULT_TOOLS snapshot after
appendMcpTools so agents without explicit tools: lists see MCP tools —
reviewer caught that the module-load-time snapshot would permanently exclude
late-registered tools.
Read-only invariant: readOnlyHint === false rejected at discovery. Result
size capped at 5MB. v1.14.1 env vars removed — superseded by config file.
Default data/mcp.json ships with Context7 disabled.
363/363 server tests passing. No schema changes, no frontend changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Validates the MCP-client loop end-to-end against one real MCP server before
the full v1.15 port. New services/mcp-client.ts wraps @modelcontextprotocol/sdk
v1.29.0 with Streamable HTTP transport. On startup (when MCP_CONTEXT7_URL is
set), connects to Context7, discovers tools via tools/list, wraps each as a
ToolDef prefixed context7_<name>, and appends to ALL_TOOLS via appendMcpTools.
Read-only invariant guard rejects any tool with readOnlyHint: false. Tool
dispatch is transparent — executeToolCall routes MCP calls through the ToolDef
execute wrapper, which strips the prefix before calling the MCP server. Result
size capped at 5MB with truncation. Graceful degradation: server down at
startup → zero tools; server down mid-session → error result, model
self-corrects.
Adversarial review caught that a Zod .default() on the URL config made MCP
always-on instead of opt-in — fixed by removing the default. MCP_CONTEXT7_URL
must be explicitly set to enable.
ALL_TOOLS changed from ReadonlyArray to mutable to support late-registration.
appendMcpTools re-sorts and rebuilds TOOLS_BY_NAME after append.
348/348 server tests passing (16 new mcp-client tests). No schema changes,
no frontend changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removes the dual-write into messages.tool_calls / messages.tool_results JSON
columns and drops the columns. message_parts is now the only source of truth
for tool calls and tool results.
10 dual-write sites stripped (5 in tool-phase.ts, 2 in routes/skills.ts, 2 in
routes/messages.ts, 1 in routes/chats.ts fork-clone). The recon-driven grep
caught 2 sites beyond the original v1.13.2 roadmap inventory and an extra
fixture file (tool_cost_stats.test.ts) with a direct legacy-column INSERT.
messages_with_parts view rewritten to parts-only subselects (COALESCE
fallbacks gone). View runs via CREATE OR REPLACE so it lands before the
column DROPs in startup DDL — Postgres rejects column-drop on view-referenced
cols. v1.12.1 cleanup DO block (DROP CONSTRAINT messages_status_check /
messages_role_check) removed; those one-shots have done their work.
Adversarial review caught a runtime bug the green test suite missed: the
discard_stale endpoint (chats.ts) had a RETURNING ... tool_calls, tool_results
clause that would have crashed on every 60s-no-token-activity recovery in
production. Fixed by switching to two-step UPDATE returning id, then SELECT
from messages_with_parts so parts-synthesized fields keep flowing on the wire.
Message API type retains tool_calls? / tool_results? — the view synthesizes
those keys from parts so the wire shape is unchanged; frontend reads need no
update. Override on the original v1.13.2 plan, captured in the openspec
proposal.
339/339 server tests passing (including 7 DB-integration tests that applied
the schema migration to a live DB and ran the parts-only view end-to-end).
tsc + web build clean.
Pairs with v1.13.0-ai-sdk-v6 (introduced the dual-write) and v1.13.1-B (moved
the read path to messages_with_parts). Umbrella v1.13 tag ships on this same
commit, marking the strangler-fig closed.
CLAUDE.md picks up Sam's pre-existing edits documenting tag-naming and
CHANGELOG conventions — both already in use by v1.13.19 / v1.13.20.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every assistant message gets an "Open in pane" affordance that opens the
message in the workspace splitter — Markdown pane (Copy + Download .md) by
default; HTML pane (Download .html only) when the model emits a self-contained
<!DOCTYPE html> or fenced ```html artifact. BOOCHAT.md rule keeps Markdown
default at every length; HTML opt-in on explicit user request.
Backend: services/artifacts.ts (slug derivation + write helpers with
symlink-escape guard via realpath-after-mkdir), routes/artifacts.ts (POST
download + GET stream with nosniff + CSP sandbox defense-in-depth), HTML
detection in finalizeCompletion writing a new message_parts.kind='html_artifact'
row (schema CHECK extended via v1.13.13 pattern), graceful 1MB cap via the
pure decideHtmlArtifactWrite helper. PartKind union extended.
Frontend: MarkdownRenderer.tsx extracted from MessageBubble's inline
MarkdownBody for reuse; MarkdownArtifactPane.tsx + HtmlArtifactPane.tsx with
loading/error states; pane state is reference-only ({chat_id, message_id,
title}) — content fetched on mount to keep workspace_panes jsonb small and
avoid 1MB blobs riding session_workspace_updated frames. iframe sandbox
locked to allow-scripts allow-clipboard-write allow-downloads with no
allow-same-origin, srcDoc not src. openInPane discriminates 404 (expected
fallback) from real errors (toast + bail). PanelRightOpen icon button with
mobile 44px tap-target.
31 new server unit tests including a real-symlink filesystem case; 332/332
server tests passing, tsc clean both sides, pnpm -C apps/web build green.
Smoke deferred to first deploy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four codecontext sidecar wrappers — get_file_analysis (required
file_path), get_symbol_info, get_dependencies, and get_semantic_neighborhoods
(optional) — forwarded file_path to the HTTP sidecar unchanged. The
sidecar's internal file index is keyed on absolute paths, so any
relative path from the model returned "File not found in graph".
Three back-to-back failures observed in one chat on 2026-05-22
17:56 UTC, ~48 s of wasted tool budget.
## Resolver
Add resolveProjectPath(projectRoot, rawPath) in codecontext_client.ts:
trim check → absolute/relative branch (both go through resolve() so
dot-segments normalise) → realpath with ENOENT fallthrough → escape
check using the realpathed value. Error shape mirrors the existing
target_dir escape error byte-for-byte; only the field name differs.
Wired into callCodecontext at the args-spread site, guarded on
file_path presence + non-empty. All four wrappers benefit from one
call site; wrappers without file_path (overview, framework, watch,
search) are unaffected.
## Schema trim
.trim() added to all four file_path Zod schemas:
get_file_analysis: z.string().trim().min(1)
get_symbol_info: z.string().trim().optional()
get_dependencies: z.string().trim().optional()
get_semantic_neighborhoods: z.string().trim().optional()
Absorbs trailing newlines / whitespace from model output before the
resolver sees the value.
## Adversarial review fixes
Adversarial pass surfaced two P2 findings:
1. Absolute path with `..` resolving outside the project root (e.g.
`<projectRoot>/../etc/passwd`) that ENOENTs at realpath would slip
through the literal prefix-check: the raw string starts with
`<projectRoot>/`. Fix: resolve() the absolute branch's candidate
too, so dot-segments normalise before the prefix check.
2. No symlink-escape test coverage. Realpath's stated purpose
(catching in-project symlinks pointing outside the project) was
never tested. Added: create a tmpdir outside projectRoot,
symlink projectRoot/evil-link → outside file, assert rejection.
## Tests
codecontext_client.test.ts: 19 tests (10 baseline + 9 new file_path
resolution cases). Cases cover: relative→absolute, absolute-inside,
relative-escape, absolute-outside, ENOENT-fallthrough, empty-string,
wrapper-without-file_path, absolute-with-`..`-ENOENT,
symlink-leaving-root.
codecontext_tools.test.ts: one assertion updated to expect the
resolved-absolute file_path on the wire (previously asserted the raw
relative path passed through, which is exactly the bug being fixed).
Full suite: 301 passed, 7 skipped.
## Affected / unaffected
- get_codebase_overview, get_framework_analysis, watch_changes,
search_symbols: no file_path arg → resolver guard skips them. No
behavior change.
- get_semantic_neighborhoods IS in SYNTHESIS_TOOLS — previously-failing
relative-path calls will now successfully synthesize. Desirable, not
a regression.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the agent needed context from another repo, pathGuard rejected every read
with no recovery path. This batch adds a reactive request_read_access flow:
pathGuard's error now hints at the tool, the model emits a structured request,
the inference loop pauses (same mechanism as ask_user_input), the user picks
Allow/Deny via inline chips, and subsequent reads under the granted root succeed
for the rest of the session.
Schema: sessions.allowed_read_paths TEXT[] NOT NULL DEFAULT ARRAY[]::TEXT[]
(idempotent ADD COLUMN IF NOT EXISTS).
Grant unit (design D1): nearest registered projects.path ancestor →
nearest repo-shaped ancestor (.git/ / package.json / go.mod / Cargo.toml)
under PROJECT_ROOT_WHITELIST → else refuse. grant_resolver.ts walks
ancestors with a per-iteration whitelist invariant check so symlinked
input can't escape the whitelist mid-walk (Sam's checkpoint-1 ask).
Path-guard: optional extraRoots arg threaded from session.allowed_read_paths
through executeToolCall to view_file / list_dir / grep / find_files. The
ToolDef.execute signature gets an optional third param; non-FS tools
ignore it. view_file re-anchors the secret-guard check on basename(real)
whenever a relative path starts with "../" so .env / id_rsa* etc. still
deny across grant roots.
Endpoint: POST /api/chats/:id/grant_read_access mirrors /answer_user_input.
On 'allow' it re-resolves the grant root (state may have changed since
prompt — auto-falls to denial reason text on failure, not 500), array_appends
to sessions.allowed_read_paths with in-memory dedup, then publishes
tool_result + session_updated frames and enqueues the next assistant turn.
PATCH /api/sessions/:id allowed_read_paths supports revocation only. Zod
refines absolute + no traversal markers; runtime findUnauthorizedAdditions
guard rejects any entry not already present in the row, so a malicious
curl -X PATCH -d '{"allowed_read_paths":["/etc"]}' returns 400 instead of
bypassing the grant flow (Sam's compliance-review action item).
Frontend: RequestReadAccessCard renders pending (path + reason + Allow/Deny)
and answered (granted/denied summary with the resolved root) variants;
MessageList.flatten/group special-cases the tool name; SettingsPane adds a
per-session grants list with per-row revoke that PATCHes the shortened
array.
Tests: 11 grant_resolver, 8 path_guard, 8 sessions PATCH subset, including
explicit cases for symlink escape mid-walk, walk-bound termination at
whitelist root, /etc bypass attempt via PATCH, and nearest-project
disambiguation. 292 total server tests green.
Pairs with v1.13.16-xml-parser — the model now self-recovers from both
a wrong tool name AND from a refused path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two-part fix for the model-emitted XML drift the v1.13.15-codecontext-synth
investigation surfaced (1 raw <invoke> leak observed out of 190 qwen3.6
turns — qwen3.6-35b-a3b-mxfp4 drifts to the Anthropic format when prompted
as an Architect-style agent because Claude Code documentation in its
pre-training corpus uses that shape).
## Parser extension
xml-parser.ts now recognizes BOTH XML tool-call flavors:
- Qwen/Hermes: <tool_call><function=NAME>...<parameter=K>V</parameter>...</function></tool_call>
- Anthropic: <invoke name="NAME"><parameter name="K">V</parameter></invoke>
Both route through the same synthetic-id xml_call_${idx} ToolCall path.
extractToolCallBlocks() and partialXmlOpenerStart() handle both openers
(<tool_call> and <invoke...) so partial buffers don't get prematurely
flushed during streaming.
The existing Qwen parser was tightened to tolerate whitespace around `=`
(<function = name>, <parameter = key>...) so a stray space doesn't get
absorbed into the function name. Name capture is non-whitespace,
non-`>`.
## Unknown-tool recovery hint
New tool-suggestions.ts exports levenshtein() + suggestToolName() +
formatUnknownToolError(). When tool-phase.ts:executeToolCall receives a
toolCall.name that isn't in TOOLS_BY_NAME, the error returned to the
model now includes a "Did you mean: X?" hint based on Levenshtein
distance ≤3 or substring match against Object.keys(TOOLS_BY_NAME).
Targets the qwen3.6 drift to read_file → suggest view_file. Applies to
all unknown tool names, not just <invoke>-derived ones — at the
dispatch layer we no longer know which format produced the call, and
the extra signal is harmless for Qwen-derived calls.
## Test coverage
xml-parser.test.ts: 46 tests, all green. Covers both parsers
(well-formed, malformed, multi-parameter, nested-content), the
partial-opener detector for both flavors, the unified extraction
helper, and the unknown-tool error formatter.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First half of the WebSocket-frame-typing batch (split per recon — total
scope was ~535 LoC, larger than the roadmap's ~300 estimate, so the
server-side publish-site conversion lands separately in v1.13.11-b).
Phase A scope:
(1) apps/server/src/types/ws-frames.ts (NEW) — Zod schemas for all 27
wire-format WS frame types. Discriminated union (WsFrameSchema) plus
KNOWN_FRAME_TYPES const for diagnostic lookup. UUIDs are z.string().
uuid(); model-emitted tool_call_id stays z.string().min(1) since OpenAI-
compatible APIs emit "call_<random>" not UUID. Per-kind payload narrowing
(tool args, message_parts payloads) intentionally stays z.unknown() —
frame-level drift detection is the goal; deep payload validation is
follow-up work.
(2) apps/web/src/api/ws-frames.ts (NEW) — byte-identical mirror of the
authoritative server file. No path alias from web→server in the existing
tsconfig setup; sync-by-hand was chosen over a new packages/shared/ dir.
A ws-frames.test.ts test asserts the two files match.
(3) apps/server/src/services/broker.ts — adds publishFrame() and
publishUserFrame() methods to the Broker interface. Both validate via
WsFrameSchema and fail-closed: log + drop on invalid. createBroker now
accepts an optional FastifyBaseLogger so validation failures land in
the pino stream (with console.error fallback for unit tests). The
existing publish() / publishUser() raw methods stay legal — they get
converted to the typed variants in v1.13.11-b.
(4) apps/web/src/hooks/useSessionStream.ts + useUserEvents.ts — wrap
ws.onmessage with WsFrameSchema.safeParse. Fail-closed: invalid frames
log + return without dispatching. Hand-maintained WsFrame and
SessionEvent types stay in place; one cast bridges Zod-typed → narrowed
shape (Zod uses OpaqueObject for nested Message[] / WorkspacePane[] etc.,
which are dev-time-narrowed via the existing hand-maintained types).
(5) apps/web/package.json — adds zod ^3.23.8 as a direct dep. Was a
transitive dep via ai-sdk / postgres; promotion makes the import legal.
(6) Tests: 15 new in ws-frames.test.ts covering happy-path per major
frame type, drift-catchers (unknown type, invalid enum, non-UUID, negative
tokens), parts-authoritative read variants, the mirror-file diff check,
and four broker fail-closed scenarios. 219/219 server tests pass (was
204; +15 new).
Two recon corrections to the dispatch brief, both flagged before
implementation:
- No 'parts_appended' frame exists. The brief assumed one; the codebase
reads parts via the messages_with_parts view after message_complete
triggers a refetch. MessagePartSchema is therefore unused this batch.
- No 'tool_running' frame exists. The brief listed it as standalone; it
is in fact a 'chat_status' variant ({ status: 'tool_running' }), already
covered by ChatStatusFrame.
Smoke: clean container boot, no validation errors in the server log. Real
production frames pass validation (the schemas were derived from the
existing hand-maintained types in api/types.ts and sessionEvents.ts).
v1.13.11-b will follow immediately: convert all ~85 raw broker.publish /
ctx.publish call sites across 11 server files to publishFrame /
publishUserFrame. Mechanical edit; the wiring done here means the diff
in -b is just the call-site swaps.
~310 LoC across 9 files (4 new + 5 modified).
Pattern lift from eyaltoledano/claude-task-master (MIT + Commons Clause
— pattern only, no code lift). Adds BOOCODE_TOOLS env var with three
tiers:
- core (4 tools): view_file, list_dir, grep, find_files. ~2k token
schema cost.
- standard (15 tools): core + web_search, web_fetch, git_status, all
8 codecontext_* tools. ~10k token schema cost.
- all (default; current behavior): every tool in ALL_TOOLS (20). ~21k
token schema cost.
The env var is a CEILING — narrows agent whitelists, never expands.
Default behavior unchanged when var is unset. resolveToolTier is
case-insensitive and falls back to 'all' on unknown values.
CORE_TOOL_NAMES + STANDARD_TOOL_NAMES validated at module load against
TOOLS_BY_NAME via two top-level for-loops that throw on the first
missing name. Module fails to import if a tier references a tool that
doesn't exist in the registry — catches typos and stale tier
definitions at boot rather than silently filtering valid tools out of
agent whitelists.
Wiring: agents.ts parseAgentBlock now reads BOOCODE_TOOLS from
process.env per parse, intersects with the agent's declared frontmatter
tools (or DEFAULT_TOOLS when frontmatter omits the field). Per-parse
read is fine — agents are re-parsed on the existing 60s cache TTL.
Tests: tools.test.ts grows from 1 to 10 tests. Covers resolveToolTier
across tiers/case/unknown values + the CORE-subset-of-STANDARD invariant
+ TOOLS_BY_NAME existence for both tier sets. 204/204 pass (was 195;
+9 new).
Deviation from the brief: the codecontext tools in the actual registry
have NO codecontext_* prefix (the brief's STANDARD list assumed it).
Used the actual names (get_codebase_overview, search_symbols, etc.).
Module-load validation would have failed boot with the prefixed names.
Smoke: with BOOCODE_TOOLS unset, agents return their full 12-tool
whitelists. With BOOCODE_TOOLS=core in .env + container restart, the
same agents narrow to 4 tools (find_files, grep, list_dir, view_file)
— intersection of declared whitelist ∩ core tier. Reverted after
confirmation.
CLAUDE.md updated with BOOCODE_TOOLS in the Environment section's
Optional list. .env.example gained a commented BOOCODE_TOOLS=all line
with the per-tier token-cost table.
~110 LoC across 5 files (4 modified + 1 test expansion). Under the
brief's ~30 LoC estimate for code; the test suite expansion drove
most of the growth.
Surfaces per-tool prompt/completion-token rolling averages in
AgentPicker for at-a-glance agent-cost hints. Implementation is a
SQL view on top of messages_with_parts plus a read endpoint and
AgentPicker tooltip extension. No new write site; all source data
already lands via the existing tool-phase.ts:94-95 / error-handler.ts:
109-110 / sentinel-summaries.ts UPDATEs that v1.13.7's includeUsage:
true fix made non-NULL.
(1) schema.sql — new tool_cost_stats view. Window-functions over
messages_with_parts.tool_calls with LATERAL jsonb_array_elements.
Attribution: equal split — multi-tool turn divides tokens N-ways;
the 100-call rolling mean absorbs split noise. Filters: status=
'complete' + metadata.kind NOT IN ('cap_hit','doom_loop') exclude
failed turns and sentinels respectively; tool_calls IS NOT NULL is
defense-in-depth since sentinels are role='system' rows. CREATE OR
REPLACE means schema apply is idempotent.
(2) routes/tools.ts NEW + index.ts wire-in. GET /api/tools/cost_stats
returns { stats: ToolCostStat[] } with mean_prompt_tokens / mean_
completion_tokens computed at read time (sum / n_calls). Sorted by
tool_name ASC. No pagination — ≤30 tools.
(3) __tests__/tool_cost_stats.test.ts NEW — 7 integration tests
keyed off DATABASE_URL env var. Tests skip gracefully when unset
(no-DB default). beforeAll applies the schema via sql.unsafe(read
FileSync(schema.sql)) for self-contained runs. Helper insertAssistant
Turn shared across cases. Covers: empty state, single-tool attribution,
multi-tool equal split, 100-call FIFO window, NULL-tokens exclusion,
parts-authoritative read via messages_with_parts, failed/sentinel
exclusion.
(4) web/api/types.ts + client.ts — ToolCostStat interface + api.tools.
costStats() method binding.
(5) AgentPicker.tsx — fetch costStats on mount, compute per-agent
sum-of-means across whitelisted tools, render muted cost line below
description: "~5.2k prompt / 280 completion · 6/8 tools · last call
3h ago". Skips line entirely when no tool history; preserves existing
native title= for layout backward-compat. formatK/formatAgo colocated.
Tests: 202/202 pass (195 prior + 7 new view-integration). Server +
web tsc clean.
Smoke: schema applied cleanly; GET /api/tools/cost_stats returns
canonical JSON; view + endpoint agree. Single-row result expected
given the v1.13.1-A → v1.13.7 NULL latent regression window; new
traffic populates organically.
Roadmap row at boocode_roadmap.md:114 plus schema row at :474 both
match. View vs table decision documented in handoff_v1.13.10_per_
tool_cost.md (rollback-safe, microsecond-fast at BooCode scale).
~270 LoC across 8 files (5 modified + 3 new).
Opencode pattern (session/overflow.ts): fire compaction at 85% of
ctx_max, replacing the v1.11.0-era `ctx_max - 20_000` formula.
Old formula: usable = ctx_max - 20_000
- ctx=262144 → trigger at 242144 (92.4%) — only 7.6% headroom
- ctx=100000 → trigger at 80000 (80.0%)
- ctx= 32000 → trigger at 12000 (37.5%) — over-eager
- ctx<=20000 → trigger at 0 — never fires
New formula: usable = floor(0.85 * ctx_max)
- ctx=262144 → trigger at 222822 (85.0%) — 15% headroom for summarizer
- ctx=100000 → trigger at 85000 (85.0%)
- ctx= 32000 → trigger at 27200 (85.0%)
- ctx= 8192 → trigger at 6963 (85.0%)
Ratio gives consistent headroom at any context scale. The qwen3.6
daily driver gets ~19k tokens more breathing room before overflow;
small-ctx models no longer degenerate to never-triggering.
usable() is the only consumer of COMPACTION_BUFFER → constant deleted.
New EARLY_TRIGGER_RATIO constant takes its place.
isOverflow() and the maybeFlagForCompaction() call site at
payload.ts:184 are unchanged — formula swap is internal to compaction.ts.
payload.ts comment touched only to drop the stale COMPACTION_BUFFER
reference (PRUNE_TRIGGER_TOKENS stays at 20k as the prune-freed
threshold; independent of the overflow formula).
Tests: 4 new usable() corner cases (262k/100k/8k/zero+negative), plus
5 isOverflow() numbers shifted to match the 85k budget at ctx=100k.
195/195 server tests pass (was 194).
Smoke: ratio math verified by unit tests at all four corners. Live
cap-hit verification deferred — requires accumulating >222k tokens
in a session under qwen3.6-35b-a3b-mxfp4 (was >242k pre-fix); will
surface organically in extended use.
Recon during planning disproved the original v1.13.7 (DB-cache) premise:
buildSystemPrompt already runs over inputs mtime-cached at the file layer
(BOOCHAT.md in system-prompt.ts:25, AGENTS.md global+per-project in
agents.ts:245), and DB scalars are byte-stable until edited. The output
is microsecond pure-string concat with no I/O. Skills aren't in the
prefix; tools live in a separate request body field alpha-sorted by
v1.13.3.
This batch closes the verification gap with instrumentation, not
implementation:
- system-prompt.ts: buildSystemPromptWithFingerprint canonical impl
computes SHA-256 over the assembled prefix, runs a per-session
Map<sessionId, lastHash> observer, emits PrefixFingerprint per call
and PrefixDrift (with field-level changed_inputs) on hash change.
buildSystemPrompt is now a thin shim returning .prompt.
- agents.ts: getAgentsMtimes accessor — cache-read only, no I/O.
- payload.ts: buildMessagesPayload takes optional log argument; when
passed, emits prefix-fingerprint (info) + prefix-drift (warn).
- turn.ts + sentinel-summaries.ts: pass ctx.log at 3 production call
sites; sentinel summaries log too so any drift across cap-hit /
doom-loop paths surfaces.
- system-prompt.test.ts: 4 new tests (byte-identical, no-drift-on-
stable, drift-fires-with-changed-inputs, cross-session-no-drift).
194/194 tests pass (was 190).
Smoke: 5 messages in a fresh session produced 7 prefix-fingerprint
logs (extras from buildMessagesPayload being called from sentinel
summary paths), all with identical prefix_hash and prefix_length=2907,
zero prefix-drift. Prefix is byte-stable in steady-state.
Decision: original system_prompt_cache DB table from the roadmap is
permanently dropped. The v1.12.0 mtime caches at the input layer plus
alpha tool ordering at the request body (v1.13.3) already address the
load-bearing cache-stability surfaces. Instrumentation stays so the
claim can be re-verified at any time.
Audit traced compaction's summary path post-v1.13.1-B read flip:
- Q1: reads from messages_with_parts (view) — clean
- Q2: parts shape correctly threaded through buildHeadPayload — clean
- Q3: reasoning omitted from summary input — FIX NEEDED
v1.13.1-C wired reasoning end-to-end into inference/payload.ts but
missed this read site. Summarizer model couldn't see the reasoning
trail for tool-bearing turns, quietly degrading summary quality for
reasoning-channel models (qwen3.6).
Fix:
- CompactionMessage extended with reasoning_parts field
- SELECT pulls reasoning_parts from messages_with_parts
- buildHeadPayload (now exported for tests) prefixes assistant content
with <reasoning>...</reasoning>\n\n<content>... when reasoning is
present; standalone <reasoning>...</reasoning> for tool-call-only
turns; omits the tag when reasoning is null or empty
4 new render branch tests (190 total).
Smoke deferred: forcing real compaction requires either threshold
pollution or building up a >40k-token chat with reasoning_parts.
Render branches are unit-covered; integration would only re-prove
structural correctness.
- New services/truncate.ts. Tmpfs storage at /tmp/boocode-truncations/
(BOOCODE_TRUNCATION_DIR env var overrides for tests). 12-char base32
opaque ids (~60 bits entropy, "tr_<id>"). Three exports: storeTruncation,
readTruncation, truncateIfNeeded (wrap-or-passthrough helper).
cleanupTruncations does TTL-pass (7 days) + orphan-reap (parts query on
payload->'output'->>'outputPath') in one shot.
- Wired four tools through truncateIfNeeded: view_file (raw full file),
list_dir (full filtered+secret-filtered entries serialized one-per-line),
web_fetch (textRaw pre-slice), codecontext_client (body.result pre-slice).
Each returns the existing sliced view plus an optional outputPath field
when truncation fires.
- New view_truncated_output ToolDef. Resolves opaque id → on-disk content
internally; model never sees the truncation dir. Same start_line /
end_line slicing semantics as view_file. Registered in ALL_TOOLS (alpha
sort places it after view_file automatically) and READ_ONLY_TOOL_NAMES.
- cleanupTruncations piggybacks on the v1.13.3 stuck-row sweeper's 60s
setInterval. No-op when truncation dir is empty.
Not wired (TODO follow-up): grep and find_files. file_ops returns post-cap
results to the tool execute path, so the "full content" isn't recoverable
without a refactor of fileOps.grep / fileOps.findFiles to expose the
uncapped result. web_search is silent-slice (no truncated flag); outside
scope. Five sites of seven covered; the remaining two are the only ones
needing a file_ops change.
Tests: 7 new in truncate.test.ts (roundtrip, unknown id, malformed id,
truncateIfNeeded false/true/over-cap/storage-failure paths). 186 total
(was 179). cleanupTruncations file-system half implicitly via TTL pass;
orphan-reap branch covered by the live container smoke.
Smoke verified end-to-end against the live container:
- view_file with start_line=1, end_line=3 on CLAUDE.md → tool_result part
carried outputPath "tr_cdpn1o04k6ma" + truncated=true.
- /tmp/boocode-truncations/tr_cdpn1o04k6ma exists, 15876 bytes, mode 0o600,
parent dir mode 0o700.
- Follow-up view_truncated_output(id, start_line=50, end_line=55) returned
the actual lines 50-55 of CLAUDE.md (the 808notes/BooCode bullets).
- ALL_TOOLS count=20 (was 19); alpha sort places view_truncated_output
between view_file and watch_changes.
Closes a v1.12 catalog row that was scoped but deferred. The v1.13 parts
table made outputPath ride on the existing tool_result payload with no
schema change beyond the storage helper itself.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- message_parts.hidden_at timestamptz column (NULL by default) with a
partial index on (message_id) WHERE hidden_at IS NULL for the common
visible-parts filter.
- messages_with_parts view changed from COALESCE(parts, legacy) to
CASE WHEN EXISTS(any parts of kind) THEN visible-parts ELSE legacy.
COALESCE would have leaked hidden parts back via the legacy fallback
when every part was pruned (smoke caught it pre-commit). The CASE
distinguishes "no parts at all → fall back to legacy column for
pre-v1.13.0 history" from "all parts hidden → return null/empty so
the row drops out of the model payload" exactly.
- prune.ts: scans tool_result parts newest-first, protects the last 40k
tokens (PROTECTED_TOKENS), marks older candidates hidden when their
combined estimate clears 20k (PRUNE_TRIGGER_TOKENS — equal to
COMPACTION_BUFFER from v1.11.0, so a successful prune is exactly the
budget the summary path would have freed). Stops at chats.tail_start_id
so it doesn't double-erase across the last summary boundary. Pure
decision helper selectPruneTargets exported separately for unit tests.
- Wired into maybeFlagForCompaction: prune runs synchronously when
overflow is detected; if it freed >= PRUNE_TRIGGER_TOKENS, the
needs_compaction flag is NOT set and the (expensive) summary inference
call is skipped this turn. The next turn's overflow check re-evaluates
from scratch.
- 6 new unit tests in prune.test.ts cover: empty input, protection-only
(no candidates), candidates below trigger, candidates above trigger,
candidates straddling a summary boundary, exactly-protection-tokens.
179 tests total (was 173).
Smoke verified post-rebuild:
- \\d message_parts shows hidden_at + partial index.
- View definition shows AND p.hidden_at IS NULL filters on all three
subselects.
- Synthetic hide-then-restore confirmed the view drops the tool_result
jsonb to null when its only part is hidden, and restores when un-hidden.
- EXPLAIN ANALYZE on the 42-message stress chat: 0.325ms (faster than
v1.13.1-B's 1.018ms — EXISTS short-circuits cleanly for the common
no-parts case).
- Normal turn (plain text prompt) completes unaffected.
Closes a v1.11.0 design item that was scoped but never implemented. With
v1.13's parts table the prune is dramatically cheaper to write — pre-parts
it would have meant editing JSON blobs in-place; now it's a hidden_at
flag and a view subselect.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four independent items, all owed from prior dispatches.
- statement_timeout at the database level via:
ALTER DATABASE boocode SET statement_timeout = '30s';
Applied operationally; documented as a comment at the top of schema.sql
(ALTER DATABASE can't run inside a DO block, so it's not idempotent
inside applySchema). Re-apply after a volume reset.
- Tool registry alpha-sorted at module load. llama.cpp's prompt cache
hits on byte-identical prefixes; any reordering of the tool list near
the top of the system prompt would invalidate every cached turn.
Single-source sort at the ALL_TOOLS export so toolJsonSchemas() and
TOOLS_BY_NAME inherit the order automatically. New tools.test.ts
asserts the invariant; total tests 173 (was 172).
- Periodic in-process stuck-row sweeper. Runs every 60s, marks
'streaming' rows older than 5 minutes as 'failed', and publishes
chat_status='idle' on the user channel so the UI dot drops without a
refresh. Closes the mid-session crash UX gap; the v1.12.1 boot sweep
only fires once at startup, so sessions used to stay stuck until next
reboot. setInterval cleaned up via app.addHook('onClose'). Mirrors
handleAbortOrError's publish pattern.
- experimental_repairToolCall wired through AI SDK v6 streamText. Pass-
through implementation: log + return the original toolCall so the
stream keeps going. executeToolPhase's existing error paths (unknown
tool name → 'unknown tool: X' result; zod-reject → 'tool X rejected
— field: required') already surface bad calls to the model; the value
here is preventing the AI SDK from THROWING on parse errors and
killing the whole stream. Owed since v1.13.1-A.
Smoke verified:
- statement_timeout = '30s' confirmed via SHOW.
- Tool path normal flow intact (list_dir prompt → tool_call → result
→ final assistant). No malformed tool calls in the test run; repair
log will surface them when qwen3.6 actually emits one.
- Alpha order verified at runtime via the dist bundle: match: true.
- Sweeper logic not traffic-tested (no stuck rows to find), but the
SQL UPDATE + broker.publishUser pattern is identical to handleAbort
and the boot sweep — synthesis-only verification.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pass 1 — ask_user_input correlation port (messages.ts:478, :549):
- The two correlation queries that backed the elicitation flow used to scan
messages.tool_calls and messages.tool_results JSON columns directly. They
now JOIN message_parts on payload->>'id' (for the caller assistant) and
payload->>'tool_call_id' (for the pending tool row). Semantics preserved:
ORDER BY m.created_at DESC LIMIT 1 still picks the latest issuance, the
already-answered 409 guard now reads payload.output, and the UPDATE +
parts replace inside sql.begin is unchanged from v1.13.0.
- Pre-v1.13.0 history has no parts rows and is unreachable to this lookup
path (404). Acceptable per dispatch decision — no pending elicitation
from before v1.13.0 will still be open. JSON-column fallback can land as
a hotfix if it ever surfaces.
Pass 2 — reasoning_parts wired end-to-end:
- types.ts/StreamResult gains `reasoning: string`. stream-phase.ts accumulates
reasoning-delta text per stream (replacing the v1.13.1-A counter-only
diagnostic) and returns it on the result.
- parts.ts/partsFromAssistantMessage gains an optional `reasoning` param.
When present it emits a kind='reasoning' part at sequence 0, ahead of
the text and tool_call parts.
- error-handler.ts/finalizeCompletion and tool-phase.ts/executeToolPhase
both thread result.reasoning into the dual-write call so reasoning-channel
models (qwen3.6) get persistent reasoning rows.
- payload.ts: loadContext SELECT pulls reasoning_parts from the v1.13.1-B
view; OpenAiMessage gains an optional `reasoning` field; buildMessagesPayload
collapses reasoning_parts into a single string per assistant message.
- stream-phase.ts/toModelMessages converts assistant messages with reasoning
into an AI SDK ModelMessage content array starting with a ReasoningPart,
matching the @ai-sdk/provider-utils AssistantContent union. Reasoning
models can now replay prior reasoning context across tool-call boundaries.
- types/api.ts and apps/web/src/api/types.ts Message interface gain
reasoning_parts (optional, nullable). Frontend doesn't render this yet —
field reserved for a v1.14 UI surface.
Tests: 2 new in parts.test.ts cover reasoning-at-sequence-0 with and
without text content. 172 tests pass (170 prior + 2 new).
Smoke verified against the live container:
- A reasoning-prompt ("walk through 17 × 23 step by step") produced one
message with kind='reasoning' (361 chars) at sequence 0 and kind='text'
(429 chars) at sequence 1. Adapter log confirmed reasoning capture.
- The new correlation SQL was validated against existing tool_call /
tool_result parts: returns the expected message_id + payload shape with
pending state correctly identified via payload.output IS NULL.
- ask_user_input end-to-end through the UI is Sam's smoke — the Prompt
Builder agent does not always trigger ask_user_input for these prompts,
so synthetic verification via SQL substituted for traffic-driven cover.
Annotation: the v1.13.1-A abort-throw site in stream-phase.ts got a
one-liner comment ("AI SDK v6 fullStream returns normally on abort; check
signal explicitly.") to prevent a future refactor removing it.
v1.13.2 drops the dual-write + the JSON columns + collapses the view.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a granular message_parts table (one row per text/tool_call/tool_result
chunk) without changing any read path. Old messages.content / tool_calls /
tool_results columns remain authoritative for v1.13.0; this dispatch is
write-only mirroring so the AI SDK migration in v1.13.1 can flip read
authority without a backfill window.
Schema:
CREATE TABLE message_parts (id, message_id FK ON DELETE CASCADE,
sequence int, kind text CHECK (text|tool_call|tool_result|reasoning|step_start),
payload jsonb, created_at, UNIQUE (message_id, sequence))
New module services/inference/parts.ts with two pure derive helpers
(partsFromAssistantMessage, partsFromToolMessage) and insertParts that
fan-outs a multi-row INSERT via postgres-js.
Wired dual-write at every site that writes tool_calls or tool_results:
- tool-phase.ts: assistant finalize UPDATE, executed-tool UPDATE,
ask_user_input sentinel UPDATE
- messages.ts answer flow: DELETE pending tool_result part + INSERT
answered one inside the existing sql.begin
- skills.ts: synthetic assistant + tool INSERTs both inside existing tx
- chats.ts fork: CTE clones parts via ROW_NUMBER pairing (source→dest
message id mapping in one statement, no N+1)
- error-handler.ts finalizeCompletion: text part for plain text-only
assistant turns
Deviation: tool-phase.ts finalize UPDATEs and finalizeCompletion text-part
write are not wrapped in fresh sql.begin transactions. Safe in v1.13.0
because JSON columns are authoritative for reads. v1.13.1 must wrap these
sites before flipping read authority — TODO comments added at each
unwrapped site referencing v1.13.1.
Tests: 8 new unit tests for the derive helpers in
services/__tests__/parts.test.ts. Existing 162 tests untouched. 170 total.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- sentinel-summaries.ts: runCapHitSummary, insertCapHitSentinel,
runDoomLoopSummary, insertDoomLoopSentinel
- inference.ts → inference/turn.ts: residue is runAssistantTurn,
runInference, createInferenceRunner orchestration only
- inference/index.ts: re-export shim preserves the public surface
(createInferenceRunner, runInference, runAssistantTurn,
detectDoomLoop, DOOM_LOOP_THRESHOLD, buildMessagesPayload, plus
type-side InferenceContext/InferenceFrame/StreamResult/TurnArgs/
FramePublisher)
- src/index.ts + auto_name.ts + the two vitest test files updated to
import from ./services/inference/index.js explicitly (NodeNext ESM
doesn't honor directory-index resolution)
Final tally: 11 files under services/inference/, the largest being
sentinel-summaries.ts at 523 LoC (two near-clone summary paths kept
side-by-side until a third sentinel justifies factoring out a shared
runWrapUpSummary). turn.ts is now 326 LoC, the next-largest is
stream-phase.ts at 380. Public import surface unchanged.
tool-phase.ts → turn.ts back-edge for runAssistantTurn remains
(cycle is safe; resolved at call time).
Prepares the file structure for v1.13 AI SDK migration — streamText
swap targets stream-phase.ts only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two new tools registered through the existing ALL_TOOLS registry:
- web_search hits SearXNG's JSON API (Fathom, internal Tailscale URL,
no auth) and returns top results
- web_fetch retrieves a URL's text content, gated by isPublicUrl
(url_guard.ts) which blocks loopback / RFC1918 / Tailscale CGNAT /
link-local / .local / .internal / non-http schemes
Both tools are opt-in via the existing session.web_search_enabled flag
(plumbed in v1.9, activated here). Default off. UI labels updated to
"Enable web search and fetch" / "Web search and fetch" since fetch joins
the same store. Counts against the v1.8.2 per-turn budget; covered by
the v1.11.6 doom-loop guard.
Native Node 20 fetch — no new prod dep. HTML stripping via regex (script
and style content elided wholesale). 5MB body cap, 15s fetch timeout,
8000-char default output, 32000-char cap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ports continue.dev's DEFAULT_SECURITY_IGNORE_FILETYPES + ignored-dir lists
into apps/server/src/services/secret_guard.ts plus a small BooCode
additions block (id_rsa*, *credentials*, .netrc, *.kdbx). Tiny glob-to-
regex matcher; no new prod dep.
view_file hard-refuses via SecretBlockedError. list_dir / grep /
find_files filter their results and surface a pathguard_note string
field with the hidden count — never list the offending paths back.
Named secret_guard.ts (not safety/pathGuard.ts) to avoid collision with
the existing path_guard.ts which already exports a pathGuard() function.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- llama-server does not emit n_ctx in timings (confirmed empirically);
dead code at inference.ts:479 and compaction.ts:300 never fired
- New model-context.ts: cached fetch of /upstream/<model>/props
with positive-cache (no TTL) and 60s negative-cache
- Wired into all 4 ctx_max write sites: 3 in inference.ts
(executeToolPhase, finalizeCompletion, runCapHitSummary) and
1 in compaction.ts (summary row INSERT)
- AbortController 3s timeout, lenient parsing with sensible defaults
- 12 new vitest cases for the cache module (59 total)
- 7 historical assistant rows backfilled manually (see notes)
Adds a singleton, ephemeral 'settings' pane kind to the workspace.
Opened via a new bottom-pinned button in ProjectSidebar (emits an
open_settings_pane event when a session is mounted; navigates to
/settings otherwise). Pane has three sections — Session, Project,
Theme — and a maximize toggle that hides sibling pane columns via
display:none on desktop only. Settings panes don't count toward
MAX_PANES and are filtered out of the localStorage persistence layer
so reload always restores a clean workspace.
Schema (additive):
- projects.default_system_prompt TEXT NOT NULL DEFAULT ''
- projects.default_web_search_enabled BOOLEAN NOT NULL DEFAULT false
- sessions.web_search_enabled BOOLEAN (nullable; null = inherit)
Inference resolves user_prompt = session.system_prompt.trim() ||
project.default_system_prompt.trim() — empty/whitespace at either
layer means "no override". Keeps the columns NOT NULL and matches
the existing inherit semantics.
Server routes:
- GET /api/projects/:id (new; settings pane refetches on
project_updated)
- PATCH /api/projects/:id accepts default_system_prompt,
default_web_search_enabled
- PATCH /api/sessions/:id accepts web_search_enabled (tri-state)
- POST /api/projects/:id/sessions/archive-all + GET
/api/projects/:id/sessions/open-count
- POST /api/sessions/:id/chats/archive-all + GET
/api/sessions/:id/chats/open-count
- PATCH /api/sessions/:id now broadcasts session_updated on every
successful PATCH (was rename-only). Lets SettingsPane open in
another tab pick up edits without a refetch.
Bulk-archive publishes one session_archived / chat_archived frame
per affected id so useSidebar's existing reducer cases handle them
incrementally — no new frame type, no payload widening.
ModelPicker refactored: shared ModelList inside a responsive shell.
Desktop = labeled trigger + DropdownMenu, mobile = icon-only Cpu
button + BottomSheet. Header in Session.tsx drops the pill wrap on
mobile since the new trigger is the visual.
ChatInput gains an icon-only '+' DropdownMenu next to AgentPicker
when sessionId + webSearchEnabled props are provided. One item for
now — Web search — with a checkmark reflecting the stored value
(true), not the effective one. Click PATCHes the override; to
restore inherit-from-project the user opens SettingsPane.
ThemePicker lifted out of pages/Settings.tsx into a reusable
component. The standalone /settings route is now a thin wrapper
that mounts <ThemePicker /> with a Back button on top
(navigate(-1) with fallback to '/'); the SettingsPane Theme tab
renders the same picker bare.
Project section delete-flow removed (button + confirm dialog +
handler). Replaced with "Archive all sessions" using the same
two-step count → confirm → fire pattern as "Archive all chats" in
the Session section. api.projects.remove() stays in the client
because useProjects.ts still uses it.
Hand-rolled Switch primitive in SettingsPane (no shadcn switch in
the project; spec said no new deps). Section nav is plain buttons
(no shadcn Tabs).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Old hardcoded MAX_TOOL_LOOP_DEPTH=15 replaced by per-agent
max_tool_calls (1-100, AGENTS.md frontmatter) with defaults: 30 for
read-only-only agents, 10 for agents that include any non-read-only
tool, 15 for raw chat. When the loop hits cap, fire one final summary
call with tools disabled, stream the wrap-up into the in-flight
assistant message, then insert a system sentinel with
metadata.kind='cap_hit'. The sentinel renders an amber bubble with a
Continue button (latest sentinel only) that POSTs to a new
/api/chats/:id/continue route to extend. Hard ceiling: 3 cap-hits per
chat (2 continues max) — third sentinel reports can_continue=false.
Error frames carry a machine-readable reason code alongside human
error text. Failed messages persist the reason via
metadata.kind='error' so the bubble renders specifics on reload (WS
error frame is one-shot).
Tool call UI rewired: ToolCallLine renders inline (↳ name args
spinner/check/✗, expand-on-tap for args+result); ToolCallGroup
collapses 3+ consecutive same-tool runs into a compact card.
MessageList owns a three-pass pre-render (flatten + fold tool
results onto matching runs by id + group same-tool runs + number
sentinels). MessageBubble drops tool rendering and adds the
sentinel / error-reason branches. ToolCallCard deleted.
Roadmap follow-up logged: add explicit max_tool_calls: 30 to the 6
agents in /data/AGENTS.md and /opt/boocode/AGENTS.md post-ship for
discoverability (defaults handle behavior identically).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds vitest 3.x (pinned to ^3 because vitest 4 requires Vite 6, while the
web app pins Vite 5). Tests live under src/**/__tests__/**.
Three target functions:
- sanitizeFolderName (project_bootstrap.ts): 8 cases covering happy path,
path-traversal stripping, empty-after-sanitize, control chars, truncation
at 64, null bytes, leading/trailing dot/slash stripping.
- resolveProjectPath (projects.ts): 7 cases including symlink-escape via
realpath, outside-whitelist rejection, nonexistent path, AND a flagged
BEHAVIOR GAP: passing the whitelist path itself currently returns success
rather than erroring out (function early-exits the scope check when
real === whitelistReal). Test asserts current behavior with explicit
comment flagging the spec violation — function NOT silently patched.
Function made exportable for testing (single keyword change).
- buildMessagesPayload (inference.ts): 8 cases for compact-marker logic
(no marker, marker present, multiple compacts, tool-message position).
tsconfig.json excludes __tests__ + *.test.ts from emit so dist/ stays clean.
pnpm -C apps/server test => 23 passed in ~340ms.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>