Compare commits

..

21 Commits

Author SHA1 Message Date
ad45b28250 v1.13.19-html-artifact-panes: pane-based artifact viewer with on-request HTML
Every assistant message gets an "Open in pane" affordance that opens the
message in the workspace splitter — Markdown pane (Copy + Download .md) by
default; HTML pane (Download .html only) when the model emits a self-contained
<!DOCTYPE html> or fenced ```html artifact. BOOCHAT.md rule keeps Markdown
default at every length; HTML opt-in on explicit user request.

Backend: services/artifacts.ts (slug derivation + write helpers with
symlink-escape guard via realpath-after-mkdir), routes/artifacts.ts (POST
download + GET stream with nosniff + CSP sandbox defense-in-depth), HTML
detection in finalizeCompletion writing a new message_parts.kind='html_artifact'
row (schema CHECK extended via v1.13.13 pattern), graceful 1MB cap via the
pure decideHtmlArtifactWrite helper. PartKind union extended.

Frontend: MarkdownRenderer.tsx extracted from MessageBubble's inline
MarkdownBody for reuse; MarkdownArtifactPane.tsx + HtmlArtifactPane.tsx with
loading/error states; pane state is reference-only ({chat_id, message_id,
title}) — content fetched on mount to keep workspace_panes jsonb small and
avoid 1MB blobs riding session_workspace_updated frames. iframe sandbox
locked to allow-scripts allow-clipboard-write allow-downloads with no
allow-same-origin, srcDoc not src. openInPane discriminates 404 (expected
fallback) from real errors (toast + bail). PanelRightOpen icon button with
mobile 44px tap-target.

31 new server unit tests including a real-symlink filesystem case; 332/332
server tests passing, tsc clean both sides, pnpm -C apps/web build green.
Smoke deferred to first deploy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 12:43:13 +00:00
1a889dcde3 v1.13.18-codecontext-file-path: resolve file_path against project root in codecontext wrappers
Four codecontext sidecar wrappers — get_file_analysis (required
file_path), get_symbol_info, get_dependencies, and get_semantic_neighborhoods
(optional) — forwarded file_path to the HTTP sidecar unchanged. The
sidecar's internal file index is keyed on absolute paths, so any
relative path from the model returned "File not found in graph".
Three back-to-back failures observed in one chat on 2026-05-22
17:56 UTC, ~48 s of wasted tool budget.

## Resolver

Add resolveProjectPath(projectRoot, rawPath) in codecontext_client.ts:
trim check → absolute/relative branch (both go through resolve() so
dot-segments normalise) → realpath with ENOENT fallthrough → escape
check using the realpathed value. Error shape mirrors the existing
target_dir escape error byte-for-byte; only the field name differs.

Wired into callCodecontext at the args-spread site, guarded on
file_path presence + non-empty. All four wrappers benefit from one
call site; wrappers without file_path (overview, framework, watch,
search) are unaffected.

## Schema trim

.trim() added to all four file_path Zod schemas:

  get_file_analysis:                  z.string().trim().min(1)
  get_symbol_info:                    z.string().trim().optional()
  get_dependencies:                   z.string().trim().optional()
  get_semantic_neighborhoods:         z.string().trim().optional()

Absorbs trailing newlines / whitespace from model output before the
resolver sees the value.

## Adversarial review fixes

Adversarial pass surfaced two P2 findings:

1. Absolute path with `..` resolving outside the project root (e.g.
   `<projectRoot>/../etc/passwd`) that ENOENTs at realpath would slip
   through the literal prefix-check: the raw string starts with
   `<projectRoot>/`. Fix: resolve() the absolute branch's candidate
   too, so dot-segments normalise before the prefix check.

2. No symlink-escape test coverage. Realpath's stated purpose
   (catching in-project symlinks pointing outside the project) was
   never tested. Added: create a tmpdir outside projectRoot,
   symlink projectRoot/evil-link → outside file, assert rejection.

## Tests

codecontext_client.test.ts: 19 tests (10 baseline + 9 new file_path
resolution cases). Cases cover: relative→absolute, absolute-inside,
relative-escape, absolute-outside, ENOENT-fallthrough, empty-string,
wrapper-without-file_path, absolute-with-`..`-ENOENT,
symlink-leaving-root.

codecontext_tools.test.ts: one assertion updated to expect the
resolved-absolute file_path on the wire (previously asserted the raw
relative path passed through, which is exactly the bug being fixed).

Full suite: 301 passed, 7 skipped.

## Affected / unaffected

- get_codebase_overview, get_framework_analysis, watch_changes,
  search_symbols: no file_path arg → resolver guard skips them. No
  behavior change.
- get_semantic_neighborhoods IS in SYNTHESIS_TOOLS — previously-failing
  relative-path calls will now successfully synthesize. Desirable, not
  a regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 21:54:16 +00:00
b52c5df705 v1.13.17-cross-repo-reads: on-demand read access to paths outside the project root
When the agent needed context from another repo, pathGuard rejected every read
with no recovery path. This batch adds a reactive request_read_access flow:
pathGuard's error now hints at the tool, the model emits a structured request,
the inference loop pauses (same mechanism as ask_user_input), the user picks
Allow/Deny via inline chips, and subsequent reads under the granted root succeed
for the rest of the session.

Schema: sessions.allowed_read_paths TEXT[] NOT NULL DEFAULT ARRAY[]::TEXT[]
(idempotent ADD COLUMN IF NOT EXISTS).

Grant unit (design D1): nearest registered projects.path ancestor →
nearest repo-shaped ancestor (.git/ / package.json / go.mod / Cargo.toml)
under PROJECT_ROOT_WHITELIST → else refuse. grant_resolver.ts walks
ancestors with a per-iteration whitelist invariant check so symlinked
input can't escape the whitelist mid-walk (Sam's checkpoint-1 ask).

Path-guard: optional extraRoots arg threaded from session.allowed_read_paths
through executeToolCall to view_file / list_dir / grep / find_files. The
ToolDef.execute signature gets an optional third param; non-FS tools
ignore it. view_file re-anchors the secret-guard check on basename(real)
whenever a relative path starts with "../" so .env / id_rsa* etc. still
deny across grant roots.

Endpoint: POST /api/chats/:id/grant_read_access mirrors /answer_user_input.
On 'allow' it re-resolves the grant root (state may have changed since
prompt — auto-falls to denial reason text on failure, not 500), array_appends
to sessions.allowed_read_paths with in-memory dedup, then publishes
tool_result + session_updated frames and enqueues the next assistant turn.

PATCH /api/sessions/:id allowed_read_paths supports revocation only. Zod
refines absolute + no traversal markers; runtime findUnauthorizedAdditions
guard rejects any entry not already present in the row, so a malicious
curl -X PATCH -d '{"allowed_read_paths":["/etc"]}' returns 400 instead of
bypassing the grant flow (Sam's compliance-review action item).

Frontend: RequestReadAccessCard renders pending (path + reason + Allow/Deny)
and answered (granted/denied summary with the resolved root) variants;
MessageList.flatten/group special-cases the tool name; SettingsPane adds a
per-session grants list with per-row revoke that PATCHes the shortened
array.

Tests: 11 grant_resolver, 8 path_guard, 8 sessions PATCH subset, including
explicit cases for symlink escape mid-walk, walk-bound termination at
whitelist root, /etc bypass attempt via PATCH, and nearest-project
disambiguation. 292 total server tests green.

Pairs with v1.13.16-xml-parser — the model now self-recovers from both
a wrong tool name AND from a refused path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 21:45:52 +00:00
2e1a81de72 v1.13.16-xml-parser: Anthropic <invoke> support + unknown-tool recovery hints
Two-part fix for the model-emitted XML drift the v1.13.15-codecontext-synth
investigation surfaced (1 raw <invoke> leak observed out of 190 qwen3.6
turns — qwen3.6-35b-a3b-mxfp4 drifts to the Anthropic format when prompted
as an Architect-style agent because Claude Code documentation in its
pre-training corpus uses that shape).

## Parser extension

xml-parser.ts now recognizes BOTH XML tool-call flavors:

  - Qwen/Hermes:   <tool_call><function=NAME>...<parameter=K>V</parameter>...</function></tool_call>
  - Anthropic:     <invoke name="NAME"><parameter name="K">V</parameter></invoke>

Both route through the same synthetic-id xml_call_${idx} ToolCall path.
extractToolCallBlocks() and partialXmlOpenerStart() handle both openers
(<tool_call> and <invoke...) so partial buffers don't get prematurely
flushed during streaming.

The existing Qwen parser was tightened to tolerate whitespace around `=`
(<function = name>, <parameter = key>...) so a stray space doesn't get
absorbed into the function name. Name capture is non-whitespace,
non-`>`.

## Unknown-tool recovery hint

New tool-suggestions.ts exports levenshtein() + suggestToolName() +
formatUnknownToolError(). When tool-phase.ts:executeToolCall receives a
toolCall.name that isn't in TOOLS_BY_NAME, the error returned to the
model now includes a "Did you mean: X?" hint based on Levenshtein
distance ≤3 or substring match against Object.keys(TOOLS_BY_NAME).
Targets the qwen3.6 drift to read_file → suggest view_file. Applies to
all unknown tool names, not just <invoke>-derived ones — at the
dispatch layer we no longer know which format produced the call, and
the extra signal is harmless for Qwen-derived calls.

## Test coverage

xml-parser.test.ts: 46 tests, all green. Covers both parsers
(well-formed, malformed, multi-parameter, nested-content), the
partial-opener detector for both flavors, the unified extraction
helper, and the unknown-tool error formatter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 20:59:25 +00:00
61308cf17c v1.13.15-codecontext-synth: remove "tag pending" qualifier in roadmap
Trivial follow-up after the v1.13.15-codecontext-synth tag landed.
Retrospective bullet now describes the shipped state; cleanup-order
tracker marks the batch .

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 20:09:39 +00:00
3992a9fcb7 v1.13.15-codecontext-synth: forced second-inference synthesis for codecontext overview tools
After a codecontext overview-class tool call lands (get_codebase_overview,
get_framework_analysis, get_semantic_neighborhoods), the pipeline runs a
second inference pass that replaces the recursive runAssistantTurn. The
synth pass auto-fetches the top-N source files referenced in the
codecontext output plus project docs (BOOCHAT.md, AGENTS.md,
*roadmap*.md, CONTEXT.md), applies a 32k-token budget with explicit
drop-priority, and streams a structured response that grounds the model
in real load-bearing code rather than relying on the codecontext summary
alone. Smoke #1 (default) and #2 (Architect) both cite the correct
inference/turn.ts + tool-phase.ts + stream-phase.ts files; smoke #6
(fault injection) verifies the fall-through path marks the synth message
status='failed' and yields cleanly to the recursive turn.

## Truncation-aware extraction

codecontext's wrapper inline-truncates results at 32k chars. Without the
expansion step, the top-N file selection only saw the alphabetical head
of the codebase (apps/booterm/dist/*) and auto-fetched the wrong sources.
The pipeline now calls in-process readTruncation(outputPath) before
extracting referenced files, so top-N selection sees the full 80k+ char
output. The 32k truncated head still ships to the synth model — the
expansion is reference-extraction-only, preserving the token-budget
contract. Graceful degradation on readTruncation null/throw: log warn,
fall back to the truncated head.

## Schema deviation from dispatch

The dispatch claimed no schema migration was needed for the new
'synthesis' part kind. Reality: message_parts.kind has an explicit
CHECK constraint (schema.sql:54) that would reject the new value. Added
a DROP CONSTRAINT IF EXISTS + DO $$ pg_constraint idempotency-guarded
re-add matching the CLAUDE.md migration pattern. The inline CREATE TABLE
constraint also updated so fresh installs land with the extended enum.

## User-abort marks synth-message failed

Deviation from review-time spec ("user-abort path does NOT mark the
message failed"). The outer abort handler in error-handler.ts operates
on the parent turn's assistantMessageId, not the new synth row that
runSynthesisPass created. Without explicit marking, the synth row would
sit in status='streaming' until the 5-min stale-streaming sweeper
(v1.13.1-cleanup-bundle), tripping the frontend's 60s no-token-activity
banner in the meantime — exactly the UX bug class the v1.13.1 sweeper
was added to handle. Marking failed on every catch path (including
user-abort) closes the gap. Cost: one extra DB write + one publish on
the rare user-abort-during-synth path.

## Race-safe synth-tool capture

tool-phase.ts uses synthEntries: Array<{tc, output, error?}> with
per-callback push under Promise.all. find() picks the first non-error
entry by call-order (toolCalls array index). Multiple synth-tools in
one batch are uncommon but handled deterministically.

## Roadmap rebase

Updated boocode_roadmap.md retrospective section + cleanup-order tracker
+ schema-changes summary to use the new vMAJOR.MINOR.PATCH-slug tag
names per the 2026-05-22 retag (CHANGELOG.md is the canonical record).
v1.13.15 listed as "this batch, tag pending"; a one-line follow-up
commit will remove that qualifier after the tag lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 20:08:47 +00:00
0fa46cd06c v1.13.12: skills audit + token-tracking fix + codecontext + cap50 + UI cleanups
Multi-topic batch. The big-ticket item is the skills audit; the rest are
smaller patches that compounded during the audit work.

## Skills audit (rules→recipes split)

Vendored all 26 skills from /home/samkintop/opt/skills/ into data/skills/
(the boocode-repo-local skill library — see docker-compose change below).
Audited via 5 parallel Claude Code agent-teams running the
mgechev/skills-best-practices 4-step protocol (Discovery → Logic → Edge
Case → self-Architecture-Refinement) per skill, ~2 min wall-clock vs the
~3.7-hour serial estimate.

Result: 14 skills surviving (renamed to gerund form, frontmatter matched),
11 deleted (duplicates, BooCode-irrelevant patterns, Claude-already-does-
natively), 1 migrated to BOOCHAT.md/BOOCODER.md as an always-true rule
(verification-before-completion). Each surviving skill had its description
refined to fix specific trigger gaps surfaced by the protocol — 4
real-bug findings landed (dead refs, stale tags, broken sub-file
references in the original vendored content).

Audit decisions documented in openspec/changes/v1.13.12-skills-audit/
audit-notes.md. Convention codified in BOOCHAT.md/BOOCODER.md "rules vs
recipes" sections — future workflow rules go to those files (100%
present), recipes stay in data/skills/ (~6% invoke rate in multi-turn
per the Codeminer42 measurement).

## Token tracking + stale-stream banner fix (same root cause)

ws-frames.ts IsoTimestamp was z.string().min(1) but postgres returns
timestamp columns as JS Date objects. Every message_complete /
session_updated / chat_updated frame was failing the v1.13.11 Zod gate
and being silently dropped. Symptoms: token tracking blank in the UI
(no usage frames landed); the 60s no-token-activity timer tripped the
stale-stream banner because the frontend's local message state never
saw status='streaming' flip to 'complete'.

Fix: z.preprocess(v => v instanceof Date ? v.toISOString() : v,
z.string().min(1)) applied to the IsoTimestamp primitive. Centralized,
no publisher changes, works identically server + web (the parity test
still passes).

## Codecontext .codecontextignore auto-install

services/codecontext_client.ts now copies the
codecontext/.codecontextignore.template into any project's root on the
first call to that project if no .codecontextignore exists. One file
written per project, idempotent (in-memory Set guard + access-check),
silent fallback on read-only project. Stops the upstream empty-source-
file parser crash on foreign projects' node_modules — previously
required manually copying the template per project.

## Tool-call budget cap 30 → 50

services/inference/budget.ts: BUDGET_READ_ONLY and BUDGET_NO_AGENT
bumped to 50 (from 30). BUDGET_NON_READ_ONLY stays at 10 (no write
tools landed yet). Real recon sessions were hitting 30 with ~3 turns
wasted on codecontext parse failures; legitimate need was ~27, and
Architect-class system overviews want deeper recon. Headroom of 20
absorbs failure-retry turns without changing the safety floor — the
doom-loop guard (3 identical calls → abort) catches the actual
failure mode this cap was guarding against.

v1.14 (Phase C outer agent loop) will supersede this via per-agent
agent.steps. Throwaway-ish patch but unblocks deeper recon today.

## UI cleanups

- ChatPane queued-message dropdown removed. Each queued message now
  has three buttons: edit (pop back into ChatInput via sendToChat
  event), force-send (was the dropdown's only useful action), and
  cancel. Default behavior (send when streaming completes) needs no
  UI — it's the implicit do-nothing path.
- ChatThroughput removed from desktop tab strip (ChatTabBar.tsx).
  Mobile tab switcher still shows it.

## Plumbing

- .gitignore: data/* + !data/AGENTS.md + !data/skills/ negation
  patterns so the vendored skill library + agent registry become
  git-tracked while session DB state stays out.
- docker-compose.yml: removed /opt/skills:/data/skills override
  mount. Skills now live in the boocode repo at data/skills/,
  auditable per-batch. The host-level /opt/skills/ is preserved
  untouched for any other tools that read from it.
- .codecontextignore at repo root: auto-installed when codecontext
  was first called against /opt/boocode itself; matches the template.
- CLAUDE.md: updated to document the v1.13.11 publishFrame wrapper +
  message_parts table + tool_cost_stats view + DB-integration test
  pattern + host-side smoke endpoint quirk. (Pre-existing in working
  tree before this batch; shipped here for completeness.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 18:58:30 +00:00
bc376c878d v1.13.11-b: convert raw broker.publish call sites to typed publishFrame
Second half of the WebSocket-frame-typing batch. Phase A (8b568b3)
landed the schemas + frontend receive validation + publishFrame /
publishUserFrame wrappers. This commit converts the existing publish
call sites so every server-emitted WS frame now goes through Zod
validation at the broker boundary.

Conversion strategy: change once in the inference / skills adapters in
index.ts (so ctx.publish / ctx.publishUser propagate to publishFrame /
publishUserFrame for ALL ~50 inference + auto_name call sites in one
move), then bulk-replace the ~30 direct broker.publish* call sites in
the routes + compaction.

Files touched:
- index.ts: inference + skills route adapters now call publishFrame /
  publishUserFrame internally; raw broker.publishUser('default', ...)
  call in the stale-row sweeper also converted.
- routes/projects.ts (7 sites), routes/chats.ts (9 sites),
  routes/sessions.ts (8 sites): all broker.publishUser(...) → broker.
  publishUserFrame(...).
- services/compaction.ts (3 sites): 2 publishUser, 1 publish.

Real protocol drift surfaced by Zod, fixed in the same commit:

  services/compaction.ts:442 was publishing chat_status with status:
  'working' — the v1.12.1 chat_status widening (CLAUDE.md:55) dropped
  this enum value in favor of streaming|tool_running|waiting_for_input|
  idle|error. The compaction.ts site was missed during v1.12.1; the
  frame had been published with an unknown enum value ever since (the
  frontend useChatStatus quietly ignored it). Corrected to 'streaming'
  — compaction's LLM call has the same dot-state semantic as an
  inference turn. This is exactly the class of bug v1.13.11 exists to
  catch.

Schema relaxation: OpaqueObject (the bag type for nested entities like
Project / Chat / Session / WorkspacePane embedded in WS frames) was
z.object({}).passthrough(), which Zod outputs as {} & {[k:string]:
unknown}. The strict-typed entities don't have index signatures so
TypeScript rejected them at publishFrame call sites. Relaxed to
z.unknown() — runtime validation still accepts the value, dev-time
narrowing happens via the existing hand-maintained types. Trade-off:
frame-level drift detection stays sharp; nested-payload validation
goes to follow-up work as the brief intended.

Schema audit:
  grep -rn "broker\.publish(\|broker\.publishUser(" apps/server/src \
    --include="*.ts" | grep -v "broker.ts\|__tests__\|.bak"
  → 0 results. Every server publish goes through publishFrame /
  publishUserFrame. The remaining ctx.publish / ctx.publishUser sites
  in services/inference/* + services/auto_name.ts route through the
  index.ts adapter, which calls publishFrame internally.

Tests: 219/219 pass (unchanged from v1.13.11-a; the Phase B conversion
is mechanical and doesn't add test cases).

Smoke: clean container boot, no ws-frame-validation-failed entries
under normal traffic. Sidebar list refresh + agent picker open both
pass through useUserEvents without drops.

~70 LoC across 7 files. v1.13.11 closed.
2026-05-22 15:54:00 +00:00
8b568b36d3 v1.13.11-a: WS frame schemas + frontend receive validation
First half of the WebSocket-frame-typing batch (split per recon — total
scope was ~535 LoC, larger than the roadmap's ~300 estimate, so the
server-side publish-site conversion lands separately in v1.13.11-b).

Phase A scope:

(1) apps/server/src/types/ws-frames.ts (NEW) — Zod schemas for all 27
wire-format WS frame types. Discriminated union (WsFrameSchema) plus
KNOWN_FRAME_TYPES const for diagnostic lookup. UUIDs are z.string().
uuid(); model-emitted tool_call_id stays z.string().min(1) since OpenAI-
compatible APIs emit "call_<random>" not UUID. Per-kind payload narrowing
(tool args, message_parts payloads) intentionally stays z.unknown() —
frame-level drift detection is the goal; deep payload validation is
follow-up work.

(2) apps/web/src/api/ws-frames.ts (NEW) — byte-identical mirror of the
authoritative server file. No path alias from web→server in the existing
tsconfig setup; sync-by-hand was chosen over a new packages/shared/ dir.
A ws-frames.test.ts test asserts the two files match.

(3) apps/server/src/services/broker.ts — adds publishFrame() and
publishUserFrame() methods to the Broker interface. Both validate via
WsFrameSchema and fail-closed: log + drop on invalid. createBroker now
accepts an optional FastifyBaseLogger so validation failures land in
the pino stream (with console.error fallback for unit tests). The
existing publish() / publishUser() raw methods stay legal — they get
converted to the typed variants in v1.13.11-b.

(4) apps/web/src/hooks/useSessionStream.ts + useUserEvents.ts — wrap
ws.onmessage with WsFrameSchema.safeParse. Fail-closed: invalid frames
log + return without dispatching. Hand-maintained WsFrame and
SessionEvent types stay in place; one cast bridges Zod-typed → narrowed
shape (Zod uses OpaqueObject for nested Message[] / WorkspacePane[] etc.,
which are dev-time-narrowed via the existing hand-maintained types).

(5) apps/web/package.json — adds zod ^3.23.8 as a direct dep. Was a
transitive dep via ai-sdk / postgres; promotion makes the import legal.

(6) Tests: 15 new in ws-frames.test.ts covering happy-path per major
frame type, drift-catchers (unknown type, invalid enum, non-UUID, negative
tokens), parts-authoritative read variants, the mirror-file diff check,
and four broker fail-closed scenarios. 219/219 server tests pass (was
204; +15 new).

Two recon corrections to the dispatch brief, both flagged before
implementation:

- No 'parts_appended' frame exists. The brief assumed one; the codebase
  reads parts via the messages_with_parts view after message_complete
  triggers a refetch. MessagePartSchema is therefore unused this batch.
- No 'tool_running' frame exists. The brief listed it as standalone; it
  is in fact a 'chat_status' variant ({ status: 'tool_running' }), already
  covered by ChatStatusFrame.

Smoke: clean container boot, no validation errors in the server log. Real
production frames pass validation (the schemas were derived from the
existing hand-maintained types in api/types.ts and sessionEvents.ts).

v1.13.11-b will follow immediately: convert all ~85 raw broker.publish /
ctx.publish call sites across 11 server files to publishFrame /
publishUserFrame. Mechanical edit; the wiring done here means the diff
in -b is just the call-site swaps.

~310 LoC across 9 files (4 new + 5 modified).
2026-05-22 15:48:32 +00:00
34cbecf975 v1.13.15-tools: tiered tool loading via BOOCODE_TOOLS env var
Pattern lift from eyaltoledano/claude-task-master (MIT + Commons Clause
— pattern only, no code lift). Adds BOOCODE_TOOLS env var with three
tiers:

- core (4 tools): view_file, list_dir, grep, find_files. ~2k token
  schema cost.
- standard (15 tools): core + web_search, web_fetch, git_status, all
  8 codecontext_* tools. ~10k token schema cost.
- all (default; current behavior): every tool in ALL_TOOLS (20). ~21k
  token schema cost.

The env var is a CEILING — narrows agent whitelists, never expands.
Default behavior unchanged when var is unset. resolveToolTier is
case-insensitive and falls back to 'all' on unknown values.

CORE_TOOL_NAMES + STANDARD_TOOL_NAMES validated at module load against
TOOLS_BY_NAME via two top-level for-loops that throw on the first
missing name. Module fails to import if a tier references a tool that
doesn't exist in the registry — catches typos and stale tier
definitions at boot rather than silently filtering valid tools out of
agent whitelists.

Wiring: agents.ts parseAgentBlock now reads BOOCODE_TOOLS from
process.env per parse, intersects with the agent's declared frontmatter
tools (or DEFAULT_TOOLS when frontmatter omits the field). Per-parse
read is fine — agents are re-parsed on the existing 60s cache TTL.

Tests: tools.test.ts grows from 1 to 10 tests. Covers resolveToolTier
across tiers/case/unknown values + the CORE-subset-of-STANDARD invariant
+ TOOLS_BY_NAME existence for both tier sets. 204/204 pass (was 195;
+9 new).

Deviation from the brief: the codecontext tools in the actual registry
have NO codecontext_* prefix (the brief's STANDARD list assumed it).
Used the actual names (get_codebase_overview, search_symbols, etc.).
Module-load validation would have failed boot with the prefixed names.

Smoke: with BOOCODE_TOOLS unset, agents return their full 12-tool
whitelists. With BOOCODE_TOOLS=core in .env + container restart, the
same agents narrow to 4 tools (find_files, grep, list_dir, view_file)
— intersection of declared whitelist ∩ core tier. Reverted after
confirmation.

CLAUDE.md updated with BOOCODE_TOOLS in the Environment section's
Optional list. .env.example gained a commented BOOCODE_TOOLS=all line
with the per-tier token-cost table.

~110 LoC across 5 files (4 modified + 1 test expansion). Under the
brief's ~30 LoC estimate for code; the test suite expansion drove
most of the growth.
2026-05-22 14:59:01 +00:00
5a3f357ce9 v1.13.15-openspec: reformat batch docs to OpenSpec directory structure
Adopt Fission-AI/OpenSpec's openspec/changes/<change-name>/{proposal,
specs,design,tasks}.md shape for BooCode's own batch docs. Zero-dep
documentation reformat; replaces ad-hoc boocode_batchN.md /
handoff_vN.N.N.md convention.

Existing batch docs moved into openspec/changes/archived/ via git mv
(preserves history):
- boocode_batch10.md
- handoff_v1.13.8_prefix_verify.md
- handoff_v1.13.10_per_tool_cost.md

Pre-v1.13.15 docs were NOT split into proposal/tasks/design files. The
work was already shipped; the originals are preserved as archived
snapshots. New v1.13.15+ batches land directly in
openspec/changes/<slug>/proposal.md (+ tasks.md, + design.md when
applicable) per the convention documented in openspec/README.md.

CLAUDE.md gained a one-line pointer to the convention (workflow
section). File grew from 153 → 154 lines, 27,682 → 27,925 chars; both
remain well under the AgentLint hard caps.

specs/ directory is reserved for future OpenSpec CLI adoption (v1.14+).
No CLI dep added in this batch — directory structure only. If/when the
full OpenSpec lifecycle is adopted, that lands as a separate batch.
2026-05-22 14:54:17 +00:00
fc11e8dc91 v1.13.15-agentlint: instruction-file audit against AgentLint 31-check standard
Manual audit pass against 0xmariowu/AgentLint's evidence-backed checks
(MIT, drawn from 265 versions of Anthropic's internal Claude Code
system prompt).

Findings and fixes:
- Identity sections ("You are the assistant running inside ...") removed
  from BOOCHAT.md (line 3) and BOOCODER.md (line 5). The model already
  knows where it's running; the openers were emphatic decoration.
- CLAUDE.local.md added to .gitignore (.env was already covered).
  Claude Code's Glob tool ignores .gitignore by default, which means
  any local override file was otherwise readable by any agent walking
  the workspace.
- CLAUDE.md unchanged — already passes all 10 checks. Emphasis density
  0.58/1000 words (under Anthropic's 1.4/1000 endpoint); two IMPORTANT/
  MUST references are load-bearing (tsc-noEmit footgun, v1.13.7
  includeUsage invariant); zero identity sections; zero --no-verify
  references; 27,682 chars (under the 40,000-char silent-drop limit).
  Line count (153) is over the 60-120 target band, but the brief
  explicitly forbids structural rewrites in the audit pass.

Targets not in scope:
- /opt/boocode/AGENTS.md does not exist in this repo (removed in v1.12,
  per CLAUDE.md:152). The global agent registry lives at /data/AGENTS.md
  (bind-mounted from outside the repo); can't be touched by this batch.
- No .github/workflows/ directory — SHA-pin audit (step 8) skipped.

Cumulative effect: model spends fewer tokens parsing instruction-file
ceremony in BOOCHAT/BOOCODER and receives sharper priority signal per
Anthropic's measured-evolution data. Zero code changes.
2026-05-22 14:52:37 +00:00
9ce638c916 v1.13.10: per-tool token cost accounting (rolling 100-call view)
Surfaces per-tool prompt/completion-token rolling averages in
AgentPicker for at-a-glance agent-cost hints. Implementation is a
SQL view on top of messages_with_parts plus a read endpoint and
AgentPicker tooltip extension. No new write site; all source data
already lands via the existing tool-phase.ts:94-95 / error-handler.ts:
109-110 / sentinel-summaries.ts UPDATEs that v1.13.7's includeUsage:
true fix made non-NULL.

(1) schema.sql — new tool_cost_stats view. Window-functions over
messages_with_parts.tool_calls with LATERAL jsonb_array_elements.
Attribution: equal split — multi-tool turn divides tokens N-ways;
the 100-call rolling mean absorbs split noise. Filters: status=
'complete' + metadata.kind NOT IN ('cap_hit','doom_loop') exclude
failed turns and sentinels respectively; tool_calls IS NOT NULL is
defense-in-depth since sentinels are role='system' rows. CREATE OR
REPLACE means schema apply is idempotent.

(2) routes/tools.ts NEW + index.ts wire-in. GET /api/tools/cost_stats
returns { stats: ToolCostStat[] } with mean_prompt_tokens / mean_
completion_tokens computed at read time (sum / n_calls). Sorted by
tool_name ASC. No pagination — ≤30 tools.

(3) __tests__/tool_cost_stats.test.ts NEW — 7 integration tests
keyed off DATABASE_URL env var. Tests skip gracefully when unset
(no-DB default). beforeAll applies the schema via sql.unsafe(read
FileSync(schema.sql)) for self-contained runs. Helper insertAssistant
Turn shared across cases. Covers: empty state, single-tool attribution,
multi-tool equal split, 100-call FIFO window, NULL-tokens exclusion,
parts-authoritative read via messages_with_parts, failed/sentinel
exclusion.

(4) web/api/types.ts + client.ts — ToolCostStat interface + api.tools.
costStats() method binding.

(5) AgentPicker.tsx — fetch costStats on mount, compute per-agent
sum-of-means across whitelisted tools, render muted cost line below
description: "~5.2k prompt / 280 completion · 6/8 tools · last call
3h ago". Skips line entirely when no tool history; preserves existing
native title= for layout backward-compat. formatK/formatAgo colocated.

Tests: 202/202 pass (195 prior + 7 new view-integration). Server +
web tsc clean.

Smoke: schema applied cleanly; GET /api/tools/cost_stats returns
canonical JSON; view + endpoint agree. Single-row result expected
given the v1.13.1-A → v1.13.7 NULL latent regression window; new
traffic populates organically.

Roadmap row at boocode_roadmap.md:114 plus schema row at :474 both
match. View vs table decision documented in handoff_v1.13.10_per_
tool_cost.md (rollback-safe, microsecond-fast at BooCode scale).

~270 LoC across 8 files (5 modified + 3 new).
2026-05-22 14:42:09 +00:00
8126d78b34 docs: capture v1.13.7-v1.13.9 invariants in CLAUDE.md
Five additions surfacing session-discovered constraints future Claude
sessions need:
- AI SDK v6 includeUsage:true requirement (avoids re-introducing the
  v1.13.1-A→v1.13.7 NULL-tokens regression)
- \n text-delta trim guards in MessageList/MessageBubble + payload.ts
  failed/empty-assistant skip rules (avoid undoing v1.13.7)
- 0.85 × ctx_max overflow formula (v1.13.9) replacing the stale
  ctx_max - 20k line
- New services/system-prompt.ts bullet documenting the v1.13.8
  fingerprint instrumentation surface
- New services/inference/budget.ts bullet with current BUDGET_NO_AGENT=30
  and read-only-tools rationale
2026-05-22 14:07:11 +00:00
b06a4a8e55 v1.13.9: compaction overflow trigger — 0.85 × ctx_max early trigger
Opencode pattern (session/overflow.ts): fire compaction at 85% of
ctx_max, replacing the v1.11.0-era `ctx_max - 20_000` formula.

Old formula: usable = ctx_max - 20_000
  - ctx=262144 → trigger at 242144 (92.4%) — only 7.6% headroom
  - ctx=100000 → trigger at  80000 (80.0%)
  - ctx= 32000 → trigger at  12000 (37.5%) — over-eager
  - ctx<=20000 → trigger at      0 — never fires

New formula: usable = floor(0.85 * ctx_max)
  - ctx=262144 → trigger at 222822 (85.0%) — 15% headroom for summarizer
  - ctx=100000 → trigger at  85000 (85.0%)
  - ctx= 32000 → trigger at  27200 (85.0%)
  - ctx=  8192 → trigger at   6963 (85.0%)

Ratio gives consistent headroom at any context scale. The qwen3.6
daily driver gets ~19k tokens more breathing room before overflow;
small-ctx models no longer degenerate to never-triggering.

usable() is the only consumer of COMPACTION_BUFFER → constant deleted.
New EARLY_TRIGGER_RATIO constant takes its place.

isOverflow() and the maybeFlagForCompaction() call site at
payload.ts:184 are unchanged — formula swap is internal to compaction.ts.
payload.ts comment touched only to drop the stale COMPACTION_BUFFER
reference (PRUNE_TRIGGER_TOKENS stays at 20k as the prune-freed
threshold; independent of the overflow formula).

Tests: 4 new usable() corner cases (262k/100k/8k/zero+negative), plus
5 isOverflow() numbers shifted to match the 85k budget at ctx=100k.
195/195 server tests pass (was 194).

Smoke: ratio math verified by unit tests at all four corners. Live
cap-hit verification deferred — requires accumulating >222k tokens
in a session under qwen3.6-35b-a3b-mxfp4 (was >242k pre-fix); will
surface organically in extended use.
2026-05-22 13:59:14 +00:00
a0c8d212cb v1.13.8: system-prompt prefix stability verify-and-measure
Recon during planning disproved the original v1.13.7 (DB-cache) premise:
buildSystemPrompt already runs over inputs mtime-cached at the file layer
(BOOCHAT.md in system-prompt.ts:25, AGENTS.md global+per-project in
agents.ts:245), and DB scalars are byte-stable until edited. The output
is microsecond pure-string concat with no I/O. Skills aren't in the
prefix; tools live in a separate request body field alpha-sorted by
v1.13.3.

This batch closes the verification gap with instrumentation, not
implementation:

- system-prompt.ts: buildSystemPromptWithFingerprint canonical impl
  computes SHA-256 over the assembled prefix, runs a per-session
  Map<sessionId, lastHash> observer, emits PrefixFingerprint per call
  and PrefixDrift (with field-level changed_inputs) on hash change.
  buildSystemPrompt is now a thin shim returning .prompt.
- agents.ts: getAgentsMtimes accessor — cache-read only, no I/O.
- payload.ts: buildMessagesPayload takes optional log argument; when
  passed, emits prefix-fingerprint (info) + prefix-drift (warn).
- turn.ts + sentinel-summaries.ts: pass ctx.log at 3 production call
  sites; sentinel summaries log too so any drift across cap-hit /
  doom-loop paths surfaces.
- system-prompt.test.ts: 4 new tests (byte-identical, no-drift-on-
  stable, drift-fires-with-changed-inputs, cross-session-no-drift).

194/194 tests pass (was 190).

Smoke: 5 messages in a fresh session produced 7 prefix-fingerprint
logs (extras from buildMessagesPayload being called from sentinel
summary paths), all with identical prefix_hash and prefix_length=2907,
zero prefix-drift. Prefix is byte-stable in steady-state.

Decision: original system_prompt_cache DB table from the roadmap is
permanently dropped. The v1.12.0 mtime caches at the input layer plus
alpha tool ordering at the request body (v1.13.3) already address the
load-bearing cache-stability surfaces. Instrumentation stays so the
claim can be re-verified at any time.
2026-05-22 13:42:18 +00:00
0ce6115976 docs: renumber v1.13.8 to verify-and-measure, drop system_prompt_cache table, add v1.13.8 dispatch brief 2026-05-22 13:24:29 +00:00
ff29b48e3a v1.13.7: stability bundle — usage capture + payload/UI sanitization
Five fixes for latent regressions surfaced during the v1.13.x.cosmetic
revert investigation. None alter schema or compaction; all cleanup
against the v1.13.1-A AI SDK migration's hidden surface.

(1) provider.ts — includeUsage: true on createOpenAICompatible.
@ai-sdk/openai-compatible defaults this false, omitting
stream_options.include_usage from the request body; llama-swap never
emitted the usage block, so result.usage.inputTokens/outputTokens
resolved undefined and tokens_used / ctx_used landed NULL in every
assistant row since v1.13.1-A. No historical backfill.

(2) MessageList.tsx — hasText = m.content.trim().length > 0.
AI SDK v6 streaming occasionally emits a leading "\n" text-delta on
tool-call-only turns; the literal newline passed length > 0 and
rendered an empty bubble + ActionRow between every tool call. Trim
catches it without changing semantics for genuine content.

(3) MessageBubble.tsx — same trim on hasContent for the no-tool-calls
path. Defensive symmetry with MessageList.flatten.

(4) payload.ts — buildMessagesPayload skips assistant rows with
status='failed' AND assistant rows with status='complete' + empty
content + no tool_calls. Without this, a trailing empty/failed
assistant + the next attempt's placeholder produced "Cannot have 2
or more assistant messages at the end of the list" rejections from
the OpenAI-compatible upstream after cap-hit + Continue.

(5) budget.ts — BUDGET_NO_AGENT 15 → 30. Every tool in ALL_TOOLS is
read-only today; the 15-cap was forward-looking for write tools that
haven't landed. No-agent mode now matches BUDGET_READ_ONLY.

47 LoC across 5 files. 190/190 server tests pass.

Verified live: new assistant turns populate StatsLine token data;
single-tool-call turns no longer render the stray empty-bubble +
ActionRow between tool calls; Continue after cap-hit no longer hits
the trailing-assistant API rejection.
2026-05-22 13:24:19 +00:00
81d837c04e v1.13.6: compaction head-assembly audit + reasoning fix
Audit traced compaction's summary path post-v1.13.1-B read flip:
- Q1: reads from messages_with_parts (view) — clean
- Q2: parts shape correctly threaded through buildHeadPayload — clean
- Q3: reasoning omitted from summary input — FIX NEEDED

v1.13.1-C wired reasoning end-to-end into inference/payload.ts but
missed this read site. Summarizer model couldn't see the reasoning
trail for tool-bearing turns, quietly degrading summary quality for
reasoning-channel models (qwen3.6).

Fix:
- CompactionMessage extended with reasoning_parts field
- SELECT pulls reasoning_parts from messages_with_parts
- buildHeadPayload (now exported for tests) prefixes assistant content
  with <reasoning>...</reasoning>\n\n<content>... when reasoning is
  present; standalone <reasoning>...</reasoning> for tool-call-only
  turns; omits the tag when reasoning is null or empty

4 new render branch tests (190 total).

Smoke deferred: forcing real compaction requires either threshold
pollution or building up a >40k-token chat with reasoning_parts.
Render branches are unit-covered; integration would only re-prove
structural correctness.
2026-05-22 08:18:47 +00:00
f8fc5db929 v1.13.5: opencode truncate.ts port — full tool output retrievable via opaque id
- New services/truncate.ts. Tmpfs storage at /tmp/boocode-truncations/
  (BOOCODE_TRUNCATION_DIR env var overrides for tests). 12-char base32
  opaque ids (~60 bits entropy, "tr_<id>"). Three exports: storeTruncation,
  readTruncation, truncateIfNeeded (wrap-or-passthrough helper).
  cleanupTruncations does TTL-pass (7 days) + orphan-reap (parts query on
  payload->'output'->>'outputPath') in one shot.
- Wired four tools through truncateIfNeeded: view_file (raw full file),
  list_dir (full filtered+secret-filtered entries serialized one-per-line),
  web_fetch (textRaw pre-slice), codecontext_client (body.result pre-slice).
  Each returns the existing sliced view plus an optional outputPath field
  when truncation fires.
- New view_truncated_output ToolDef. Resolves opaque id → on-disk content
  internally; model never sees the truncation dir. Same start_line /
  end_line slicing semantics as view_file. Registered in ALL_TOOLS (alpha
  sort places it after view_file automatically) and READ_ONLY_TOOL_NAMES.
- cleanupTruncations piggybacks on the v1.13.3 stuck-row sweeper's 60s
  setInterval. No-op when truncation dir is empty.

Not wired (TODO follow-up): grep and find_files. file_ops returns post-cap
results to the tool execute path, so the "full content" isn't recoverable
without a refactor of fileOps.grep / fileOps.findFiles to expose the
uncapped result. web_search is silent-slice (no truncated flag); outside
scope. Five sites of seven covered; the remaining two are the only ones
needing a file_ops change.

Tests: 7 new in truncate.test.ts (roundtrip, unknown id, malformed id,
truncateIfNeeded false/true/over-cap/storage-failure paths). 186 total
(was 179). cleanupTruncations file-system half implicitly via TTL pass;
orphan-reap branch covered by the live container smoke.

Smoke verified end-to-end against the live container:
- view_file with start_line=1, end_line=3 on CLAUDE.md → tool_result part
  carried outputPath "tr_cdpn1o04k6ma" + truncated=true.
- /tmp/boocode-truncations/tr_cdpn1o04k6ma exists, 15876 bytes, mode 0o600,
  parent dir mode 0o700.
- Follow-up view_truncated_output(id, start_line=50, end_line=55) returned
  the actual lines 50-55 of CLAUDE.md (the 808notes/BooCode bullets).
- ALL_TOOLS count=20 (was 19); alpha sort places view_truncated_output
  between view_file and watch_changes.

Closes a v1.12 catalog row that was scoped but deferred. The v1.13 parts
table made outputPath ride on the existing tool_result payload with no
schema change beyond the storage helper itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 07:55:55 +00:00
ec8593cf77 v1.13.4: two-tier compaction prune — opencode pattern half-shipped in v1.11.0
- message_parts.hidden_at timestamptz column (NULL by default) with a
  partial index on (message_id) WHERE hidden_at IS NULL for the common
  visible-parts filter.
- messages_with_parts view changed from COALESCE(parts, legacy) to
  CASE WHEN EXISTS(any parts of kind) THEN visible-parts ELSE legacy.
  COALESCE would have leaked hidden parts back via the legacy fallback
  when every part was pruned (smoke caught it pre-commit). The CASE
  distinguishes "no parts at all → fall back to legacy column for
  pre-v1.13.0 history" from "all parts hidden → return null/empty so
  the row drops out of the model payload" exactly.
- prune.ts: scans tool_result parts newest-first, protects the last 40k
  tokens (PROTECTED_TOKENS), marks older candidates hidden when their
  combined estimate clears 20k (PRUNE_TRIGGER_TOKENS — equal to
  COMPACTION_BUFFER from v1.11.0, so a successful prune is exactly the
  budget the summary path would have freed). Stops at chats.tail_start_id
  so it doesn't double-erase across the last summary boundary. Pure
  decision helper selectPruneTargets exported separately for unit tests.
- Wired into maybeFlagForCompaction: prune runs synchronously when
  overflow is detected; if it freed >= PRUNE_TRIGGER_TOKENS, the
  needs_compaction flag is NOT set and the (expensive) summary inference
  call is skipped this turn. The next turn's overflow check re-evaluates
  from scratch.
- 6 new unit tests in prune.test.ts cover: empty input, protection-only
  (no candidates), candidates below trigger, candidates above trigger,
  candidates straddling a summary boundary, exactly-protection-tokens.
  179 tests total (was 173).

Smoke verified post-rebuild:
- \\d message_parts shows hidden_at + partial index.
- View definition shows AND p.hidden_at IS NULL filters on all three
  subselects.
- Synthetic hide-then-restore confirmed the view drops the tool_result
  jsonb to null when its only part is hidden, and restores when un-hidden.
- EXPLAIN ANALYZE on the 42-message stress chat: 0.325ms (faster than
  v1.13.1-B's 1.018ms — EXISTS short-circuits cleanly for the common
  no-parts case).
- Normal turn (plain text prompt) completes unaffected.

Closes a v1.11.0 design item that was scoped but never implemented. With
v1.13's parts table the prune is dramatically cheaper to write — pre-parts
it would have meant editing JSON blobs in-place; now it's a hidden_at
flag and a view subselect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 07:02:17 +00:00
165 changed files with 16561 additions and 698 deletions

33
.codecontextignore Normal file
View File

@@ -0,0 +1,33 @@
# .codecontextignore — paths codecontext skips during analysis
# Copy to your project root and customize. Same syntax as .gitignore.
# Dependencies / vendored code
node_modules/
vendor/
.venv/
venv/
__pycache__/
target/
# Build artifacts
dist/
build/
out/
.next/
.nuxt/
.svelte-kit/
# IDE / tooling
.opencode/
.vscode/
.idea/
# Test artifacts / coverage
coverage/
.nyc_output/
.pytest_cache/
# Lock files (rarely have meaningful symbols)
package-lock.json
yarn.lock
pnpm-lock.yaml

View File

@@ -10,3 +10,12 @@ POSTGRES_PASSWORD=CHANGE_ME
# Internal Tailscale address that bypasses Authelia. Override if you
# point BooCode at a different SearXNG instance.
SEARXNG_URL=http://100.114.205.53:8888
# v1.13.15-tools: BOOCODE_TOOLS narrows the tool whitelist sent to the LLM.
# Unset (default) → all tools (~21k schema). Useful primarily for single-purpose
# sessions where the model only needs read-only filesystem access.
#
# core → view_file, list_dir, grep, find_files (~2k)
# standard → core + web_*, git_status, all 8 codecontext_* tools (~10k)
# all → every tool in ALL_TOOLS (~21k)
# BOOCODE_TOOLS=all

5
.gitignore vendored
View File

@@ -1,9 +1,12 @@
node_modules
dist
.env
CLAUDE.local.md
*.log
.DS_Store
.vite
coverage
secrets/
data/
data/*
!data/AGENTS.md
!data/skills/

View File

@@ -1,7 +1,5 @@
# BooChat
You are the assistant running inside BooChat — a self-hosted developer chat app.
## Capabilities
- Read-only file tools: `view_file`, `list_dir`, `grep`, `find_files`
@@ -28,6 +26,18 @@ You are the assistant running inside BooChat — a self-hosted developer chat ap
- Cite file paths + line numbers for any claim about the codebase
- When uncertain about scope or intent, surface options via `ask_user_input` rather than guessing
- Prefer codecontext (`search_symbols`, `get_symbol_info`, `get_dependencies`) over `grep` for symbol-level questions. Fall back to `grep` / `view_file` when codecontext returns degraded or empty results — that signals an unsupported language or parse failure.
- Verify before reporting work complete: run the relevant test/build/smoke command and confirm output matches the claim. Evidence first, assertion second.
## Output format
- Stay in Markdown by default for every reply, short or long.
- Switch to a self-contained `<!DOCTYPE html>...</html>` artifact only when the user explicitly asks (e.g. "render this as HTML", "make me a dashboard", "build an interactive diagram"). Detection is opportunistic — the BooChat backend tags the assistant message as an HTML artifact, opens it in a sandboxed pane, and offers Download. Do not emit HTML unprompted; long Markdown is the right answer for most explanatory output.
- When asked to produce HTML, avoid generic AI aesthetics: no excessive centered layouts, no purple gradients, no uniform rounded corners, no Inter font. Prefer interactive controls (sliders / knobs / SVG / side-by-side diffs) over passive prose-in-HTML. Pattern reference: claude.com/blog/using-claude-code-the-unreasonable-effectiveness-of-html (Thariq Shihipar, May 2026).
- The HTML artifact is rendered in a sandboxed iframe with `connect-src 'none'``fetch()`, WebSockets, and tracking pixels do not work. All logic must be client-side.
## Convention: rules vs recipes
Always-true rules (process discipline, refusals, behavior contracts) live here in `BOOCHAT.md` — and in `BOOCODER.md` / `CLAUDE.md` per their scopes — where they are 100% present in every turn. On-demand recipes (specific procedures, scaffolds, checklists) live in `/data/skills/` and invoke roughly 6% of the time in clean multi-turn flow (Codeminer42 measurement, 2026). Don't file workflow rules as skills — they silently misfire. See Anthropic agent-skills best-practices (platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices) for the canonical conventions.
## Known limitations

View File

@@ -2,8 +2,6 @@
> (Stub. v2.0 implementation pending. This file documents the intended contract.)
You are the assistant running inside BooCoder — the write-capable companion to BooChat.
## Capabilities
- Everything in `BOOCHAT.md`
@@ -22,3 +20,8 @@ You are the assistant running inside BooCoder — the write-capable companion to
- Show a diff preview before any write
- Group related edits into a single `/apply` batch
- If a tool fails, surface the error verbatim — don't paper over it
- Verify before reporting work complete: run the relevant test/build/smoke and confirm output matches the claim. Evidence first, assertion second.
## Convention: rules vs recipes
Always-true rules live here, in `BOOCHAT.md`, and in `CLAUDE.md` (100% present each turn). On-demand recipes live in `/data/skills/` (roughly 6% invoke rate in multi-turn per Codeminer42, 2026). Don't file workflow rules as skills — they misfire. See Anthropic agent-skills best-practices (platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices).

179
CHANGELOG.md Normal file
View File

@@ -0,0 +1,179 @@
# Changelog
All notable changes per release tag. Most recent on top, ordered by tag creation date (which matches the git history). Tag names follow `vMAJOR.MINOR.PATCH-slug` — the slug describes what shipped, so the tag name alone is enough to recall the batch.
## v1.13.19-html-artifact-panes — 2026-05-23
Pane-based artifact viewer with on-request HTML support. Every assistant message gets an "Open in pane" icon button (`PanelRightOpen`, mobile 44px tap-target) in `MessageBubble`'s ActionRow; click opens the message in the workspace splitter as either a Markdown pane (Copy raw source + Download `.md`) or an HTML pane (Download `.html` only, no Copy). The HTML path triggers when the model emits a self-contained `<!DOCTYPE html>` or fenced ` ```html` artifact (opt-in only — `BOOCHAT.md` rule says Markdown is default at every length; HTML only on explicit user request like "render this as HTML"). Backend detection in `finalizeCompletion` (`error-handler.ts`) writes a new `message_parts.kind='html_artifact'` row with payload `{html_content, char_count, title}` (`<title>` → first `<h1>` → first 80 chars of inner text). Schema CHECK extended via the v1.13.13 drop-and-re-add pattern. 1MB cap is graceful — over-cap artifacts skip the part write and plain content lands; decision factored into a pure `decideHtmlArtifactWrite` helper so the warn-and-skip branch is unit-testable without mocking the full InferenceContext. Pane state is reference-only (`{chat_id, message_id, title}`) — content is fetched on mount, keeping `sessions.workspace_panes` jsonb small and avoiding 1MB blobs riding the `session_workspace_updated` WS frame. New `services/artifacts.ts` ships slug derivation (Markdown: first `#` heading → first 6 words; HTML: `<title>``<h1>` → inner text) and write helpers that realpath the artifacts directory after `mkdir` to close a symlink-escape gap (`assertArtifactsDirSafe`). `routes/artifacts.ts` exposes POST `/api/chats/:id/messages/:msg_id/artifacts/download?fmt=md|html` (writes to `<projectRoot>/.boocode/artifacts/<slug>-<ts>.<ext>`) plus GET `/api/projects/:project_id/artifacts/:filename` with `Content-Disposition: attachment`, `X-Content-Type-Options: nosniff`, and `Content-Security-Policy: sandbox` defense-in-depth on LLM-served HTML. iframe sandbox locks to `allow-scripts allow-clipboard-write allow-downloads` with no `allow-same-origin` and uses `srcDoc` (not `src`) for opaque-origin isolation. Frontend extracts `MarkdownRenderer.tsx` from `MessageBubble`'s inline `MarkdownBody` for reuse; `MarkdownArtifactPane.tsx` / `HtmlArtifactPane.tsx` render with loading + error states. 404-vs-real-error discrimination in `openInPane`: a real network/500 failure toasts and bails instead of silently masquerading as a Markdown pane. 31 new server unit tests (slug derivation, detection positive/negative, write helpers, symlink-escape, 1MB cap, real-symlink filesystem test); 332/332 server tests passing; `tsc -p apps/web/tsconfig.app.json --noEmit` clean; `pnpm -C apps/web build` green. Smoke deferred to first deploy.
## v1.13.18-codecontext-file-path — 2026-05-22
Fix: four codecontext wrappers (`get_file_analysis`, `get_symbol_info`, `get_dependencies`, `get_semantic_neighborhoods`) forwarded `file_path` to the sidecar unchanged, but the sidecar's index is keyed on absolute paths — every relative path from the model returned "File not found in graph" (three back-to-back failures in one chat at 17:56 UTC, ~48 s of wasted tool budget). New `resolveProjectPath` helper in `codecontext_client.ts:64-89` realpath-resolves the candidate, applies the same escape check as the existing `target_dir` resolver (matching the error template byte-for-byte except the field name), and falls through with the normalised absolute on ENOENT so the sidecar issues its own self-correctable "File not found" error. Wired into `callCodecontext` once at the args-spread site — all four wrappers benefit without per-wrapper edits. `.trim()` added to all four `file_path` Zod schemas to absorb trailing newlines from model output. Adversarial review caught a P2 escape-bypass: an absolute path with `..` (e.g. `<projectRoot>/../etc/passwd`) that ENOENTs at realpath would slip through the literal prefix-check, fixed by `resolve()`-normalising the absolute branch too. 9 new test cases in `codecontext_client.test.ts` (7 spec scenarios + symlink-out-of-root + absolute-with-`..` ENOENT) plus a 1-line update in `codecontext_tools.test.ts` asserting the new resolved-absolute contract. Pairs with `v1.13.17-cross-repo-reads` — both harden path traversal, but v1.13.18 stays inside the project root while v1.13.17 widens access outside it.
## v1.13.17-cross-repo-reads — 2026-05-22
On-demand read access to paths outside the session's primary project root. Closes the dead-end where `pathGuard` rejected every cross-repo read with no recovery path. New `request_read_access(path, reason)` tool emits an `ask_user_input`-style pause; user picks Allow/Deny via inline chips in `RequestReadAccessCard.tsx`; on Allow, the new `POST /api/chats/:id/grant_read_access` endpoint re-resolves the grant root and appends to `sessions.allowed_read_paths` (new `TEXT[]` column, default empty). Grant unit per design D1 = nearest registered `projects.path` ancestor → else nearest repo-shaped ancestor (`.git/` / `package.json` / `go.mod` / `Cargo.toml`) under `PROJECT_ROOT_WHITELIST` → else refuse without prompting. `pathGuard` extended with an optional `extraRoots` argument threaded from `session.allowed_read_paths` through `executeToolCall` to the four filesystem tools (view_file, list_dir, grep, find_files); `view_file` re-anchors the secret-guard check on `basename(real)` whenever the path resolved via a grant root so `.env` / `id_rsa*` deny still fires across grants. `grant_resolver.ts`'s ancestor walk checks the whitelist invariant on every iteration (not just final parent) so a symlinked input can't escape mid-walk. PATCH `/api/sessions/:id` exposes `allowed_read_paths` only for revocation: zod refines paths to absolute + no traversal markers, and a runtime subset guard (`findUnauthorizedAdditions`) rejects any entry not already present in the row, so a malicious `curl -X PATCH -d '{"allowed_read_paths":["/etc"]}'` 400s instead of bypassing the grant flow. Settings pane gains a per-session revoke list; archiving the session clears grants implicitly. 11 grant_resolver tests pin the symlink-escape-mid-walk guard (Sam's checkpoint-1 ask) and the nearest-project disambiguation; 8 path_guard tests cover extraRoots traversal; 8 sessions PATCH tests cover the subset guard including the `/etc` bypass attempt. Pairs with `v1.13.16-xml-parser` (model now both self-recovers from a wrong tool name AND from a refused path).
## v1.13.16-xml-parser — 2026-05-22
Two-part fix for the model-emitted XML drift the v1.13.15 investigation surfaced. **Parser extension:** `xml-parser.ts` now recognizes the Anthropic `<invoke name="…"><parameter name="…">…</parameter></invoke>` shape alongside the existing Qwen/Hermes `<tool_call><function=…>…</function></tool_call>` shape. qwen3.6-35b-a3b-mxfp4 drifts to the Anthropic format when prompted as an Architect-style agent (Claude Code documentation in its pre-training corpus). Both formats route through the same synthetic-id `xml_call_${idx}` ToolCall path. The existing Qwen parser was tightened to tolerate whitespace around `=` (`<function = name>` shape) so a stray space doesn't get absorbed into the function name. **Unknown-tool recovery hint:** new `tool-suggestions.ts` exports `levenshtein()` + `suggestToolName()` + `formatUnknownToolError()`. When the dispatcher (`tool-phase.ts:executeToolCall`) receives an unknown tool name, the error returned to the model includes a "Did you mean: X?" hint based on Levenshtein distance ≤3 or substring match against `Object.keys(TOOLS_BY_NAME)`. Targets the qwen3.6 drift to `read_file` → suggest `view_file`. Test coverage in `xml-parser.test.ts` (46 tests, all green) covers both parsers, the partial-opener detector for both flavors, the unified extraction helper, and the new error formatter.
## v1.13.15-codecontext-synth — 2026-05-22
Forced second-inference synthesis pass for codecontext overview-class tools (`get_codebase_overview`, `get_framework_analysis`, `get_semantic_neighborhoods`). After the tool result lands, the pipeline expands the truncated head via in-process `readTruncation`, extracts referenced file paths from the full content, auto-fetches top-N files + project docs (BOOCHAT.md, AGENTS.md, *roadmap*.md, CONTEXT.md) under a 32k-token budget with explicit drop-priority order, then streams a synthesis turn that replaces the recursive `runAssistantTurn`. The 32k truncated head still ships to the synth model (token-budget contract preserved); the expansion is reference-extraction-only. Falls through to recursion on timeout (90s), model error, or non-2xx; user-abort marks the synth message `status='failed'` and re-throws (the outer abort handler operates on the parent turn's message, not the new synth row — without explicit marking, the row would sit `streaming` until the 5-min sweeper, tripping the 60s stale-stream banner). Adds `'synthesis'` to `message_parts.kind` CHECK constraint via `DROP CONSTRAINT IF EXISTS` + `DO $$ pg_constraint` idempotency-guarded re-add. Smokes #1, #2, #6 all clean; smokes #3#5 are content-quality checks for UI review.
## v1.13.14-skills-audit — 2026-05-22
Multi-topic batch. **Skills audit (headline):** vendored all 26 skills from `/home/samkintop/opt/skills/` into repo-local `data/skills/` (the `/opt/skills:/data/skills` override mount removed from `docker-compose.yml` so skills are auditable per-batch in git). Audited via 5 parallel Claude Code agent-teams running mgechev's 4-step protocol per skill — 14 survive with gerund-form names + refined triggers; 11 dropped (duplicates, BooCode-irrelevant patterns, Claude-already-does-natively); 1 (`verification-before-completion`) migrated to `BOOCHAT.md`/`BOOCODER.md` as an always-true rule. The Codeminer42 "rules vs recipes" split codified in those files. **Token tracking + stale-stream banner fix:** same root cause — `IsoTimestamp = z.string()` in `ws-frames.ts` was failing on postgres `Date` objects, silently dropping every `message_complete` / `session_updated` / `chat_updated` frame through the `v1.13.13-ws-publish` Zod gate; `z.preprocess(v => v instanceof Date ? v.toISOString() : v, ...)` applied to the primitive on both server + web (parity test still passes). **Codecontext ignore:** `codecontext_client.ts` auto-installs `.codecontextignore.template` into any project's root on first call (stops the upstream empty-source-file parser crash on foreign projects' `node_modules`). **Budget bump:** `BUDGET_READ_ONLY` + `BUDGET_NO_AGENT` 30 → 50 (real recon need ~27 + headroom for codecontext failure-retry turns; doom-loop guard catches the loop class anyway). **UI:** queued-message dropdown → edit / force-send / cancel buttons in `ChatPane.tsx`; `ChatThroughput` removed from desktop tab strip (mobile tab switcher keeps it). Audit decisions in `openspec/changes/v1.13.12-skills-audit/audit-notes.md`.
## v1.13.13-ws-publish — 2026-05-22
Second half of the WebSocket-frame-typing batch. Converts the existing ~50 inference + auto_name publish sites (via the `index.ts` adapter) plus ~30 direct `broker.publish*` call sites in routes + compaction, so every server-emitted frame now goes through Zod validation at the broker boundary. Pairs with `v1.13.12-ws-schemas`.
## v1.13.12-ws-schemas — 2026-05-22
First half of the WebSocket-frame-typing batch. Adds `apps/server/src/types/ws-frames.ts` with Zod schemas for all 27 wire-format frame types (discriminated union `WsFrameSchema` + `KNOWN_FRAME_TYPES` diagnostic lookup), duplicated byte-identical at `apps/web/src/api/ws-frames.ts` with a parity test. Introduces the `publishFrame` / `publishUserFrame` wrappers that fail-closed on schema mismatch.
## v1.13.11-tools — 2026-05-22
Tiered tool loading via `BOOCODE_TOOLS` env var (`core` | `standard` | `all`). Core = 4 read-only fs tools (~2k token schema cost). Standard = +web + git + codecontext (~10k). All (default) = every tool in `ALL_TOOLS` (~21k). The var is a ceiling — narrows agent whitelists, never expands. Pattern lifted from `eyaltoledano/claude-task-master`.
## v1.13.10-openspec — 2026-05-22
Adopt `Fission-AI/OpenSpec`'s `openspec/changes/<slug>/{proposal,tasks,design}.md` shape for BooCode's own batch docs. Existing batch docs (`boocode_batch10.md`, `handoff_v1.13.8_prefix_verify.md`, `handoff_v1.13.10_per_tool_cost.md`) moved into `openspec/changes/archived/` via `git mv` to preserve history. Zero-dep documentation reformat.
## v1.13.9-agentlint — 2026-05-22
Manual audit of instruction files against `0xmariowu/AgentLint`'s 31-check standard. Removed identity-opener sections from `BOOCHAT.md` and `BOOCODER.md` (emphatic decoration the model doesn't need). Added `CLAUDE.local.md` to `.gitignore` — Claude Code's Glob ignores `.gitignore` by default, so local overrides were otherwise readable by any agent walking the workspace. `CLAUDE.md` passed all 10 checks unchanged.
## v1.13.8-tool-cost — 2026-05-22
Per-tool prompt/completion-token rolling averages surfaced in AgentPicker as at-a-glance cost hints. Implementation is the `tool_cost_stats` SQL view over `messages_with_parts` (`LATERAL jsonb_array_elements` on `tool_calls`), plus a read endpoint and a tooltip extension. Equal-split attribution — multi-tool turn divides tokens N-ways; the 100-call rolling mean absorbs split noise. Filters out `cap_hit` / `doom_loop` sentinels. Source data already lands via existing UPDATEs that `v1.13.5-stability-bundle`'s `includeUsage: true` fix made non-NULL.
## v1.13.7-compaction-trigger — 2026-05-22
Compaction overflow trigger lowered to `floor(0.85 × ctx_max)`, replacing the v1.11.0-era `ctx_max 20_000` formula. Old formula gave only 7.6% headroom at 262k context and 0 budget for ≤20k contexts (never fired). New formula gives consistent 15% summarizer headroom across all model sizes. Opencode pattern lift from `session/overflow.ts`.
## v1.13.6-prefix-stability — 2026-05-22
System-prompt prefix stability verify-and-measure. Recon during planning disproved the original DB-cache premise: `buildSystemPrompt` already runs over inputs mtime-cached at the file layer (BOOCHAT.md, AGENTS.md global+per-project), and DB scalars are byte-stable until edited. This batch closes the verification gap with instrumentation, not implementation — `buildSystemPromptWithFingerprint` computes SHA-256 over the assembled prefix and a per-session `Map` observer fires `prefix-drift` (warn) on hash change with field-level `changed_inputs` diff.
## v1.13.5-stability-bundle — 2026-05-22
Five fixes for latent regressions surfaced during the cosmetic-revert investigation. (1) `provider.ts``includeUsage: true` on `createOpenAICompatible` (default false omitted `stream_options.include_usage`; llama-swap never emitted usage; tokens_used / ctx_used were NULL on every assistant row since `v1.13.0-ai-sdk-v6`). (2) `MessageList.tsx``hasText = m.content.trim().length > 0` to skip whitespace-only tool-call-only turns rendering empty bubbles. (3) `BUDGET_NO_AGENT` raised 15 → 30 to match read-only agent cap. (4) `payload.ts` skips status='failed' + complete-but-empty assistant rows so cap-hit + Continue doesn't upstream-reject. (5) Misc UI sanitization.
## v1.13.4-reasoning-fix — 2026-05-22
Compaction head-assembly audit caught one fix: reasoning was omitted from the summarizer's view of tool-bearing turns, silently degrading summary quality for reasoning-channel models (qwen3.6). `v1.13.0-ai-sdk-v6` had wired reasoning end-to-end into inference but missed this one read site. `CompactionMessage` extended with `reasoning_parts`; `buildHeadPayload` embeds it as a `<reasoning>...</reasoning>` prose prefix on the assistant content (OpenAI wire shape has no structured reasoning field).
## v1.13.3-truncate — 2026-05-22
Port of opencode's `truncate.ts`. Full tool output retrievable via opaque `tr_<12 base32 chars>` id (~60 bits entropy) and a new `view_truncated_output(id)` tool. Tmpfs storage at `/tmp/boocode-truncations/` (overridable via `BOOCODE_TRUNCATION_DIR`), 5MB cap, 7-day TTL, orphan-reap on the periodic 60s sweeper. Wired through four tools: `view_file`, `list_dir`, `web_fetch`, `codecontext_client`. Each returns the existing sliced view plus an `outputPath` field when truncation fires.
## v1.13.2-compaction-prune — 2026-05-22
Two-tier compaction prune — opencode pattern that was half-shipped in v1.11.0. New `message_parts.hidden_at` column with partial index on `WHERE hidden_at IS NULL`. `messages_with_parts` view changed from `COALESCE(parts, legacy)` to a CASE that distinguishes "no parts at all → fall back to legacy column for pre-v1.13.0 history" from "all parts hidden → drop the row from the model payload" (smoke caught the `COALESCE` leaking hidden parts back via legacy fallback). `prune.ts` scans `tool_result` parts newest-first, protects the last 40k tokens, marks older candidates hidden once the combined estimate clears 20k.
## v1.13.1-cleanup-bundle — 2026-05-22
Four independent items owed from prior dispatches. (1) `statement_timeout = '30s'` at the database level (documented in `schema.sql` but applied operationally — `ALTER DATABASE` can't run inside a `DO` block). (2) Tool registry alpha-sorted at module load — llama.cpp's prompt cache hits on byte-identical prefixes; reordering tools near the top of the system prompt would invalidate every cached turn. (3) Periodic 60s stuck-row sweeper. (4) `experimental_repairToolCall` to keep streams alive on malformed qwen3.6 tool args (pass-through implementation — logs and forwards unmodified; existing zod-reject path routes back to the model).
## v1.13.0-ai-sdk-v6 — 2026-05-22
Major migration to AI SDK v6. Introduces the `streamCompletion` adapter (`services/inference/stream-phase.ts`) over `streamText`, with five known gotchas the LSP can't catch — abort signals swallowed by `fullStream` (post-iteration throw required), usage lands only at stream end via `await result.usage`, tools have no `execute` field (BooCode dispatches in `tool-phase.ts`), and tool-call-only turns may emit a leading `\n` text-delta. Also ships the `messages_with_parts` view (parts-merge read path) and wires `reasoning_parts` end-to-end via a `ReasoningPart` in the v6 ModelMessage. Ports `ask_user_input` correlation queries from JSON columns to `message_parts` JOINs.
## v1.12.4-inference-split — 2026-05-21
Complete `inference.ts` split into `services/inference/`. Pieces: `turn.ts` (orchestration — `runAssistantTurn` / `runInference` / `createInferenceRunner`), `sentinel-summaries.ts` (`runCapHitSummary`, `runDoomLoopSummary`), `stream-phase.ts`, `tool-phase.ts`, `provider.ts`, `payload.ts`, `prune.ts`, `budget.ts`, `xml-parser.ts`, `error-handler.ts`, `sentinels.ts`, `parts.ts`, `types.ts`. Public surface re-exported via `inference/index.ts`; callers import from `./services/inference/index.js` explicitly (NodeNext doesn't honor directory-index resolution).
## v1.12.3-stale-banner — 2026-05-21
Stale-stream banner with Retry/Discard. When an assistant message sits `status='streaming'` with no token activity for 60+ seconds, the chat shows a banner above the input. Both actions clear the stale row via new `POST /api/chats/:id/discard_stale` (updates `status='failed'`, publishes `chat_status='idle'`). Closes the UX gap from the 2026-05-21 debugging spiral — slow streams and dead streams now look different.
## v1.12.2-live-toks — 2026-05-21
Live tok/s + ctx display next to the status indicator. `ChatThroughput` renders inline beside `StatusDot` while streaming or tool_running. Subscribes to existing `'usage'` WS frames (500ms-throttled, carrying `completion_tokens` + `ctx_used` + `ctx_max`) via `sessionEvents`. Hides when status drops to idle/error or data is older than 10s. Addresses the same UX gap as `v1.12.3-stale-banner` — gives users a live token velocity readout that immediately distinguishes slow from dead.
## v1.12.1-stop-handler — 2026-05-21
`handleAbortOrError` now writes `status='cancelled'` on user stop; rows no longer stuck `streaming` forever. Drops stale `messages_status_check` constraint (only `messages_status_chk` remains, allowing 'cancelled' via TS `MESSAGE_STATUSES`). Removes `detectSameNameLoop` and `DOOM_LOOP_SAME_NAME_THRESHOLD` (added during the 2026-05-21 debugging spike, never fired in any real run) plus 12 verbose `ctx.log.info` diagnostic markers from the same spike. Bundles workspace pane sync + status indicator overhaul + startup hung-row sweep that landed earlier in v1.12.1 work.
## v1.12.0-codecontext — 2026-05-21
Adds the `codecontext` sidecar (Go-based code-graph indexer at `codecontext:8080/v1/<tool_name>` over `boocode_net`) plus container guidance and skills runtime updates. Introduces the `chat_status` WS frame (`streaming | tool_running | waiting_for_input | idle | error`, widened from `working|idle|error`). Drops the deprecated `session_panes` table — workspace pane state moves to `sessions.workspace_panes jsonb` for cross-device sync via `PATCH /api/sessions/:id/workspace`.
## v1.11.1-consolidation — 2026-05-21
Rollup of v1.11.0v1.11.10 work that was shipped piecemeal. Covers anchored rolling compaction (single `summary=true` row per chat that supersedes itself), doom-loop guard via `detectDoomLoop`, `path_guard` secret-filename deny list, web tools (`web_search` against SearXNG + `web_fetch` with SSRF/private-IP block), and the 5MB stream-cap on response bodies with abort-on-overflow.
## v1.11.0-context-bar — 2026-05-20
Persistent context-window tracker in `ChatPane` + `ctx_max` capture via `${LLAMA_SWAP_URL}/upstream/<model>/props`. First inferences after a boocode boot may have `ctx_max=NULL` if llama-swap hasn't loaded the model yet — 60s negative cache TTL recovers on next turn. Replaced an earlier dead read of `parsed.timings.n_ctx` which never carried n_ctx.
## v1.10.1-booterm-user — 2026-05-19
Per-user shell privilege drop in the booterm container via `gosu` in `tmux.conf` default-command. Shells launched in browser terminal panes drop privs to `samkintop` rather than running as root inside the container.
## v1.10.0-booterm — 2026-05-18
Second container (`apps/booterm`, port 9501, bookworm-slim+glibc). Fastify + node-pty + tmux. Browser terminal panes connect via WS to `/ws/term/sessions/:sid/panes/:pid`; per-session tmux session `bc-<sid>`, per-pane window `term-<pid>`. xterm-addon-webgl with `document.fonts.load(...)`-gated init (Canvas2D doesn't honor `font-display: block`) and iOS-friendly visibility-change context recreation.
## v1.9.2-ask-user-input — 2026-05-18
`ask_user_input` elicitation tool. Pauses the inference loop and surfaces a prompt to the user; their response routes back as the tool result. Correlation initially via `messages.tool_calls` / `tool_results` JSON columns (later ported to `message_parts` in `v1.13.0-ai-sdk-v6`).
## v1.9.1-skills — 2026-05-18
Skills runtime + `/skill` slash command with autocomplete. Server-side parser, tools, `/api/skills`, and mount. Hardens `.dockerignore` to exclude `secrets/` and `data/`. Drops the type-to-confirm gate on chat delete (plain Cancel/Confirm only — per workspace convention).
## v1.9.0-themes-settings — 2026-05-17
Settings pane + per-project defaults + bulk archive + themes lift. `themes-v1` (18 preset palettes) ships in the same batch with a Settings picker for live theme switching.
## v1.8.2-cap-hit — 2026-05-17
Tool-loop cap-hit summary — when an assistant exceeds the per-turn tool budget, a sentinel `role='system'` row with `metadata.kind='cap_hit'` is inserted and a summary turn runs to give the user a coherent endpoint. Also compacts the tool-call UI rendering.
## v1.8.1-agents-global — 2026-05-16
Global agents (`data/AGENTS.md` bind-mounted at `/data/AGENTS.md`) + parser robustness + WS reconnect toast. Per-project `AGENTS.md` mechanism (`getAgentsForProject`) remains for *other* projects; the BooCode repo itself uses global-only to eliminate two-files-must-stay-in-sync drift.
## v1.8.0-agents — 2026-05-16
Tier 2 agents — `AGENTS.md` registry + per-session agent picker. Also lands mobile tab switcher, branch indicator, and the `git_status` tool.
## v1.7.0-drag-drop — 2026-05-16
Drag-drop + paste-as-attachment for long text in the chat input.
## v1.6.0-mobile — 2026-05-16
Full mobile suite. Adds `useViewport` (matchMedia breakpoints mobile <768 / tablet 7681023 / desktop ≥1024), `useSidebarDrawer` / `useRightRailDrawer` (Context + auto-close on `useLocation().pathname` change), `useLongPress` (500ms timer, synthetic `contextmenu`), `usePullToRefresh` (80px threshold, 600ms hold), `SwipeablePaneTab` (60px close, 30px vertical bail). Mobile headers with safe-area padding, hamburger left, FolderTree right. Tap targets at `max-md:min-h-[44px] max-md:min-w-[44px]`. Raises `MAX_TOOL_LOOP_DEPTH` 5 → 15. Right-rail becomes a drawer on mobile.
## v1.5.1-bootstrap — 2026-05-16
Bootstrap fixes — git + ssh installed in the boocode container, Tailscale host rewrite, `/opt/projects` label correction for the create-new-project bootstrap flow.
## v1.5.0-refactor-tests — 2026-05-16
Refactor split (FileBrowserPane / Workspace / `runAssistantTurn`) + vitest harness + unit tests for security-critical pure functions. Scopes the `/opt` mount to `/opt/projects` (writable) plus `PROJECT_ROOT_WHITELIST=/opt` (read-only resolution for add-existing). Surfaces swallowed errors and removes dead `session_renamed` paths.
## v1.4.0-fork-header — 2026-05-16
Fork from message + delete message + header polish + general housekeeping.
## v1.3.0-chats-projects — 2026-05-16
Chats-in-sessions era. Adds force-send, `/compact`, right-rail file browser, archive/rename/Open-in-Gitea sidebar context menu, archived projects landing page, create-project bootstrap with Gitea remote setup, landing-card buttons, 1000px content cap. Dedup audit and chat archive/delete from the sidebar.
## v1.2.0-multi-pane — 2026-05-15
Multi-pane workspace (batch 3, T1T8). `session_panes` schema (later replaced by `sessions.workspace_panes jsonb` in v1.12.0), `Pane` discriminated union, broker user channel + `/api/ws/user`, `file_ops` + `file_index` services, `PaneShell` / `ChatPane` / `FileBrowserPane` / `PaneTab` / `Workspace` components, `usePanes` hook, Shiki integration in `CodeBlock`. Up to 5 panes per session; default chat pane created on `POST /api/sessions`.
## v1.1.0-markdown-sidebar — 2026-05-15
Markdown rendering, message actions, tok/s + ctx display, AI session naming. Sidebar restructure — chats nested under projects (max 5 + view-all), live updates via WS.
## v1.0.0-initial — 2026-05-14
Initial commit. Skeleton of the monorepo: `apps/server` (Fastify + postgres), `apps/web` (React + Vite), basic chat loop against llama-swap.

View File

@@ -46,12 +46,24 @@ Tests: `pnpm -C apps/server test` runs the vitest suite. No test harness on `app
- **Zod** for request validation and config parsing.
Key services:
- **`services/inference/`** (v1.12.4 split — was a single `inference.ts` file). Public surface re-exported via `inference/index.ts`; callers import from `./services/inference/index.js`. Layout: `turn.ts` (runAssistantTurn / runInference / createInferenceRunner orchestration, plus `InferenceFrame`, `InferenceContext`, `TurnArgs`, `StreamResult` exported), `stream-phase.ts` (streamCompletion + executeStreamPhase + SSE parsing), `tool-phase.ts` (executeToolPhase; back-edges into turn.ts for the runAssistantTurn recursion — cycle is safe because dereferenced at call time, not module top-level), `sentinel-summaries.ts` (runCapHitSummary + runDoomLoopSummary + their sentinel inserters; two near-clones kept side-by-side until a third sentinel justifies factoring out runWrapUpSummary), `error-handler.ts` (handleAbortOrError, finalizeCompletion), `payload.ts` (buildMessagesPayload, loadContext, maybeFlagForCompaction, `OpenAiMessage`), `sentinels.ts` (`detectDoomLoop`, `DOOM_LOOP_THRESHOLD`, sentinel predicates), `budget.ts` (resolveToolBudget), `xml-parser.ts` (Qwen-coder XML tool-call fallback), `types.ts` (`StreamPhaseState`, `DB_FLUSH_INTERVAL_MS` shared between stream-phase and sentinel-summaries). **`TurnArgs`** is the per-turn state envelope threaded through the `executeToolPhase → runAssistantTurn` recursion (`toolsUsed`, `recentToolCalls`, `assistantMessageId`, `signal`); reset to defaults in `runInference` at the user-message boundary. Cap-hit (`toolsUsed >= budget`) and doom-loop (`detectDoomLoop(recentToolCalls)`) checks both read from this envelope. Add new per-turn state to `TurnArgs` in `turn.ts`, not module-level closures.
- **`services/inference/`** Public surface re-exported via `inference/index.ts`; callers import from `./services/inference/index.js` explicitly (NodeNext doesn't honor directory-index resolution). Layout: `turn.ts` (runAssistantTurn / runInference / createInferenceRunner; exports `InferenceFrame`, `InferenceContext`, `TurnArgs`, `StreamResult`), `stream-phase.ts` (streamCompletion as a v1.13.1-A AI SDK adapter + executeStreamPhase), `provider.ts` (`upstreamModel(baseURL, modelId)` wrapping `createOpenAICompatible` against llama-swap), `tool-phase.ts` (executeToolPhase; value back-edges into turn.ts for the runAssistantTurn recursion — cycle safe because deref at call time, not module top-level), `sentinel-summaries.ts` (runCapHitSummary + runDoomLoopSummary + their sentinel inserters), `error-handler.ts` (handleAbortOrError, finalizeCompletion), `payload.ts` (buildMessagesPayload, loadContext, maybeFlagForCompaction, `OpenAiMessage`), `sentinels.ts` (`detectDoomLoop`, `DOOM_LOOP_THRESHOLD`, sentinel predicates), `budget.ts` (resolveToolBudget), `xml-parser.ts` (qwen3.6 XML tool-call fallback — KEEP, AI SDK doesn't handle inline-XML tool calls), `parts.ts` (v1.13.0 dual-write helpers: `partsFromAssistantMessage`, `partsFromToolMessage`, `insertParts`), `prune.ts` (v1.13.4 two-tier compaction; `selectPruneTargets` is the pure decision helper), `types.ts` (`StreamPhaseState`, `DB_FLUSH_INTERVAL_MS`). **`TurnArgs`** is the per-turn state envelope threaded through the `executeToolPhase → runAssistantTurn` recursion; reset in `runInference` at user-message boundary. Add new per-turn state to `TurnArgs`, not module-level closures.
- **AI SDK v6 streamCompletion adapter** (v1.13.1-A; `services/inference/stream-phase.ts`). `streamText` is the underlying call; the BooCode layer above (executeStreamPhase, finalize, dual-write) is shape-preserved via an adapter. Five gotchas the LSP/test suite won't catch:
- **Abort signals are swallowed.** `streamText`'s `fullStream` iterator exits cleanly when `abortSignal` fires — no throw. Post-iteration `if (signal?.aborted) throw <AbortError>` is required; without it the row finalizes as `complete` instead of `cancelled`. Comment in stream-phase.ts pins this; don't refactor it away.
- **Usage lands only at stream end** via `await result.usage` (`inputTokens` / `outputTokens` v6 names → mapped to `promptTokens` / `completionTokens` for the existing onUsage callback). Mid-stream live tok/s is gone vs v1.12.2; ChatThroughput shows a single value at stream end.
- **Tools have NO `execute` field.** BooCode dispatches tools in tool-phase.ts, not the AI SDK loop. Only `description` + `inputSchema: jsonSchema(parameters)` — surfacing tool-call parts via `fullStream` and stopping is what we want.
- **`includeUsage: true` MUST be set on `createOpenAICompatible`** in `services/inference/provider.ts`. The adapter defaults it false, omitting `stream_options.include_usage` from the request body; llama-swap then never emits the usage block and `result.usage.inputTokens/outputTokens` resolve to `undefined`. Latent regression from v1.13.1-A through v1.13.7 — every assistant row in that window has `tokens_used`/`ctx_used` NULL. Don't remove this flag during refactor.
- **Tool-call-only turns may emit a leading `\n` text-delta** as the assistant content. `MessageList.flatten`'s `hasText` and `MessageBubble`'s `hasContent` both `.trim()` before the length check — otherwise whitespace-only content renders an empty bubble + ActionRow between every tool call (v1.13.7 fix). `payload.ts:buildMessagesPayload` also skips `status='failed'` AND complete-but-empty (no content, no tool_calls) assistant rows to avoid "Cannot have 2 or more assistant messages at the end of the list" upstream rejections after cap-hit + Continue.
- **AI SDK ModelMessage conversion** (`toModelMessages` in stream-phase.ts). Tool messages need a `toolName` for `ToolResultPart` — BooCode's OpenAI-shape history doesn't carry it, so a forward-scan builds a `tool_call_id → toolName` map from prior assistant `tool_calls`. Tool outputs wrapped as `{ type: 'json' | 'text', value }` matching the v6 `ToolResultOutput` union. Assistant messages with reasoning emit a `ReasoningPart` first in the content array (v1.13.1-C).
- **`experimental_repairToolCall`** (v1.13.3) wired into `streamText` to keep the stream alive when qwen3.6 emits malformed tool args. Pass-through implementation — logs the bad call and returns it unmodified; `executeToolPhase`'s existing zod-reject error path routes it to the model on the next turn.
- **`chat_status` frame shape** (published via `broker.publishUser`) — `status: 'streaming' | 'tool_running' | 'waiting_for_input' | 'idle' | 'error'` (widened from `working|idle|error` in v1.12.1). Frontend `useChatStatus` derives `idle_warm` (<30s since idle) vs `idle_cold`. `ChatThroughput` renders inline beside `StatusDot` only when streaming or tool_running, fed by 500ms-throttled `'usage'` WS frames (`completion_tokens` + `ctx_used` + `ctx_max`). The `POST /api/chats/:id/discard_stale` endpoint exists to mark a stuck-streaming row as `failed` when the frontend's 60s no-token-activity timer (`ChatPane` content-length watcher) gives up.
- **Boot-time stale-streaming sweep** in `apps/server/src/index.ts` after `applySchema()`: any `messages.status='streaming'` older than 5 minutes flips to `'failed'`. Logs only on non-zero count. Recovers from container restart while inference was mid-stream (v1.12.1).
- **`services/broker.ts`** — In-memory pub/sub with two channel types: per-session (message streaming) and per-user (sidebar updates). No persistence; clients reconnect on restart.
- **`services/tools.ts`** — Tool registry (`ALL_TOOLS`, `READ_ONLY_TOOL_NAMES`, `TOOLS_BY_NAME`). Filesystem tools (view_file/list_dir/grep/find_files) go through three guard layers: `path_guard.ts` (workspace scope), `secret_guard.ts` (filename deny list), `url_guard.ts` (SSRF/private-IP block for web_fetch). v1.11.8+ web tools (`web_search`, `web_fetch`) are opt-in per chat via `session.web_search_enabled` (resolved with `project.default_web_search_enabled` fallback) and filtered out of the LLM's tool schema when false.
- **`services/compaction.ts`** + **`services/model-context.ts`** — v1.11.0 anchored rolling summary (single `summary=true` assistant row per chat, supersedes itself on each compaction). Triggered when `chats.needs_compaction` is set after an inference turn exceeds `usable(ctx_max) = ctx_max - 20k`. **`ctx_max` comes from `model-context.getModelContext()` which fetches `${LLAMA_SWAP_URL}/upstream/<model>/props`** — NOT from `parsed.timings.n_ctx` (the stream completion's `timings` doesn't carry n_ctx; that read was dead code until v1.11.3 ripped it out).
- **Periodic 60s sweeper** in `apps/server/src/index.ts` (v1.13.3 + v1.13.5). Same `setInterval` runs `sweepStaleStreaming` (marks `messages.status='streaming'` older than 5 min as `failed`, publishes `chat_status='idle'` so the UI dot drops) and `cleanupTruncations` (TTL + orphan reap of tmpfs truncation files). `app.addHook('onClose')` clears the timer. No-op when nothing to reap.
- **`services/broker.ts`** — In-memory pub/sub with two channel types: per-session (message streaming) and per-user (sidebar updates). No persistence; clients reconnect on restart. v1.13.11: every WS publish goes through `broker.publishFrame(sessionId, frame)` or `broker.publishUserFrame(user, frame)` — both Zod-validate against `WsFrameSchema` (`types/ws-frames.ts`) and fail-closed (log + drop). `ctx.publish` / `ctx.publishUser` in inference + auto_name route through the index.ts adapter that calls publishFrame internally. The schema is duplicated byte-identical at `apps/web/src/api/ws-frames.ts`; a `ws-frames.test.ts` case enforces parity. Don't add new raw `broker.publish()` / `publishUser()` calls.
- **`services/tools.ts`** — Tool registry (`ALL_TOOLS`, `READ_ONLY_TOOL_NAMES`, `TOOLS_BY_NAME`). Filesystem tools (view_file/list_dir/grep/find_files) go through three guard layers: `path_guard.ts` (workspace scope), `secret_guard.ts` (filename deny list), `url_guard.ts` (SSRF/private-IP block for web_fetch). v1.11.8+ web tools (`web_search`, `web_fetch`) are opt-in per chat via `session.web_search_enabled` (resolved with `project.default_web_search_enabled` fallback) and filtered out of the LLM's tool schema when false. v1.13.5 truncation: when a tool slice cuts content, `services/truncate.ts` stashes the full text on tmpfs at `BOOCODE_TRUNCATION_DIR` (default `/tmp/boocode-truncations`, 0o700) keyed by an opaque `tr_<12 base32 chars>` id, and the `view_truncated_output(id)` tool retrieves it. 5MB cap (matches `view_file`'s `MAX_FILE_BYTES`), 7-day TTL, reaped by the periodic sweeper. Tmpfs path means container restart loses retrieval — acceptable, the model usually has moved on.
- **`services/compaction.ts`** + **`services/model-context.ts`** — v1.11.0 anchored rolling summary (single `summary=true` assistant row per chat, supersedes itself on each compaction). Triggered when `chats.needs_compaction` is set after an inference turn exceeds `usable(ctx_max) = floor(0.85 × ctx_max)` (v1.13.9 opencode-pattern early trigger; was `ctx_max - 20k` pre-v1.13.9, which gave only 7.6% headroom at 262k and 0 budget for ≤20k contexts). **`ctx_max` comes from `model-context.getModelContext()` which fetches `${LLAMA_SWAP_URL}/upstream/<model>/props`** — NOT from `parsed.timings.n_ctx` (the stream completion's `timings` doesn't carry n_ctx; that read was dead code until v1.11.3 ripped it out). First inferences after a boocode boot may have `ctx_max=NULL` if llama-swap hasn't loaded the model yet; negative cache TTL is 60s, recovers on next turn. v1.13.6: `buildHeadPayload` embeds `reasoning_parts` as a `<reasoning>...</reasoning>` prose prefix on the assistant `content` (OpenAI wire shape has no structured reasoning field; the summarizer reads text). Standalone tag when content is empty (tool-call-only turn). `buildHeadPayload` + `OpenAiMessage` exported for test access — keep them exported.
- **`services/system-prompt.ts`** — `buildSystemPrompt` is the string-returning shim; `buildSystemPromptWithFingerprint` is the canonical impl returning `{prompt, fingerprint, drift}`. v1.13.8 instrumentation: SHA-256 of the assembled prefix is logged per `buildMessagesPayload` call (msg `prefix-fingerprint`, level=info); a `Map<sessionId, lastHash>` observer fires `prefix-drift` (level=warn) on hash change with a field-level `changed_inputs` diff. Smoke proved the prefix is byte-stable across turns in steady-state — the originally-planned `system_prompt_cache` DB table was dropped as redundant against the v1.12.0 input-layer mtime caches (BOOCHAT.md here + AGENTS.md global+per-project in `agents.ts:safeStat`).
- **`services/inference/budget.ts`** — tool-call budgets: `BUDGET_READ_ONLY = 30`, `BUDGET_NON_READ_ONLY = 10` (forward-looking; no write tools yet), `BUDGET_NO_AGENT = 30` (v1.13.7; was 15 — every tool in `ALL_TOOLS` is read-only today, so no-agent mode shares the read-only-agent cap). Per-agent `max_tool_calls` from AGENTS.md frontmatter overrides.
- **`messages_with_parts` view** (v1.13.1-B; `schema.sql`). Read sites that need `tool_calls` / `tool_results` / `reasoning_parts` SELECT from this view, NOT `messages` directly. `COALESCE`s parts-table rows over the legacy JSON columns, so pre-v1.13.0 history still resolves. Writes still target `messages`; the v1.13.0 dual-write into `message_parts` keeps both halves in sync. New payload-assembly code must use the view — calling `messages.tool_calls` directly will miss anything written post-v1.13.1-B if the JSON column ever drifts (and dual-write makes that easy to miss). Shapes: `tool_calls jsonb[]`, `tool_results jsonb` single object, `reasoning_parts jsonb[]` of `{text}`.
- **`services/file_ops.ts`** — Shared file operation implementations used by both inference tools and HTTP routes.
- **`services/auto_name.ts`** — Non-streaming LLM call to generate 4-word session titles after first assistant reply.
@@ -93,21 +105,24 @@ Sessions hold 15 panes (chat / empty / placeholder terminal+agent). v1.12.1 m
## Database
PostgreSQL 16. Tables: `projects`, `sessions`, `chats`, `messages`, `settings`. (`session_panes` was dropped in v1.12.1; workspace pane state lives in `sessions.workspace_panes jsonb`.) Schema applied idempotently on startup via `applySchema()`. Use `clock_timestamp()` (not `NOW()`) inside transactions. CHECK constraints in place: `projects_status_chk` ('open'|'archived'), `sessions_status_chk` (same), `chats_status_chk` (same), `messages_role_chk`, `messages_status_chk` — keep in sync with the `*_STATUSES` const arrays in `apps/server/src/types/api.ts`. The older anonymous `messages_status_check` (without 'cancelled') and `messages_role_check` (without 'system') were dropped in v1.12.1; only the `_chk` variants remain.
PostgreSQL 16. Tables: `projects`, `sessions`, `chats`, `messages`, `settings`, `message_parts` (v1.13.0). Views: `messages_with_parts` (v1.13.1-B parts-merge read path), `tool_cost_stats` (v1.13.10 per-tool 100-call rolling window). (`session_panes` was dropped in v1.12.1; workspace pane state lives in `sessions.workspace_panes jsonb`.) Schema applied idempotently on startup via `applySchema()`. Use `clock_timestamp()` (not `NOW()`) inside transactions. CHECK constraints in place: `projects_status_chk` ('open'|'archived'), `sessions_status_chk` (same), `chats_status_chk` (same), `messages_role_chk`, `messages_status_chk` — keep in sync with the `*_STATUSES` const arrays in `apps/server/src/types/api.ts`. The older anonymous `messages_status_check` (without 'cancelled') and `messages_role_check` (without 'system') were dropped in v1.12.1; only the `_chk` variants remain.
Schema CHECK migration order when renaming allowed values: (1) `ALTER TABLE ... DROP CONSTRAINT IF EXISTS <system_name>` (inline `CREATE TABLE` checks get `<table>_<column>_check`), (2) `UPDATE` rows to new values, (3) wrap new constraint ADD in `DO $$ ... pg_constraint` guard — that block is the only way to get `ADD CONSTRAINT IF NOT EXISTS`.
## Environment
Required: `DATABASE_URL`, `LLAMA_SWAP_URL`. Optional: `PORT` (3000), `HOST` (0.0.0.0), `PROJECT_ROOT_WHITELIST` (/opt, read-only scope for add-existing path resolution), `BOOTSTRAP_ROOT` (/opt/projects, writable scope for create-new-project bootstrap mkdir target — host must `mkdir -p /opt/projects` before container start), `DEFAULT_MODEL`, `LOG_LEVEL`, `SEARXNG_URL` (default `http://100.114.205.53:8888` — internal Tailscale Fathom; the public `search.indifferentketchup.com` is behind Authelia and unusable from server context).
Required: `DATABASE_URL`, `LLAMA_SWAP_URL`. Optional: `PORT` (3000), `HOST` (0.0.0.0), `PROJECT_ROOT_WHITELIST` (/opt, read-only scope for add-existing path resolution), `BOOTSTRAP_ROOT` (/opt/projects, writable scope for create-new-project bootstrap mkdir target — host must `mkdir -p /opt/projects` before container start), `DEFAULT_MODEL`, `LOG_LEVEL`, `SEARXNG_URL` (default `http://100.114.205.53:8888` — internal Tailscale Fathom; the public `search.indifferentketchup.com` is behind Authelia and unusable from server context), `BOOCODE_TOOLS` (`core` | `standard` | `all`, default `all`; v1.13.15-tools tier filter — ceiling, never expands an agent's whitelist).
## Workflow
- Sam reviews all diffs and commits manually. Do not commit unless explicitly asked.
- Per-batch docs live under `openspec/changes/<slug>/{proposal,tasks,design}.md`. Already-shipped batches are snapshots in `openspec/changes/archived/`. New batches follow the proposal+tasks shape; see `openspec/README.md` for the convention.
- Deploy: `cd /opt/boocode && docker compose up --build -d` (or `docker compose build --no-cache boocode && docker compose up -d` if you suspect a layer-cache issue).
- Git push to Gitea: `GIT_SSH_COMMAND="ssh -i /opt/boocode/secrets/boocode_gitea -o IdentitiesOnly=yes" git push origin <branch>`. The default agent identity is rejected; the in-repo deploy key (`secrets/`, gitignored) is the working one. Transient `Connection reset by peer` retries cleanly after `sleep 5`.
- Don't accumulate `.bak-*` files. Clean them up in the same batch or immediately after merge.
- DB-integration tests opt-in via env var: `DATABASE_URL='postgres://boocode:devpass@localhost:5500/boocode' pnpm -C apps/server test`. Host port is 5500 (mapped from `boocode_db:5432`); password is `${POSTGRES_PASSWORD}` from `.env` (`devpass`), NOT the literal in `.env`'s `DATABASE_URL=postgres://boocode:Ketchup1479@boocode_db:5432/...` line. Pattern: `describe.runIf(!!process.env.DATABASE_URL)(...)` with a `beforeAll` that applies the schema via `sql.unsafe(readFileSync(schemaPath))`. Tests skip cleanly when var is unset. `tool_cost_stats.test.ts` is the reference.
- Host-side smoke endpoint: `curl http://100.114.205.53:9500/api/...`. The boocode container's port mapping binds to the Tailscale IP, not `0.0.0.0`, so `localhost:9500` doesn't work from the host shell. Same for booterm at `:9501`.
- Fastify global JSON parser tolerates empty bodies (overridden in `index.ts`); bodyless POSTs (archive, unarchive, stop) work without setting `Content-Type` tricks on the client.
- Event dedup discipline: for any mutation the server publishes via `broker.publishUser`, do NOT add a local `sessionEvents.emit(...)` after the API call — `useUserEvents` forwards the WS frame onto the bus. Frontend mutation handlers must be idempotent (dedup by id, no-op on already-present).
- `node:20-*` base images ship a `node` user at uid/gid 1000 — delete it (`userdel`/`groupdel` on debian, `deluser`/`delgroup` on alpine) before adding samkintop at 1000.

View File

@@ -10,17 +10,20 @@ import { registerProjectRoutes } from './routes/projects.js';
import { registerSessionRoutes } from './routes/sessions.js';
import { registerSettingsRoutes } from './routes/settings.js';
import { registerMessageRoutes } from './routes/messages.js';
import { registerArtifactRoutes } from './routes/artifacts.js';
import { registerChatRoutes } from './routes/chats.js';
import { registerSidebarRoutes } from './routes/sidebar.js';
import { registerWebSocket } from './routes/ws.js';
import { registerModelRoutes } from './routes/models.js';
import { registerAgentRoutes } from './routes/agents.js';
import { registerSkillsRoutes } from './routes/skills.js';
import { registerToolsRoutes } from './routes/tools.js';
import { createInferenceRunner } from './services/inference/index.js';
import { createBroker } from './services/broker.js';
import { listSkills } from './services/skills.js';
import * as compaction from './services/compaction.js';
import { configureModelContext } from './services/model-context.js';
import { cleanupTruncations } from './services/truncate.js';
async function main() {
const config = loadConfig();
@@ -73,7 +76,7 @@ async function main() {
return { status: dbOk ? 'ok' : 'degraded', db: dbOk };
});
const broker = createBroker();
const broker = createBroker(app.log);
registerProjectRoutes(app, sql, config, broker);
registerSessionRoutes(app, sql, config, broker);
@@ -82,6 +85,7 @@ async function main() {
registerAgentRoutes(app, sql);
registerSidebarRoutes(app, sql);
registerChatRoutes(app, sql, broker);
registerToolsRoutes(app, sql);
// Batch 9.6: warm the skills cache at boot and surface the count. Empty or
// missing /data/skills is non-fatal — the skill tools just return empty.
@@ -98,7 +102,9 @@ async function main() {
config,
log: app.log,
publish: (sessionId, frame) => {
broker.publish(sessionId, frame as unknown as Record<string, unknown> & { type: string });
// v1.13.11-b: route through the typed publishFrame so the broker's
// Zod gate validates every inference frame before delivery.
broker.publishFrame(sessionId, frame as unknown as import('./types/ws-frames.js').WsFrame);
},
// v1.11: broker handle for compaction.process to publish 'compacted'
// frames on the per-session channel. Inference's regular publish path
@@ -107,10 +113,10 @@ async function main() {
broker,
},
(user, frame) => {
broker.publishUser(user, frame as unknown as Record<string, unknown> & { type: string });
broker.publishUserFrame(user, frame as unknown as import('./types/ws-frames.js').WsFrame);
}
);
registerMessageRoutes(app, sql, {
registerMessageRoutes(app, sql, config, broker, {
enqueueInference: (sessionId, chatId, assistantId, user) => {
inference.enqueue(sessionId, chatId, assistantId, user);
},
@@ -126,60 +132,61 @@ async function main() {
},
hasActiveInference: (chatId) => inference.hasActive(chatId),
publishUserMessage: (sessionId, chatId, userMessageId, content) => {
broker.publish(sessionId, {
broker.publishFrame(sessionId, {
type: 'message_started',
message_id: userMessageId,
chat_id: chatId,
role: 'user',
});
broker.publish(sessionId, {
broker.publishFrame(sessionId, {
type: 'delta',
message_id: userMessageId,
chat_id: chatId,
content,
});
broker.publish(sessionId, {
broker.publishFrame(sessionId, {
type: 'message_complete',
message_id: userMessageId,
chat_id: chatId,
});
},
publishMessagesDeleted: (sessionId, chatId, messageIds) => {
broker.publish(sessionId, {
broker.publishFrame(sessionId, {
type: 'messages_deleted',
message_ids: messageIds,
chat_id: chatId,
});
},
publishSessionFrame: (sessionId, frame) => {
broker.publish(sessionId, frame);
broker.publishFrame(sessionId, frame as import('./types/ws-frames.js').WsFrame);
},
});
registerArtifactRoutes(app, sql);
registerSkillsRoutes(app, sql, {
enqueueInference: (sessionId, chatId, assistantId, user) => {
inference.enqueue(sessionId, chatId, assistantId, user);
},
publishUserMessage: (sessionId, chatId, userMessageId, content) => {
broker.publish(sessionId, {
broker.publishFrame(sessionId, {
type: 'message_started',
message_id: userMessageId,
chat_id: chatId,
role: 'user',
});
broker.publish(sessionId, {
broker.publishFrame(sessionId, {
type: 'delta',
message_id: userMessageId,
chat_id: chatId,
content,
});
broker.publish(sessionId, {
broker.publishFrame(sessionId, {
type: 'message_complete',
message_id: userMessageId,
chat_id: chatId,
});
},
publishSessionFrame: (sessionId, frame) => {
broker.publish(sessionId, frame);
broker.publishFrame(sessionId, frame as import('./types/ws-frames.js').WsFrame);
},
});
registerWebSocket(app, sql, broker);
@@ -227,7 +234,7 @@ async function main() {
for (const row of rows) {
if (seenChats.has(row.chat_id)) continue;
seenChats.add(row.chat_id);
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'chat_status',
chat_id: row.chat_id,
status: 'idle',
@@ -238,7 +245,13 @@ async function main() {
app.log.error({ err }, 'stuck-row sweeper failed');
}
};
const sweepTimer = setInterval(() => { void sweepStaleStreaming(); }, SWEEP_INTERVAL_MS);
// v1.13.5: truncation cleanup rides the same cadence — 60s tick reaps
// tmpfs files past the 7-day TTL plus any orphans whose owning part has
// been pruned (v1.13.4) or deleted. No-op when the dir is empty.
const sweepTimer = setInterval(() => {
void sweepStaleStreaming();
void cleanupTruncations({ sql, log: app.log });
}, SWEEP_INTERVAL_MS);
app.addHook('onClose', async () => { clearInterval(sweepTimer); });
const shutdown = async (signal: string) => {

View File

@@ -0,0 +1,70 @@
// v1.13.17-cross-repo-reads: PATCH /api/sessions/:id allowed_read_paths
// subset enforcement. Sam flagged in the compliance review that without a
// runtime subset check, a malicious client could POST
// {"allowed_read_paths":["/etc"]}
// and bypass the user-consent grant flow entirely. The findUnauthorizedAdditions
// helper is the guard; tests pin its behavior so a regression in the helper
// or its callsite (PATCH handler in sessions.ts) trips CI before prod.
import { describe, it, expect } from 'vitest';
import { findUnauthorizedAdditions } from '../sessions.js';
describe('findUnauthorizedAdditions — PATCH allowed_read_paths subset guard', () => {
it('returns no extras when requested is empty (full revoke)', () => {
expect(findUnauthorizedAdditions(['/opt/forks/foo'], [])).toEqual([]);
});
it('returns no extras when requested is a strict subset (single revoke)', () => {
expect(
findUnauthorizedAdditions(['/opt/forks/foo', '/opt/forks/bar'], ['/opt/forks/foo']),
).toEqual([]);
});
it('returns no extras when requested equals prior (no-op PATCH)', () => {
expect(
findUnauthorizedAdditions(['/opt/forks/foo', '/opt/forks/bar'], [
'/opt/forks/foo',
'/opt/forks/bar',
]),
).toEqual([]);
});
it('flags an unauthorized addition when prior is empty', () => {
// The /etc bypass attempt — Sam's specific concern from the compliance
// review. Without this guard, the PATCH would have written /etc directly.
expect(findUnauthorizedAdditions([], ['/etc'])).toEqual(['/etc']);
});
it('flags a single unauthorized addition mixed in with valid revokes', () => {
// The attacker still tries to be sneaky: keep one legit entry, drop
// another, slip in a new one. The guard catches the addition regardless
// of how the rest of the array shrinks.
expect(
findUnauthorizedAdditions(['/opt/forks/foo', '/opt/forks/bar'], [
'/opt/forks/foo',
'/var/secrets',
]),
).toEqual(['/var/secrets']);
});
it('flags every unauthorized addition when there are multiple', () => {
expect(
findUnauthorizedAdditions(['/opt/forks/foo'], ['/opt/forks/foo', '/etc', '/root']),
).toEqual(['/etc', '/root']);
});
it('treats requested duplicates correctly (each occurrence checked)', () => {
// If the requested array has duplicates of an unauthorized entry, the
// guard surfaces each one. (A frontend would never send duplicates, but
// the guard's contract shouldn't assume that.)
expect(findUnauthorizedAdditions([], ['/etc', '/etc'])).toEqual(['/etc', '/etc']);
});
it('does not flag entries present in prior even if requested has duplicates', () => {
// Duplicate of an authorized entry passes — the membership check is by
// value, not by index. Settled by Set.has semantics.
expect(
findUnauthorizedAdditions(['/opt/forks/foo'], ['/opt/forks/foo', '/opt/forks/foo']),
).toEqual([]);
});
});

View File

@@ -0,0 +1,231 @@
// v1.14.x-html-artifact-panes: artifact download routes.
//
// Two endpoints:
// POST /api/chats/:id/messages/:msg_id/artifacts/download?fmt=md|html
// Materialises a file under <projectRoot>/.boocode/artifacts/ and
// returns {path, url}. fmt=html requires an existing html_artifact part
// on the message (404 otherwise). fmt=md works on any assistant
// message with non-empty content.
//
// GET /api/projects/:project_id/artifacts/:filename
// Streams a previously-written artifact back with
// Content-Disposition: attachment. Path-guarded to the project's
// artifacts dir; rejects traversal attempts.
import { createReadStream } from 'node:fs';
import { realpath, stat } from 'node:fs/promises';
import { resolve, sep, basename } from 'node:path';
import type { FastifyInstance } from 'fastify';
import { z } from 'zod';
import type { Sql } from '../db.js';
import {
writeHtmlArtifact,
writeMarkdownArtifact,
type HtmlArtifactPayload,
} from '../services/artifacts.js';
const DownloadQuery = z.object({
fmt: z.enum(['md', 'html']),
});
// Filename safety: alnum, dash, dot, underscore only. Blocks `..`, slashes,
// nul bytes, etc. before we even touch the filesystem.
const FilenameRe = /^[A-Za-z0-9._-]+$/;
interface ChatRow {
id: string;
session_id: string;
project_id: string;
project_path: string;
}
interface MessageRow {
id: string;
chat_id: string;
role: string;
content: string;
}
export function registerArtifactRoutes(app: FastifyInstance, sql: Sql): void {
app.post<{
Params: { id: string; msg_id: string };
Querystring: { fmt?: string };
}>(
'/api/chats/:id/messages/:msg_id/artifacts/download',
async (req, reply) => {
const parsed = DownloadQuery.safeParse(req.query);
if (!parsed.success) {
reply.code(400);
return { error: 'invalid query', details: parsed.error.flatten() };
}
const { fmt } = parsed.data;
const { id: chatId, msg_id: messageId } = req.params;
const chatRows = await sql<ChatRow[]>`
SELECT c.id, c.session_id, s.project_id, p.path AS project_path
FROM chats c
JOIN sessions s ON s.id = c.session_id
JOIN projects p ON p.id = s.project_id
WHERE c.id = ${chatId}
`;
if (chatRows.length === 0) {
reply.code(404);
return { error: 'chat not found' };
}
const chat = chatRows[0]!;
const msgRows = await sql<MessageRow[]>`
SELECT id, chat_id, role, content
FROM messages
WHERE id = ${messageId} AND chat_id = ${chatId}
`;
if (msgRows.length === 0) {
reply.code(404);
return { error: 'message not found' };
}
const msg = msgRows[0]!;
if (msg.role !== 'assistant') {
reply.code(400);
return { error: 'only assistant messages produce artifacts' };
}
const ctx = { projectId: chat.project_id, projectRoot: chat.project_path };
try {
if (fmt === 'md') {
if (!msg.content || msg.content.trim().length === 0) {
reply.code(400);
return { error: 'message has no content to export' };
}
const result = await writeMarkdownArtifact(
{ content: msg.content },
ctx,
);
return result;
}
// fmt === 'html': require an html_artifact part on the message.
const partRows = await sql<{ payload: HtmlArtifactPayload }[]>`
SELECT payload
FROM message_parts
WHERE message_id = ${messageId} AND kind = 'html_artifact'
ORDER BY sequence ASC
LIMIT 1
`;
if (partRows.length === 0) {
reply.code(404);
return { error: 'no html_artifact part on this message' };
}
const result = await writeHtmlArtifact(partRows[0]!.payload, ctx);
return result;
} catch (err) {
req.log.error({ err, messageId, fmt }, 'artifact write failed');
reply.code(500);
return {
error: err instanceof Error ? err.message : 'artifact write failed',
};
}
},
);
// v1.14.x-html-artifact-panes: HtmlArtifactPane needs the payload on click
// to render its iframe. Returns 404 when the message has no html_artifact
// sibling part — frontend uses that signal to open the markdown_artifact
// pane variant instead. Payload shape matches HtmlArtifactPayload in
// services/artifacts.ts.
app.get<{ Params: { id: string; msg_id: string } }>(
'/api/chats/:id/messages/:msg_id/html_artifact',
async (req, reply) => {
const { id: chatId, msg_id: messageId } = req.params;
const partRows = await sql<{ payload: HtmlArtifactPayload }[]>`
SELECT payload
FROM message_parts mp
JOIN messages m ON m.id = mp.message_id
WHERE mp.message_id = ${messageId}
AND m.chat_id = ${chatId}
AND mp.kind = 'html_artifact'
ORDER BY mp.sequence ASC
LIMIT 1
`;
if (partRows.length === 0) {
reply.code(404);
return { error: 'no html_artifact part on this message' };
}
return partRows[0]!.payload;
},
);
app.get<{ Params: { project_id: string; filename: string } }>(
'/api/projects/:project_id/artifacts/:filename',
async (req, reply) => {
const { project_id: projectId, filename } = req.params;
// Strip directory components defensively; only the basename is allowed.
const base = basename(filename);
if (base !== filename || !FilenameRe.test(base)) {
reply.code(400);
return { error: 'invalid filename' };
}
const projectRows = await sql<{ id: string; path: string }[]>`
SELECT id, path FROM projects WHERE id = ${projectId}
`;
if (projectRows.length === 0) {
reply.code(404);
return { error: 'project not found' };
}
const project = projectRows[0]!;
let resolvedRoot: string;
try {
resolvedRoot = await realpath(project.path);
} catch {
reply.code(404);
return { error: 'project path missing' };
}
const artifactsDir = resolve(resolvedRoot, '.boocode/artifacts');
const absPath = resolve(artifactsDir, base);
if (!absPath.startsWith(artifactsDir + sep)) {
reply.code(400);
return { error: 'path traversal rejected' };
}
// Close the symlink-escape gap: if `.boocode/artifacts` (or an
// ancestor) is a symlink pointing outside resolvedRoot, the lexical
// prefix check above passes but the actual read lands outside the
// sandbox. Realpath the artifacts dir and re-verify.
try {
const realArtifactsDir = await realpath(artifactsDir);
if (
realArtifactsDir !== resolvedRoot &&
!realArtifactsDir.startsWith(resolvedRoot + sep)
) {
reply.code(400);
return { error: 'path traversal rejected' };
}
} catch {
reply.code(404);
return { error: 'artifact not found' };
}
try {
await stat(absPath);
} catch {
reply.code(404);
return { error: 'artifact not found' };
}
const ext = base.toLowerCase().endsWith('.html')
? 'text/html; charset=utf-8'
: base.toLowerCase().endsWith('.md')
? 'text/markdown; charset=utf-8'
: 'application/octet-stream';
reply.header('Content-Type', ext);
// Defense-in-depth on LLM-generated HTML served through this route.
// Authelia gates the proxy; these headers limit blast radius if a
// payload tries to escape that boundary in-browser.
reply.header('X-Content-Type-Options', 'nosniff');
reply.header('Content-Security-Policy', 'sandbox');
reply.header(
'Content-Disposition',
`attachment; filename="${base.replace(/"/g, '')}"`,
);
return reply.send(createReadStream(absPath));
},
);
}

View File

@@ -102,7 +102,7 @@ export function registerChatRoutes(
VALUES (${req.params.id}, ${parsed.data.name ?? null}, 'open')
RETURNING id, session_id, name, status, created_at, updated_at
`;
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'chat_created',
chat: chat!,
session_id: req.params.id,
@@ -132,7 +132,7 @@ export function registerChatRoutes(
return { error: 'chat not found' };
}
const chat = rows[0]!;
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'chat_updated',
chat_id: chat.id,
session_id: chat.session_id,
@@ -162,7 +162,7 @@ export function registerChatRoutes(
`;
const ids = rows.map((r) => r.id);
for (const id of ids) {
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'chat_archived',
chat_id: id,
session_id: req.params.id,
@@ -203,7 +203,7 @@ export function registerChatRoutes(
return { error: 'chat not found or already archived' };
}
const row = rows[0]!;
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'chat_archived',
chat_id: row.id,
session_id: row.session_id,
@@ -226,7 +226,7 @@ export function registerChatRoutes(
return { error: 'chat not found or not archived' };
}
const chat = rows[0]!;
broker.publishUser('default', { type: 'chat_unarchived', chat });
broker.publishUserFrame('default', { type: 'chat_unarchived', chat });
return chat;
}
);
@@ -243,7 +243,7 @@ export function registerChatRoutes(
return { error: 'chat not found' };
}
const row = result[0]!;
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'chat_deleted',
chat_id: row.id,
session_id: row.session_id,
@@ -338,7 +338,7 @@ export function registerChatRoutes(
return chat!;
});
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'chat_created',
chat: newChat,
session_id: source.session_id,
@@ -400,13 +400,13 @@ export function registerChatRoutes(
reply.code(409);
return { error: 'message status changed mid-request' };
}
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'chat_status',
chat_id: msg.chat_id,
status: 'idle',
at: new Date().toISOString(),
});
broker.publish(msg.session_id, {
broker.publishFrame(msg.session_id, {
type: 'message_complete',
message_id: msg.id,
chat_id: msg.chat_id,

View File

@@ -1,7 +1,13 @@
import type { FastifyInstance } from 'fastify';
import { z } from 'zod';
import type { Sql } from '../db.js';
import type { Config } from '../config.js';
import type { Broker } from '../services/broker.js';
import type { Chat, Message, Session, ToolCall } from '../types/api.js';
// v1.13.17-cross-repo-reads: grant_read_access resolves the grant root at
// decision time (not at request time) so concurrent project changes don't
// stale-bind the resolution.
import { resolveGrantRoot } from '../services/grant_resolver.js';
const SendBody = z.object({
content: z.string().min(1).max(64_000),
@@ -47,6 +53,21 @@ const AskUserInputArgs = z.object({
.max(3),
});
// v1.13.17-cross-repo-reads: grant decision body. tool_call_id is the
// model-emitted id (e.g. "call_abc123"), not a UUID. decision is binary.
const GrantReadAccessBody = z.object({
tool_call_id: z.string().min(1),
decision: z.enum(['allow', 'deny']),
});
// Same shape as services/request_read_access.ts RequestReadAccessInput.
// Re-derived to avoid the services/tools.ts import (matches the
// AskUserInputArgs pattern above).
const RequestReadAccessArgs = z.object({
path: z.string().min(1),
reason: z.string().min(1).max(500),
});
interface MessageHandlers {
enqueueInference: (sessionId: string, chatId: string, assistantMessageId: string, user: string) => void;
// v1.11: returns a promise that resolves after compaction.process finishes
@@ -76,6 +97,8 @@ interface MessageHandlers {
export function registerMessageRoutes(
app: FastifyInstance,
sql: Sql,
config: Config,
broker: Broker,
handlers: MessageHandlers
): void {
app.get<{ Params: { id: string } }>(
@@ -626,4 +649,234 @@ export function registerMessageRoutes(
return result;
},
);
// v1.13.17-cross-repo-reads: resume an awaiting-grant pause. Mirror shape
// of /answer_user_input (validate, look up via message_parts, UPDATE,
// publish, enqueue). Differences vs /answer_user_input:
// - On 'allow', re-resolves the grant root via grant_resolver (state
// may have changed since the prompt fired — concurrent project add,
// etc.). Resolution failure auto-falls to a denial with reason text
// rather than 500ing.
// - On 'allow' with a valid root, appends to sessions.allowed_read_paths
// (deduplicated) inside the same transaction.
// - On success, also publishes session_updated so an open SettingsPane
// refetches the new grant list.
// Error codes match /answer:
// 400 invalid_body / mismatched_answer_shape (bad args on the tool_call)
// 404 chat_not_found / unknown_tool_call_id
// 409 tool_call_already_answered
app.post<{ Params: { id: string } }>(
'/api/chats/:id/grant_read_access',
async (req, reply) => {
const parsed = GrantReadAccessBody.safeParse(req.body);
if (!parsed.success) {
reply.code(400);
return { error: 'invalid_body', details: parsed.error.flatten() };
}
const { tool_call_id, decision } = parsed.data;
const chatRows = await sql<Chat[]>`
SELECT id, session_id FROM chats WHERE id = ${req.params.id} AND status = 'open'
`;
if (chatRows.length === 0) {
reply.code(404);
return { error: 'chat_not_found' };
}
const chat = chatRows[0]!;
const sessionId = chat.session_id;
// Mirror the /answer lookup: assistant tool_call by id via message_parts.
const callerRows = await sql<{
message_id: string;
payload: { id: string; name: string; args: Record<string, unknown> };
}[]>`
SELECT p.message_id, p.payload
FROM message_parts p
JOIN messages m ON m.id = p.message_id
WHERE m.chat_id = ${chat.id}
AND m.role = 'assistant'
AND p.kind = 'tool_call'
AND p.payload->>'id' = ${tool_call_id}
ORDER BY m.created_at DESC
LIMIT 1
`;
const callerRow = callerRows[0];
if (!callerRow) {
reply.code(404);
return { error: 'unknown_tool_call_id' };
}
const foundCall: ToolCall = {
id: callerRow.payload.id,
name: callerRow.payload.name,
args: callerRow.payload.args,
};
if (foundCall.name !== 'request_read_access') {
reply.code(400);
return { error: 'tool_call_not_request_read_access' };
}
const argsParsed = RequestReadAccessArgs.safeParse(foundCall.args);
if (!argsParsed.success) {
reply.code(400);
return { error: 'mismatched_answer_shape', detail: 'tool_call args invalid' };
}
const requestedPath = argsParsed.data.path;
// Find the pending tool row.
const toolRows = await sql<{
message_id: string;
payload: { tool_call_id: string; output: unknown };
}[]>`
SELECT p.message_id, p.payload
FROM message_parts p
JOIN messages m ON m.id = p.message_id
WHERE m.chat_id = ${chat.id}
AND m.role = 'tool'
AND p.kind = 'tool_result'
AND p.payload->>'tool_call_id' = ${tool_call_id}
ORDER BY m.created_at DESC
LIMIT 1
`;
const toolRow = toolRows[0];
if (!toolRow) {
reply.code(404);
return { error: 'unknown_tool_call_id', detail: 'tool message not found' };
}
if (toolRow.payload && toolRow.payload.output !== null) {
reply.code(409);
return { error: 'tool_call_already_answered' };
}
// Look up session + project so we can re-resolve the grant root and
// append to allowed_read_paths atomically. We don't need agent or
// history here — just the project path for the resolver.
const sessionRows = await sql<{
id: string;
project_id: string;
allowed_read_paths: string[];
project_path: string;
}[]>`
SELECT s.id, s.project_id, s.allowed_read_paths, p.path AS project_path
FROM sessions s
JOIN projects p ON p.id = s.project_id
WHERE s.id = ${sessionId}
`;
const sessionRow = sessionRows[0];
if (!sessionRow) {
reply.code(404);
return { error: 'session_not_found' };
}
// Decision branch. 'deny' is the easy path: nothing to resolve or
// persist. 'allow' resolves the grant root; if resolution fails (e.g.
// path was deleted, project removed since prompt) the tool gets a
// denial with the resolver's reason text instead of a 500.
let resultOutput: string;
let grantRoot: string | null = null;
if (decision === 'allow') {
const resolution = await resolveGrantRoot(
sql,
requestedPath,
sessionRow.project_path,
config.PROJECT_ROOT_WHITELIST,
);
if (!resolution.ok) {
resultOutput = `denied: ${resolution.reason}`;
} else {
grantRoot = resolution.root;
resultOutput = `granted: ${grantRoot}`;
}
} else {
resultOutput = 'denied';
}
const newToolResults = {
tool_call_id,
output: resultOutput,
truncated: false,
};
const toolMessageId = toolRow.message_id;
const dbResult = await sql.begin(async (tx) => {
await tx`
UPDATE messages
SET tool_results = ${tx.json(newToolResults as never)}
WHERE id = ${toolMessageId}
`;
// Same delete+insert dance as /answer — UNIQUE (message_id, sequence)
// blocks plain UPDATE on append-style parts.
await tx`DELETE FROM message_parts WHERE message_id = ${toolMessageId} AND kind = 'tool_result'`;
await tx`
INSERT INTO message_parts (message_id, sequence, kind, payload)
VALUES (${toolMessageId}, 0, 'tool_result', ${tx.json(newToolResults as never)})
`;
// Persist the grant if we have one. ARRAY-level dedup — append only
// when the root isn't already present. The session row gets
// touched (updated_at) so the post-update publish below has a
// fresh timestamp.
let allowedRootsAfter = sessionRow.allowed_read_paths;
if (grantRoot !== null) {
if (!sessionRow.allowed_read_paths.includes(grantRoot)) {
const updated = await tx<{ allowed_read_paths: string[] }[]>`
UPDATE sessions
SET allowed_read_paths = array_append(allowed_read_paths, ${grantRoot}),
updated_at = clock_timestamp()
WHERE id = ${sessionId}
RETURNING allowed_read_paths
`;
allowedRootsAfter = updated[0]?.allowed_read_paths ?? sessionRow.allowed_read_paths;
} else {
// Already present — touch updated_at so any open settings
// panel still picks up the no-op via session_updated.
await tx`UPDATE sessions SET updated_at = clock_timestamp() WHERE id = ${sessionId}`;
}
}
const [assistantMsg] = await tx<{ id: string }[]>`
INSERT INTO messages (session_id, chat_id, role, content, status, created_at)
VALUES (${sessionId}, ${chat.id}, 'assistant', '', 'streaming', clock_timestamp())
RETURNING id
`;
await tx`UPDATE chats SET updated_at = clock_timestamp() WHERE id = ${chat.id}`;
return {
tool_message_id: toolMessageId,
assistant_message_id: assistantMsg!.id,
allowed_roots_after: allowedRootsAfter,
};
});
// Publish the deferred tool_result frame so the pending card flips to
// its answered view without a refetch.
handlers.publishSessionFrame(sessionId, {
type: 'tool_result',
tool_message_id: dbResult.tool_message_id,
tool_call_id,
chat_id: chat.id,
output: resultOutput,
truncated: false,
});
// session_updated nudge so any open SettingsPane refetches and sees
// the new allowed_read_paths. We publish on the user channel to match
// the existing PATCH /api/sessions/:id behavior — frontend refetches
// via api.sessions.get on receipt.
const nowIso = new Date().toISOString();
broker.publishUserFrame('default', {
type: 'session_updated',
session_id: sessionId,
project_id: sessionRow.project_id,
// session name doesn't change on grant; we look it up fresh to
// avoid carrying stale state if a rename raced us.
name:
(
await sql<{ name: string }[]>`SELECT name FROM sessions WHERE id = ${sessionId}`
)[0]?.name ?? '',
updated_at: nowIso,
});
handlers.enqueueInference(sessionId, chat.id, dbResult.assistant_message_id, 'default');
reply.code(202);
return {
tool_message_id: dbResult.tool_message_id,
assistant_message_id: dbResult.assistant_message_id,
allowed_read_paths: dbResult.allowed_roots_after,
};
},
);
}

View File

@@ -129,7 +129,7 @@ export function registerProjectRoutes(
RETURNING id, name, path, added_at, last_session_id, status, gitea_remote,
default_system_prompt, default_web_search_enabled
`;
broker.publishUser('default', { type: 'project_created', project: row as unknown as Project });
broker.publishUserFrame('default', { type: 'project_created', project: row as unknown as Project });
reply.code(201);
return {
project: row,
@@ -186,11 +186,11 @@ export function registerProjectRoutes(
`;
if (existing.length === 0) {
broker.publishUser('default', { type: 'project_created', project: row as unknown as Project });
broker.publishUserFrame('default', { type: 'project_created', project: row as unknown as Project });
reply.code(201);
} else {
// existing.status was 'archived' — row has been restored.
broker.publishUser('default', { type: 'project_unarchived', project: row as unknown as Project });
broker.publishUserFrame('default', { type: 'project_unarchived', project: row as unknown as Project });
reply.code(200);
}
return row;
@@ -243,7 +243,7 @@ export function registerProjectRoutes(
// v1.9: the project_updated frame still only carries id + name. Clients
// that need the new fields refetch via api.projects.list() — keeps the
// frame payload lean, per the locked recon decision (d).
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'project_updated',
project_id: project.id,
name: project.name,
@@ -260,7 +260,7 @@ export function registerProjectRoutes(
reply.code(404);
return { error: 'not found or already archived' };
}
broker.publishUser('default', { type: 'project_archived', project_id: req.params.id });
broker.publishUserFrame('default', { type: 'project_archived', project_id: req.params.id });
reply.code(204);
return null;
});
@@ -277,7 +277,7 @@ export function registerProjectRoutes(
return { error: 'not found or not archived' };
}
const project = rows[0]!;
broker.publishUser('default', { type: 'project_unarchived', project });
broker.publishUserFrame('default', { type: 'project_unarchived', project });
return project;
});
@@ -288,7 +288,7 @@ export function registerProjectRoutes(
reply.code(404);
return { error: 'not found' };
}
broker.publishUser('default', { type: 'project_deleted', project_id: id });
broker.publishUserFrame('default', { type: 'project_deleted', project_id: id });
reply.code(204);
return null;
});

View File

@@ -13,12 +13,37 @@ const CreateBody = z.object({
agent_id: z.string().min(1).max(200).nullable().optional(),
});
// v1.14.x-html-artifact-panes: 'markdown_artifact' + 'html_artifact' added
// as pane kinds. Pane state is a reference only (chat_id + message_id +
// title) — the actual artifact body is fetched from the message row or
// message_parts.payload by the pane component on mount.
const MarkdownArtifactStateZ = z.object({
chat_id: z.string().min(1).max(200),
message_id: z.string().min(1).max(200),
title: z.string().max(500),
});
const HtmlArtifactStateZ = z.object({
chat_id: z.string().min(1).max(200),
message_id: z.string().min(1).max(200),
title: z.string().max(500),
});
const WorkspacePaneZ = z.object({
id: z.string().min(1).max(200),
kind: z.enum(['chat', 'terminal', 'agent', 'empty', 'settings']),
kind: z.enum([
'chat',
'terminal',
'agent',
'empty',
'settings',
'markdown_artifact',
'html_artifact',
]),
chatId: z.string().min(1).max(200).optional(),
chatIds: z.array(z.string().min(1).max(200)).max(50),
activeChatIdx: z.number().int(),
markdown_artifact_state: MarkdownArtifactStateZ.optional(),
html_artifact_state: HtmlArtifactStateZ.optional(),
});
const WorkspacePanesBody = z.object({
@@ -32,6 +57,29 @@ const PatchBody = z.object({
agent_id: z.string().min(1).max(200).nullable().optional(),
// v1.9: null = inherit from project default; true/false = explicit override.
web_search_enabled: z.boolean().nullable().optional(),
// v1.13.17-cross-repo-reads: revocation pathway. PATCH with a shortened
// list deletes entries; the grant flow itself APPENDS via the separate
// grant_read_access endpoint, never via this PATCH. Frontend treats this
// as "send the new whole array". Per-entry shape validation: must be
// absolute, no NUL, no `/..` traversal segment. Server doesn't re-validate
// whitelist membership on PATCH — entries already in the array were
// placed there by the grant endpoint after a full whitelist+repo-shape
// check. THE SUBSET CHECK (every entry must already be in the current
// array) is enforced at runtime in the PATCH handler below, NOT in this
// zod refinement, because the refinement has no access to the existing
// session row.
allowed_read_paths: z
.array(
z
.string()
.min(1)
.max(1024)
.refine((p) => p.startsWith('/') && !p.includes('\0') && !p.includes('/..'), {
message: 'must be an absolute path without traversal markers',
}),
)
.max(64)
.optional(),
});
async function resolveDefaultModel(sql: Sql, config: Config): Promise<string> {
@@ -40,6 +88,19 @@ async function resolveDefaultModel(sql: Sql, config: Config): Promise<string> {
return config.DEFAULT_MODEL;
}
// v1.13.17-cross-repo-reads: subset enforcement for PATCH allowed_read_paths.
// The PATCH route can only SHRINK the array; growth happens exclusively via
// POST /api/chats/:id/grant_read_access (which requires user consent).
// Returns the list of disallowed-additions; an empty list means the request
// is a valid shrink-or-no-op. Exported for the unit test.
export function findUnauthorizedAdditions(
prior: readonly string[],
requested: readonly string[],
): string[] {
const priorSet = new Set(prior);
return requested.filter((p) => !priorSet.has(p));
}
export function registerSessionRoutes(
app: FastifyInstance,
sql: Sql,
@@ -56,7 +117,7 @@ export function registerSessionRoutes(
}
const status = req.query.status === 'archived' ? 'archived' : 'open';
const rows = await sql<Session[]>`
SELECT id, project_id, name, model, system_prompt, status, created_at, updated_at, agent_id, web_search_enabled, workspace_panes
SELECT id, project_id, name, model, system_prompt, status, created_at, updated_at, agent_id, web_search_enabled, workspace_panes, allowed_read_paths
FROM sessions
WHERE project_id = ${req.params.id} AND status = ${status}
ORDER BY updated_at DESC
@@ -112,7 +173,7 @@ export function registerSessionRoutes(
`;
return session!;
});
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'session_created',
session: row,
project_id: row.project_id,
@@ -124,7 +185,7 @@ export function registerSessionRoutes(
app.get<{ Params: { id: string } }>('/api/sessions/:id', async (req, reply) => {
const rows = await sql<Session[]>`
SELECT id, project_id, name, model, system_prompt, status, created_at, updated_at, agent_id, web_search_enabled, workspace_panes
SELECT id, project_id, name, model, system_prompt, status, created_at, updated_at, agent_id, web_search_enabled, workspace_panes, allowed_read_paths
FROM sessions WHERE id = ${req.params.id}
`;
if (rows.length === 0) {
@@ -150,15 +211,53 @@ export function registerSessionRoutes(
const newAgentId = parsed.data.agent_id ?? null;
const wseProvided = parsed.data.web_search_enabled !== undefined;
const newWse = parsed.data.web_search_enabled ?? null;
// Read the prior name so the post-update publish can skip no-op renames
// (PATCH { name: "Foo" } where the session is already "Foo"). The window
// between SELECT and UPDATE is sub-millisecond in the same request handler;
// a concurrent rename in that gap would just mean one stale publish, which
// existing clients dedup by id.
const before = await sql<{ name: string }[]>`
SELECT name FROM sessions WHERE id = ${req.params.id}
// v1.13.17-cross-repo-reads: tri-state on the wire (undefined = no
// change, [] = clear). Frontend currently uses this PATCH only for
// revocation (delete a single entry from the existing array, send
// shortened result). Append-style grants go through the dedicated
// grant_read_access endpoint inside the inference loop.
const arpProvided = parsed.data.allowed_read_paths !== undefined;
const newArp = parsed.data.allowed_read_paths ?? [];
// Read the prior name + grants so the post-update publish can skip no-op
// renames (PATCH { name: "Foo" } where the session is already "Foo") AND
// so the subset check below has the current grant list to compare against.
// The window between SELECT and UPDATE is sub-millisecond in the same
// request handler; a concurrent rename in that gap would just mean one
// stale publish, which existing clients dedup by id.
const before = await sql<{ name: string; allowed_read_paths: string[] }[]>`
SELECT name, allowed_read_paths FROM sessions WHERE id = ${req.params.id}
`;
const priorName = before[0]?.name;
const priorArp = before[0]?.allowed_read_paths ?? [];
// v1.13.17-cross-repo-reads: subset enforcement. The grant flow is the
// ONLY path that can add entries to allowed_read_paths — PATCH can only
// shrink the array, never grow it. Without this guard, a malicious
// client could POST {"allowed_read_paths":["/etc"]} and bypass the
// user-consent prompt entirely. Sam flagged this in the v1.13.17
// compliance review (2026-05-22).
// Race note: a concurrent grant landing between this SELECT and the
// UPDATE below would briefly make a "shouldn't-have-been-valid" PATCH
// succeed (the newly-granted root sneaks in). Inverse race — a
// legitimate revoke happening alongside a concurrent grant — could
// briefly reject the revoke; the user retries. Both are acceptable
// given the single-user threat model + sub-millisecond window.
if (arpProvided) {
const extras = findUnauthorizedAdditions(priorArp, newArp);
if (extras.length > 0) {
reply.code(400);
return {
error: 'invalid body',
details: {
fieldErrors: {
allowed_read_paths: [
`entries must already be granted; cannot add via PATCH: ${extras.join(', ')}`,
],
},
},
};
}
}
const rows = await sql<Session[]>`
UPDATE sessions
SET
@@ -167,10 +266,11 @@ export function registerSessionRoutes(
system_prompt = COALESCE(${system_prompt ?? null}, system_prompt),
agent_id = CASE WHEN ${agentIdProvided} THEN ${newAgentId} ELSE agent_id END,
web_search_enabled = CASE WHEN ${wseProvided} THEN ${newWse} ELSE web_search_enabled END,
allowed_read_paths = CASE WHEN ${arpProvided} THEN ${sql.array(newArp, 25)} ELSE allowed_read_paths END,
updated_at = clock_timestamp()
WHERE id = ${req.params.id}
RETURNING id, project_id, name, model, system_prompt, status, created_at, updated_at,
agent_id, web_search_enabled, workspace_panes
agent_id, web_search_enabled, workspace_panes, allowed_read_paths
`;
if (rows.length === 0) {
reply.code(404);
@@ -178,7 +278,7 @@ export function registerSessionRoutes(
}
const session = rows[0]!;
if (name !== undefined && session.name !== priorName) {
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'session_renamed',
session_id: session.id,
name: session.name,
@@ -188,7 +288,7 @@ export function registerSessionRoutes(
// (notably the SettingsPane open in another tab) can refetch and pick
// up the new fields. Frame stays lean (decision d) — payload is just
// ids + name + updated_at, the client refetches via api.sessions.get.
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'session_updated',
session_id: session.id,
project_id: session.project_id,
@@ -213,14 +313,14 @@ export function registerSessionRoutes(
updated_at = clock_timestamp()
WHERE id = ${req.params.id}
RETURNING id, project_id, name, model, system_prompt, status, created_at, updated_at,
agent_id, web_search_enabled, workspace_panes
agent_id, web_search_enabled, workspace_panes, allowed_read_paths
`;
if (rows.length === 0) {
reply.code(404);
return { error: 'session not found' };
}
const session = rows[0]!;
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'session_workspace_updated',
session_id: session.id,
workspace_panes: session.workspace_panes,
@@ -248,7 +348,7 @@ export function registerSessionRoutes(
`;
const ids = rows.map((r) => r.id);
for (const id of ids) {
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'session_archived',
session_id: id,
project_id: req.params.id,
@@ -289,7 +389,7 @@ export function registerSessionRoutes(
reply.code(404);
return { error: 'session not found or already archived' };
}
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'session_archived',
session_id: rows[0]!.id,
project_id: rows[0]!.project_id,
@@ -312,7 +412,7 @@ export function registerSessionRoutes(
return { error: 'session not found or not archived' };
}
const session = rows[0]!;
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'session_created',
session: session,
project_id: session.project_id,
@@ -334,7 +434,7 @@ export function registerSessionRoutes(
return { error: 'not found' };
}
const project_id = deleted[0]!.project_id;
broker.publishUser('default', { type: 'session_deleted', session_id: id, project_id });
broker.publishUserFrame('default', { type: 'session_deleted', session_id: id, project_id });
reply.code(204);
return null;
}

View File

@@ -0,0 +1,40 @@
import type { FastifyInstance } from 'fastify';
import type { Sql } from '../db.js';
export interface ToolCostStat {
tool_name: string;
mean_prompt_tokens: number;
mean_completion_tokens: number;
n_calls: number;
updated_at: string;
}
// v1.13.10: per-tool token cost rolling window read endpoint. Backed by the
// tool_cost_stats view in schema.sql (last 100 calls per tool, equal-split
// attribution across multi-tool turns, sentinel/failed-turn excluded).
// Consumed by AgentPicker for at-a-glance per-agent cost hints.
export function registerToolsRoutes(app: FastifyInstance, sql: Sql): void {
app.get('/api/tools/cost_stats', async () => {
const rows = await sql<
{
tool_name: string;
prompt_tokens_sum: number;
completion_tokens_sum: number;
n_calls: number;
updated_at: string;
}[]
>`
SELECT tool_name, prompt_tokens_sum, completion_tokens_sum, n_calls, updated_at
FROM tool_cost_stats
ORDER BY tool_name ASC
`;
const stats: ToolCostStat[] = rows.map((r) => ({
tool_name: r.tool_name,
mean_prompt_tokens: Math.round(r.prompt_tokens_sum / r.n_calls),
mean_completion_tokens: Math.round(r.completion_tokens_sum / r.n_calls),
n_calls: r.n_calls,
updated_at: r.updated_at,
}));
return { stats };
});
}

View File

@@ -51,11 +51,50 @@ CREATE TABLE IF NOT EXISTS message_parts (
kind text NOT NULL,
payload jsonb NOT NULL,
created_at timestamptz NOT NULL DEFAULT clock_timestamp(),
CONSTRAINT message_parts_kind_chk CHECK (kind IN ('text', 'tool_call', 'tool_result', 'reasoning', 'step_start')),
CONSTRAINT message_parts_kind_chk CHECK (kind IN ('text', 'tool_call', 'tool_result', 'reasoning', 'step_start', 'synthesis', 'html_artifact')),
CONSTRAINT message_parts_seq_uniq UNIQUE (message_id, sequence)
);
CREATE INDEX IF NOT EXISTS message_parts_msg_seq_idx ON message_parts (message_id, sequence);
-- v1.13.4: prune support. hidden_at marks parts that have been pruned out
-- of the model payload by the two-tier compaction prune (services/inference/
-- prune.ts). Rows stay in the DB so frontend can still display them with a
-- "hidden" indicator (out of scope this dispatch). messages_with_parts
-- view filters these out — see below. Partial index speeds the common
-- "visible parts only" filter.
DO $$
BEGIN
IF NOT EXISTS (
SELECT 1 FROM information_schema.columns
WHERE table_name = 'message_parts' AND column_name = 'hidden_at'
) THEN
ALTER TABLE message_parts ADD COLUMN hidden_at timestamptz NULL;
END IF;
END $$;
CREATE INDEX IF NOT EXISTS message_parts_hidden_idx
ON message_parts (message_id) WHERE hidden_at IS NULL;
-- v1.13.13: extend message_parts.kind to allow 'synthesis'. Existing DBs were
-- created with the pre-v1.13.13 CHECK constraint that did NOT include
-- 'synthesis'; drop + re-add the constraint with the extended enum. Fresh
-- installs hit the inline constraint above (already updated) and skip this
-- block via the pg_constraint guard.
-- v1.14.x-html-artifact-panes: extend the same constraint with 'html_artifact'.
-- DROP IF EXISTS + DO $$ pg_constraint $$ guard remains idempotent across
-- both v1.13.13 and v1.14.x boots; the IN list below is the union of every
-- kind ever shipped.
ALTER TABLE message_parts DROP CONSTRAINT IF EXISTS message_parts_kind_chk;
DO $$
BEGIN
IF NOT EXISTS (
SELECT 1 FROM pg_constraint WHERE conname = 'message_parts_kind_chk'
) THEN
ALTER TABLE message_parts
ADD CONSTRAINT message_parts_kind_chk
CHECK (kind IN ('text', 'tool_call', 'tool_result', 'reasoning', 'step_start', 'synthesis', 'html_artifact'));
END IF;
END $$;
-- v1.13.1-B: read-path view. Read sites SELECT FROM messages_with_parts
-- instead of messages so tool_calls / tool_results / reasoning_parts come
-- from the granular message_parts table. The COALESCE means pre-v1.13.0
@@ -73,25 +112,96 @@ SELECT
m.last_seq, m.tokens_used, m.ctx_used, m.ctx_max,
m.started_at, m.finished_at, m.created_at, m.metadata,
m.summary, m.tail_start_id, m.compacted_at,
COALESCE(
(SELECT jsonb_agg(p.payload ORDER BY p.sequence)
FROM message_parts p
WHERE p.message_id = m.id AND p.kind = 'tool_call'),
m.tool_calls
) AS tool_calls,
COALESCE(
(SELECT p.payload
FROM message_parts p
WHERE p.message_id = m.id AND p.kind = 'tool_result'
ORDER BY p.sequence
LIMIT 1),
m.tool_results
) AS tool_results,
-- v1.13.4: prune semantics need to distinguish "no parts row exists"
-- (pre-v1.13.0 fallback to legacy column) from "all parts hidden"
-- (prune intended — return null/empty so the row drops from the model
-- payload). A naive COALESCE would fall back to the legacy column when
-- every part is hidden, undoing the prune. CASE on EXISTS(any kind)
-- splits the two cases.
CASE
WHEN EXISTS (SELECT 1 FROM message_parts pp
WHERE pp.message_id = m.id AND pp.kind = 'tool_call')
THEN (SELECT jsonb_agg(p.payload ORDER BY p.sequence)
FROM message_parts p
WHERE p.message_id = m.id AND p.kind = 'tool_call' AND p.hidden_at IS NULL)
ELSE m.tool_calls
END AS tool_calls,
CASE
WHEN EXISTS (SELECT 1 FROM message_parts pp
WHERE pp.message_id = m.id AND pp.kind = 'tool_result')
THEN (SELECT p.payload
FROM message_parts p
WHERE p.message_id = m.id AND p.kind = 'tool_result' AND p.hidden_at IS NULL
ORDER BY p.sequence LIMIT 1)
ELSE m.tool_results
END AS tool_results,
(SELECT jsonb_agg(p.payload ORDER BY p.sequence)
FROM message_parts p
WHERE p.message_id = m.id AND p.kind = 'reasoning') AS reasoning_parts
WHERE p.message_id = m.id AND p.kind = 'reasoning' AND p.hidden_at IS NULL) AS reasoning_parts
FROM messages m;
-- v1.13.10: per-tool token cost rolling window. Derives from
-- messages_with_parts (the v1.13.1-B view that COALESCEs message_parts over
-- the legacy JSON column) so this works whether the chat predates v1.13.0
-- or postdates v1.13.2 (column drop). No new write site — all source data
-- already lands via the existing tool-phase.ts:94-95 UPDATE.
--
-- Attribution model: equal split. A turn emitting N tool calls divides its
-- prompt/completion tokens by N before attribution. See v1.13.10 dispatch
-- brief for rationale + rejected alternatives.
--
-- Column mapping: messages.ctx_used = prompt (input), messages.tokens_used
-- = completion (output). Non-obvious naming; pinned via canonical writes at
-- tool-phase.ts:94-95 et al.
--
-- Filtering rationale:
-- status='complete' — exclude failed/cancelled (defense in
-- depth; failed-path doesn't write
-- tokens_used so they're filtered
-- indirectly too).
-- metadata->>'kind' exclusions — exclude cap_hit / doom_loop sentinels
-- (defense in depth; sentinels are
-- role='system' with tool_calls=NULL
-- so they're filtered indirectly too).
-- experimental_repairToolCall — no special handling; retries flow
-- as normal next-turn tool_result
-- errors and count naturally.
--
-- Rolling window: last 100 calls per tool_name, ordered by created_at DESC.
-- Aggregate-on-read is microseconds at BooCode scale (single user, ~30
-- tools, < 100 calls each). DROP VIEW + recreate to change window size.
CREATE OR REPLACE VIEW tool_cost_stats AS
WITH per_call AS (
SELECT
(tc->>'name')::text AS tool_name,
(m.ctx_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS prompt_tokens,
(m.tokens_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS completion_tokens,
m.created_at,
ROW_NUMBER() OVER (
PARTITION BY (tc->>'name')::text
ORDER BY m.created_at DESC
) AS rn
FROM messages_with_parts m,
LATERAL jsonb_array_elements(m.tool_calls) AS tc
WHERE m.tool_calls IS NOT NULL
AND jsonb_array_length(m.tool_calls) > 0
AND m.tokens_used IS NOT NULL
AND m.ctx_used IS NOT NULL
AND m.status = 'complete'
AND (m.metadata IS NULL
OR m.metadata->>'kind' IS NULL
OR m.metadata->>'kind' NOT IN ('cap_hit', 'doom_loop'))
)
SELECT
tool_name,
ROUND(SUM(prompt_tokens))::int AS prompt_tokens_sum,
ROUND(SUM(completion_tokens))::int AS completion_tokens_sum,
COUNT(*)::int AS n_calls,
MAX(created_at) AS updated_at
FROM per_call
WHERE rn <= 100
GROUP BY tool_name;
ALTER TABLE messages ADD COLUMN IF NOT EXISTS tokens_used INTEGER;
ALTER TABLE messages ADD COLUMN IF NOT EXISTS ctx_used INTEGER;
ALTER TABLE messages ADD COLUMN IF NOT EXISTS ctx_max INTEGER;
@@ -224,6 +334,16 @@ END $$;
-- agent_id is the slugified agent name. NULL means "use BooCode defaults".
ALTER TABLE sessions ADD COLUMN IF NOT EXISTS agent_id TEXT;
-- v1.13.17-cross-repo-reads: session-scoped read grants for paths outside the
-- session's primary project root. Populated only by the request_read_access
-- tool's approve branch; revoked via PATCH /api/sessions/:id. Values are
-- absolute paths to project roots OR repo-shaped dirs under
-- PROJECT_ROOT_WHITELIST (default /opt). No CHECK constraint — validation
-- happens at write time in services/grant_resolver.ts. Cleared automatically
-- when the session row is deleted (no cascade needed; the column goes with it).
ALTER TABLE sessions
ADD COLUMN IF NOT EXISTS allowed_read_paths TEXT[] NOT NULL DEFAULT ARRAY[]::TEXT[];
-- v1.8.2: per-message metadata for sentinels (cap-hit) and structured error
-- reasons. JSONB so future kinds can extend without further schema churn.
-- Shape for cap_hit: { kind: 'cap_hit', used: number, limit: number,

View File

@@ -0,0 +1,261 @@
import { mkdtemp, mkdir, readFile, rm, symlink } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import {
decideHtmlArtifactWrite,
deriveHtmlSlug,
deriveHtmlTitle,
deriveMarkdownSlug,
detectHtmlArtifact,
HTML_ARTIFACT_MAX_BYTES,
writeHtmlArtifact,
writeMarkdownArtifact,
} from '../artifacts.js';
import { PathScopeError } from '../path_guard.js';
describe('deriveMarkdownSlug', () => {
it('uses the first # heading when present', () => {
expect(deriveMarkdownSlug('# Hello World\n\nbody')).toBe('hello-world');
});
it('falls back to first 6 words', () => {
const s = deriveMarkdownSlug('the quick brown fox jumps over the lazy dog');
expect(s).toBe('the-quick-brown-fox-jumps-over');
});
it('returns "artifact" for empty input', () => {
expect(deriveMarkdownSlug('')).toBe('artifact');
});
it('caps at 60 chars and lowercases', () => {
const long = '# ' + 'A'.repeat(200);
const s = deriveMarkdownSlug(long);
expect(s.length).toBeLessThanOrEqual(60);
expect(s).toMatch(/^[a-z0-9-]+$/);
});
it('strips trailing punctuation', () => {
expect(deriveMarkdownSlug('# Hello, World!!!')).toBe('hello-world');
});
});
describe('deriveHtmlSlug', () => {
it('prefers payload.title when set', () => {
expect(
deriveHtmlSlug({ html_content: '<html></html>', title: 'My Title' }),
).toBe('my-title');
});
it('falls back to <title> tag', () => {
expect(
deriveHtmlSlug({
html_content: '<html><head><title>Page Title</title></head></html>',
title: null,
}),
).toBe('page-title');
});
it('falls back to first <h1> when no <title>', () => {
expect(
deriveHtmlSlug({
html_content: '<html><body><h1>Heading One</h1></body></html>',
title: null,
}),
).toBe('heading-one');
});
it('falls back to inner text words', () => {
expect(
deriveHtmlSlug({
html_content: '<div>one two three four five six seven</div>',
title: null,
}),
).toBe('one-two-three-four-five-six');
});
});
describe('deriveHtmlTitle', () => {
it('returns <title> content', () => {
expect(deriveHtmlTitle('<html><head><title>T</title></head></html>')).toBe('T');
});
it('falls back to <h1>', () => {
expect(deriveHtmlTitle('<body><h1>H</h1></body>')).toBe('H');
});
it('falls back to first 80 chars of inner text', () => {
const html = '<div>' + 'x '.repeat(100) + '</div>';
const t = deriveHtmlTitle(html);
expect(t).not.toBeNull();
expect(t!.length).toBeLessThanOrEqual(80);
});
it('returns null for empty html', () => {
expect(deriveHtmlTitle('')).toBeNull();
});
});
describe('detectHtmlArtifact', () => {
it('detects <!DOCTYPE html> prefix case-insensitively', () => {
const html = '<!doctype HTML><html><body>x</body></html>';
expect(detectHtmlArtifact(html)).toBe(html);
});
it('strips leading/trailing whitespace before matching', () => {
const html = '\n\n<!DOCTYPE html>\n<html></html>\n';
expect(detectHtmlArtifact(html)).toBe(html.trim());
});
it('detects fenced ```html block wrapping entire message', () => {
const wrapped = '```html\n<!DOCTYPE html>\n<html></html>\n```';
expect(detectHtmlArtifact(wrapped)).toContain('<!DOCTYPE html>');
});
it('rejects plain markdown', () => {
expect(detectHtmlArtifact('# heading\n\nsome text')).toBeNull();
});
it('rejects message with prose before the doctype', () => {
expect(
detectHtmlArtifact('Here you go: <!DOCTYPE html><html></html>'),
).toBeNull();
});
it('rejects empty input', () => {
expect(detectHtmlArtifact('')).toBeNull();
expect(detectHtmlArtifact(' \n ')).toBeNull();
});
it('rejects fenced block without doctype/<html>', () => {
expect(detectHtmlArtifact('```html\n<div>x</div>\n```')).toBeNull();
});
it('accepts fenced block containing <html> tag (no doctype)', () => {
const r = detectHtmlArtifact('```html\n<html><body>x</body></html>\n```');
expect(r).toContain('<html>');
});
});
describe('writeMarkdownArtifact / writeHtmlArtifact', () => {
let projectRoot: string;
beforeEach(async () => {
projectRoot = await mkdtemp(join(tmpdir(), 'artifacts-test-'));
});
afterEach(async () => {
await rm(projectRoot, { recursive: true, force: true });
});
it('writes a markdown artifact under .boocode/artifacts/', async () => {
const result = await writeMarkdownArtifact(
{ content: '# Hello\n\nbody' },
{ projectId: 'pid', projectRoot },
);
expect(result.path).toMatch(/\.boocode\/artifacts\/hello-\d+\.md$/);
expect(result.url).toMatch(/^\/api\/projects\/pid\/artifacts\/hello-\d+\.md$/);
const written = await readFile(result.path, 'utf8');
expect(written).toBe('# Hello\n\nbody');
});
it('writes an html artifact', async () => {
const result = await writeHtmlArtifact(
{
html_content: '<!DOCTYPE html><html><head><title>X</title></head></html>',
char_count: 56,
title: 'X',
},
{ projectId: 'pid', projectRoot },
);
expect(result.path).toMatch(/\.boocode\/artifacts\/x-\d+\.html$/);
const written = await readFile(result.path, 'utf8');
expect(written).toContain('<!DOCTYPE html>');
});
it('creates the artifacts directory if absent', async () => {
// Confirm the writer mkdir-recursive's the artifacts dir on first call.
const result = await writeMarkdownArtifact(
{ content: '# T' },
{ projectId: 'pid', projectRoot },
);
expect(result.path).toContain('.boocode/artifacts');
});
});
describe('1MB cap behavior', () => {
it('reports the correct byte threshold', () => {
expect(HTML_ARTIFACT_MAX_BYTES).toBe(1_048_576);
});
it('exceeds threshold for oversize payload', () => {
const oversize = '<!DOCTYPE html>' + 'A'.repeat(HTML_ARTIFACT_MAX_BYTES);
expect(Buffer.byteLength(oversize, 'utf8')).toBeGreaterThan(
HTML_ARTIFACT_MAX_BYTES,
);
});
it('detectHtmlArtifact still returns content above the cap (cap is checked by caller)', () => {
// Detection is content-shape; the cap check lives in finalizeCompletion
// (error-handler.ts). This test pins that contract: the helper does not
// silently drop oversize payloads on the floor.
const big = '<!DOCTYPE html>' + 'x'.repeat(2_000_000);
expect(detectHtmlArtifact(big)).not.toBeNull();
});
});
describe('decideHtmlArtifactWrite', () => {
// Pure helper extracted from finalizeCompletion's cap-skip branch. Pins
// the warn-and-skip decision without mocking the full InferenceContext.
it('returns write=true for payloads under the cap', () => {
const html = '<!DOCTYPE html><html></html>';
const decision = decideHtmlArtifactWrite(html);
expect(decision.write).toBe(true);
expect(decision.byteLen).toBe(Buffer.byteLength(html, 'utf8'));
});
it('returns write=false with cap_exceeded reason for oversize payloads', () => {
const big = '<!DOCTYPE html>' + 'x'.repeat(HTML_ARTIFACT_MAX_BYTES);
const decision = decideHtmlArtifactWrite(big);
expect(decision.write).toBe(false);
if (!decision.write) {
expect(decision.reason).toBe('cap_exceeded');
expect(decision.byteLen).toBeGreaterThan(HTML_ARTIFACT_MAX_BYTES);
}
});
it('accepts payload exactly at the cap (boundary)', () => {
// byteLen === cap should write; only strictly greater skips.
const exact = 'x'.repeat(HTML_ARTIFACT_MAX_BYTES);
const decision = decideHtmlArtifactWrite(exact);
expect(decision.write).toBe(true);
expect(decision.byteLen).toBe(HTML_ARTIFACT_MAX_BYTES);
});
});
describe('symlink escape protection', () => {
// Closes the gap where `.boocode/artifacts` is a symlink pointing
// outside the project root. The lexical prefix check on the resolved
// candidate path passes (it's under projectRoot textually), but the
// post-mkdir realpath verification must catch the escape.
let projectRoot: string;
let outside: string;
beforeEach(async () => {
projectRoot = await mkdtemp(join(tmpdir(), 'artifacts-symlink-root-'));
outside = await mkdtemp(join(tmpdir(), 'artifacts-symlink-outside-'));
});
afterEach(async () => {
await rm(projectRoot, { recursive: true, force: true });
await rm(outside, { recursive: true, force: true });
});
it('throws PathScopeError when .boocode/artifacts is a symlink to outside the project', async () => {
// Create .boocode dir, then make `artifacts` a symlink pointing outside.
await mkdir(join(projectRoot, '.boocode'), { recursive: true });
await symlink(outside, join(projectRoot, '.boocode', 'artifacts'));
await expect(
writeMarkdownArtifact(
{ content: '# Hello' },
{ projectId: 'pid', projectRoot },
),
).rejects.toBeInstanceOf(PathScopeError);
});
});

View File

@@ -1,5 +1,5 @@
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import { mkdir, mkdtemp, rm } from 'node:fs/promises';
import { mkdir, mkdtemp, rm, symlink, writeFile } from 'node:fs/promises';
import { join } from 'node:path';
import { tmpdir } from 'node:os';
import { callCodecontext } from '../codecontext_client.js';
@@ -203,3 +203,197 @@ describe('callCodecontext — error paths', () => {
).rejects.toThrow(/timed out after 30000ms/);
});
});
// ---- v1.13.18: file_path resolution tests -----------------------------------
describe('callCodecontext — file_path resolution', () => {
// Case 1: relative path resolves to absolute under project root
it('resolves a relative file_path to an absolute path inside project root', async () => {
// Create a real file so realpath can canonicalise it
const fileName = 'src_module.ts';
await writeFile(join(projectDir, fileName), '// hello');
const fetcher = vi.fn().mockResolvedValue(
mockJSONResponse({ result: 'file analysis', error: null }),
);
await callCodecontext(
{
toolName: 'get_file_analysis',
args: { file_path: fileName },
projectPath: projectDir,
},
fetcher as unknown as typeof fetch,
);
expect(fetcher).toHaveBeenCalledTimes(1);
const body = JSON.parse(fetcher.mock.calls[0]![1]!.body as string);
// Should be the resolved absolute path
expect(body.file_path).toBe(join(projectDir, fileName));
});
// Case 2: absolute path inside project root → realpathed → forwarded
it('passes through an absolute file_path inside project root', async () => {
const fileName = 'absolute_target.ts';
const absPath = join(projectDir, fileName);
await writeFile(absPath, '// absolute');
const fetcher = vi.fn().mockResolvedValue(
mockJSONResponse({ result: 'analysis', error: null }),
);
await callCodecontext(
{
toolName: 'get_file_analysis',
args: { file_path: absPath },
projectPath: projectDir,
},
fetcher as unknown as typeof fetch,
);
const body = JSON.parse(fetcher.mock.calls[0]![1]!.body as string);
expect(body.file_path).toBe(absPath);
});
// Case 3: relative escape path → rejected with same error shape as target_dir escape
it('rejects a relative file_path that escapes the project root', async () => {
const fetcher = vi.fn();
await expect(
callCodecontext(
{
toolName: 'get_file_analysis',
args: { file_path: '../../etc/passwd' },
projectPath: projectDir,
},
fetcher as unknown as typeof fetch,
),
).rejects.toThrow(/escapes project root/);
expect(fetcher).not.toHaveBeenCalled();
});
// Case 4: absolute path outside project root → rejected
it('rejects an absolute file_path outside the project root', async () => {
const fetcher = vi.fn();
await expect(
callCodecontext(
{
toolName: 'get_file_analysis',
// /etc/passwd is outside any tmpdir project root
args: { file_path: '/etc/passwd' },
projectPath: projectDir,
},
fetcher as unknown as typeof fetch,
),
).rejects.toThrow(/escapes project root/);
expect(fetcher).not.toHaveBeenCalled();
});
// Case 5: nonexistent file (ENOENT) → forwarded as un-realpath'd absolute
it('forwards a nonexistent file_path as absolute without throwing', async () => {
const missingPath = join(projectDir, 'does_not_exist.ts');
const fetcher = vi.fn().mockResolvedValue(
mockJSONResponse({ result: null, error: 'File not found in graph: ' + missingPath }),
);
// The resolver should NOT throw; the error comes back from the sidecar
await expect(
callCodecontext(
{
toolName: 'get_file_analysis',
args: { file_path: 'does_not_exist.ts' },
projectPath: projectDir,
},
fetcher as unknown as typeof fetch,
),
).rejects.toThrow(/File not found in graph/);
// Wire was still called — resolver forwarded the path
expect(fetcher).toHaveBeenCalledTimes(1);
const body = JSON.parse(fetcher.mock.calls[0]![1]!.body as string);
// Should receive the absolute (non-realpathed) path
expect(body.file_path).toBe(missingPath);
});
// Case 6: empty string → skipped by guard, reaches wire unmodified
// Note: Zod .trim().min(1) in get_file_analysis rejects empty before the
// shim is reached in production. At the shim layer, the guard
// `file_path.trim() !== ''` skips the resolver for empty strings so that
// optional-file_path wrappers treat '' as "not provided". This is a
// deliberate design; callers that require file_path validate at the Zod layer.
it('skips resolver for empty string file_path (treated as not provided)', async () => {
const fetcher = vi.fn().mockResolvedValue(
mockJSONResponse({ result: 'analysis', error: null }),
);
// Should succeed — empty string is treated as "no file_path"
await callCodecontext(
{
toolName: 'get_file_analysis',
args: { file_path: '' },
projectPath: projectDir,
},
fetcher as unknown as typeof fetch,
);
expect(fetcher).toHaveBeenCalledTimes(1);
const body = JSON.parse(fetcher.mock.calls[0]![1]!.body as string);
// Empty string passes through unchanged (resolver not invoked)
expect(body.file_path).toBe('');
});
// Case 7: wrapper without file_path (e.g. get_codebase_overview) → resolver not invoked
it('does not invoke file_path resolver when file_path is absent from args', async () => {
const fetcher = vi.fn().mockResolvedValue(
mockJSONResponse({ result: 'overview', error: null }),
);
await callCodecontext(
{
toolName: 'get_codebase_overview',
args: { include_stats: true },
projectPath: projectDir,
},
fetcher as unknown as typeof fetch,
);
expect(fetcher).toHaveBeenCalledTimes(1);
const body = JSON.parse(fetcher.mock.calls[0]![1]!.body as string);
// No file_path in the wire body
expect('file_path' in body).toBe(false);
});
// Case 8: absolute path with `..` that resolves outside project root, even
// when the literal path is ENOENT. Without resolve() in the absolute branch
// the prefix check false-positives because the raw `<projectDir>/../etc/x`
// literal starts with `<projectDir>/`.
it('rejects absolute file_path with `..` resolving outside project root (ENOENT branch)', async () => {
const fetcher = vi.fn();
const escapingAbsolute = `${projectDir}/../etc/non_existent_passwd`;
await expect(
callCodecontext(
{
toolName: 'get_file_analysis',
args: { file_path: escapingAbsolute },
projectPath: projectDir,
},
fetcher as unknown as typeof fetch,
),
).rejects.toThrow(/escapes project root/);
expect(fetcher).not.toHaveBeenCalled();
});
// Case 9: in-project symlink targeting outside the project root. This is the
// canonical realpath defense — realpath must canonicalise the symlink and
// the escape check must reject. Without this test, a symlink-out hole could
// regress silently.
it('rejects file_path that resolves through a symlink leaving project root', async () => {
const outsideDir = await mkdtemp(join(tmpdir(), 'codecontext-outside-'));
try {
const evilTarget = join(outsideDir, 'secrets.txt');
await writeFile(evilTarget, 'top secret');
await symlink(evilTarget, join(projectDir, 'evil-link'));
const fetcher = vi.fn();
await expect(
callCodecontext(
{
toolName: 'get_file_analysis',
args: { file_path: 'evil-link' },
projectPath: projectDir,
},
fetcher as unknown as typeof fetch,
),
).rejects.toThrow(/escapes project root/);
expect(fetcher).not.toHaveBeenCalled();
} finally {
await rm(outsideDir, { recursive: true, force: true });
}
});
});

View File

@@ -70,7 +70,7 @@ describe('codecontext wrappers — toolName + args forwarding', () => {
const { url, body } = parsePOST(fetcher);
expect(url).toMatch(/\/v1\/get_file_analysis$/);
expect(body).toMatchObject({
file_path: 'apps/server/src/index.ts',
file_path: join(projectDir, 'apps/server/src/index.ts'),
target_dir: projectDir,
});
});

View File

@@ -6,6 +6,7 @@ import {
turns,
select,
buildPrompt,
buildHeadPayload,
type CompactionMessage,
} from '../compaction.js';
import { SUMMARY_TEMPLATE } from '../compaction-prompt.js';
@@ -31,6 +32,7 @@ function mkMsg(
status: 'complete',
tool_calls: null,
tool_results: null,
reasoning_parts: null,
metadata: null,
created_at: new Date(counter * 1000).toISOString(),
...overrides,
@@ -39,49 +41,58 @@ function mkMsg(
// ---- usable -----------------------------------------------------------------
describe('usable', () => {
it('returns 0 when contextLimit is 0', () => {
// v1.13.9: ratio-only early trigger at 0.85 × contextLimit. Replaces the
// v1.11.0-era `contextLimit - 20_000` math, which degenerated to 0 for
// contexts ≤20k and gave only 7-8% headroom at 262k.
describe('usable() — ratio-only early trigger (v1.13.9)', () => {
it('returns floor(0.85 * limit) for the qwen3.6 daily-driver context', () => {
// floor(0.85 * 262144) = floor(222822.4) = 222822 — 15% headroom for
// the summarizer to do its turn without itself overflowing.
expect(usable(262144)).toBe(222822);
});
it('returns 0.85× for a mid-sized context', () => {
expect(usable(100_000)).toBe(85_000);
});
it('returns 0.85× for a small context (no degenerate 0)', () => {
// floor(0.85 * 8192) = 6963. Under the old formula this returned 0
// (8192 - 20_000 clamped to 0), effectively disabling compaction for
// small-context models. The ratio keeps the trigger active.
expect(usable(8192)).toBe(6963);
});
it('returns 0 for zero or negative contextLimit', () => {
expect(usable(0)).toBe(0);
});
it('returns 0 when contextLimit is below the 20k buffer', () => {
// Math.max(0, x - 20000) clamps the subtraction so we never report
// negative headroom. A 10k-context model reports 0 usable, which makes
// isOverflow short-circuit to false (correct — we can't size the
// compaction with no headroom).
expect(usable(10_000)).toBe(0);
expect(usable(19_999)).toBe(0);
expect(usable(20_000)).toBe(0);
});
it('subtracts the 20k buffer from a normal-sized context window', () => {
expect(usable(100_000)).toBe(80_000);
expect(usable(32_768)).toBe(12_768);
expect(usable(-1)).toBe(0);
});
});
// ---- isOverflow -------------------------------------------------------------
describe('isOverflow', () => {
it('returns false when usable is 0 (unknown / sub-buffer context)', () => {
it('returns false when usable is 0 (unknown contextLimit)', () => {
expect(isOverflow({ prompt_tokens: 999_999, completion_tokens: 0 }, 0)).toBe(false);
expect(isOverflow({ prompt_tokens: 0, completion_tokens: 999_999 }, 10_000)).toBe(false);
expect(isOverflow({ prompt_tokens: 0, completion_tokens: 999_999 }, -1)).toBe(false);
});
it('returns false at 50% of usable', () => {
// usable(100k) = 80k → 50% = 40k.
// v1.13.9: usable(100k) = 85k → 50% 42.5k.
expect(isOverflow({ prompt_tokens: 30_000, completion_tokens: 10_000 }, 100_000)).toBe(false);
});
it('returns false just under usable', () => {
expect(isOverflow({ prompt_tokens: 79_000, completion_tokens: 999 }, 100_000)).toBe(false);
// v1.13.9: 84_000 + 999 = 84_999 < 85_000 budget.
expect(isOverflow({ prompt_tokens: 84_000, completion_tokens: 999 }, 100_000)).toBe(false);
});
it('returns true exactly at usable (>=, not strict >)', () => {
expect(isOverflow({ prompt_tokens: 80_000, completion_tokens: 0 }, 100_000)).toBe(true);
// v1.13.9: 85_000 == usable(100_000).
expect(isOverflow({ prompt_tokens: 85_000, completion_tokens: 0 }, 100_000)).toBe(true);
});
it('returns true above usable', () => {
// 50_000 + 40_000 = 90_000 > 85_000.
expect(isOverflow({ prompt_tokens: 50_000, completion_tokens: 40_000 }, 100_000)).toBe(true);
});
});
@@ -224,8 +235,9 @@ describe('select', () => {
const u = mkMsg('user', 'oversized');
const a = mkMsg('assistant', 'Y'.repeat(40_000));
const result = select([u, a], 30_000, 1);
// usable(30k) = 10k → budget = min(8k, max(2k, floor(10k*0.25))) =
// min(8k, max(2k, 2500)) = 2500. 40k chars ≈ 10k tokens. Can't fit.
// v1.13.9: usable(30k) = floor(0.85*30k) = 25500 → budget =
// min(8k, max(2k, floor(25500*0.25))) = min(8k, max(2k, 6375)) = 6375.
// 40k chars ≈ 10k tokens. Still can't fit (10k > 6375).
expect(result.tail_start_id).toBeUndefined();
expect(result.head).toEqual([u, a]);
});
@@ -256,3 +268,56 @@ describe('buildPrompt', () => {
expect(out.endsWith('extra-context-line')).toBe(true);
});
});
// ---- buildHeadPayload (v1.13.6) -----------------------------------------------
describe('buildHeadPayload reasoning render', () => {
it('emits reasoning as a <reasoning> tag prefixed onto the assistant content', () => {
const out = buildHeadPayload([
mkMsg('user', 'show me the file'),
mkMsg('assistant', 'reading it now', {
reasoning_parts: [{ text: 'user wants src/index.ts; I should view it' }],
}),
]);
expect(out).toHaveLength(2);
expect(out[1]!.role).toBe('assistant');
expect(out[1]!.content).toBe(
'<reasoning>user wants src/index.ts; I should view it</reasoning>\n\nreading it now',
);
});
it('emits a standalone <reasoning> tag when reasoning is present but content is empty (tool-call-only turn)', () => {
const out = buildHeadPayload([
mkMsg('assistant', '', {
reasoning_parts: [{ text: 'jumping straight to grep' }],
tool_calls: [{ id: 'c1', name: 'grep', args: { pattern: 'foo' } }],
}),
]);
expect(out).toHaveLength(1);
expect(out[0]!.content).toBe('<reasoning>jumping straight to grep</reasoning>');
expect(out[0]!.tool_calls).toHaveLength(1);
expect(out[0]!.tool_calls![0]!.function.name).toBe('grep');
});
it('joins multiple reasoning parts without separators (matches the streaming concat)', () => {
const out = buildHeadPayload([
mkMsg('assistant', 'final answer', {
reasoning_parts: [{ text: 'first thought ' }, { text: 'second thought' }],
}),
]);
expect(out[0]!.content).toBe(
'<reasoning>first thought second thought</reasoning>\n\nfinal answer',
);
});
it('omits the reasoning tag entirely when reasoning_parts is null or empty', () => {
const out = buildHeadPayload([
mkMsg('assistant', 'plain answer', { reasoning_parts: null }),
mkMsg('assistant', 'other answer', { reasoning_parts: [] }),
]);
expect(out[0]!.content).toBe('plain answer');
expect(out[1]!.content).toBe('other answer');
expect(out[0]!.content).not.toContain('<reasoning>');
expect(out[1]!.content).not.toContain('<reasoning>');
});
});

View File

@@ -0,0 +1,199 @@
// v1.13.17-cross-repo-reads: resolveGrantRoot decision tree.
//
// Sam's dispatch note (2026-05-22): "in the project-root resolver ancestor
// walk, stop the moment parent exits PROJECT_ROOT_WHITELIST or hits
// filesystem root — check on every iteration, not just final parent.
// Symlinked input must not be able to escape the whitelist during the
// walk." The symlink-escape-mid-walk test below pins that invariant —
// without the per-iteration whitelist check, this case would walk OUTSIDE
// the whitelist root and return a phantom grant.
import { describe, it, expect, beforeAll, afterAll, vi } from 'vitest';
import { mkdtemp, rm, mkdir, writeFile, symlink } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { realpath } from 'node:fs/promises';
import { resolveGrantRoot } from '../grant_resolver.js';
import type { Sql } from '../../db.js';
let tmp: string;
let whitelist: string;
let project: string;
let fork: string;
let outside: string;
// Fake sql tag — returns the projects rows we want without touching a real
// database. The resolver only ever does a single SELECT, so a single-shot
// mock that returns the prepared rows on every invocation is enough.
function makeSql(rows: Array<{ path: string }>): Sql {
const tag = ((..._args: unknown[]) => Promise.resolve(rows)) as unknown as Sql;
return tag;
}
beforeAll(async () => {
tmp = await realpath(await mkdtemp(join(tmpdir(), 'boocode-gr-')));
whitelist = join(tmp, 'whitelist');
project = join(whitelist, 'boocode');
fork = join(whitelist, 'forks', 'codecontext');
outside = join(tmp, 'outside');
await mkdir(project, { recursive: true });
await mkdir(fork, { recursive: true });
await mkdir(outside, { recursive: true });
// Mark project as a repo (.git directory).
await mkdir(join(project, '.git'));
await writeFile(join(project, 'README.md'), 'project readme');
// Mark fork as a repo via go.mod (matches the proposal's example).
await writeFile(join(fork, 'go.mod'), 'module example.com/foo');
await writeFile(join(fork, 'main.go'), 'package main');
await writeFile(join(outside, 'secret.txt'), 'forbidden');
});
afterAll(async () => {
await rm(tmp, { recursive: true, force: true });
});
describe('resolveGrantRoot — happy paths', () => {
it('refuses when the requested path is already under projectRoot', async () => {
const result = await resolveGrantRoot(makeSql([]), join(project, 'README.md'), project, whitelist);
expect(result.ok).toBe(false);
if (!result.ok) expect(result.reason).toMatch(/already accessible/);
});
it('returns the project root when the path falls under a registered project', async () => {
// Register `fork` as a known project. Resolver should return the project
// ancestor (LONGEST match wins) rather than the repo-shape fallback.
const result = await resolveGrantRoot(
makeSql([{ path: fork }]),
join(fork, 'main.go'),
project,
whitelist,
);
expect(result.ok).toBe(true);
if (result.ok) {
expect(result.root).toBe(fork);
expect(result.source).toBe('project');
}
});
it('falls back to the nearest repo-shaped ancestor when no project matches', async () => {
const result = await resolveGrantRoot(
makeSql([]),
join(fork, 'main.go'),
project,
whitelist,
);
expect(result.ok).toBe(true);
if (result.ok) {
expect(result.root).toBe(fork);
expect(result.source).toBe('whitelist');
}
});
});
describe('resolveGrantRoot — refusals', () => {
it('refuses paths outside PROJECT_ROOT_WHITELIST', async () => {
const result = await resolveGrantRoot(
makeSql([]),
join(outside, 'secret.txt'),
project,
whitelist,
);
expect(result.ok).toBe(false);
if (!result.ok) expect(result.reason).toMatch(/outside permitted scope/);
});
it('refuses non-absolute paths', async () => {
const result = await resolveGrantRoot(makeSql([]), 'relative/path', project, whitelist);
expect(result.ok).toBe(false);
if (!result.ok) expect(result.reason).toMatch(/absolute/);
});
it('refuses missing paths without prompting', async () => {
const result = await resolveGrantRoot(
makeSql([]),
join(whitelist, 'nope'),
project,
whitelist,
);
expect(result.ok).toBe(false);
if (!result.ok) expect(result.reason).toMatch(/does not exist/);
});
it('refuses when no repo-shape marker is found before hitting the whitelist root', async () => {
// Build a directory tree under the whitelist that has NO repo markers
// all the way up to the whitelist root.
const plain = join(whitelist, 'plain-dir', 'nested');
await mkdir(plain, { recursive: true });
await writeFile(join(plain, 'just-a-file.txt'), 'x');
const result = await resolveGrantRoot(
makeSql([]),
join(plain, 'just-a-file.txt'),
project,
whitelist,
);
expect(result.ok).toBe(false);
if (!result.ok) expect(result.reason).toMatch(/no repo-shaped ancestor/);
});
it('does not grant the whitelist root itself as a fallback', async () => {
// Even if .git existed at the whitelist root (it doesn't), we'd refuse.
// Easier to assert: a path directly under whitelist with no repo marker.
const direct = join(whitelist, 'lone-file.txt');
await writeFile(direct, 'x');
const result = await resolveGrantRoot(makeSql([]), direct, project, whitelist);
expect(result.ok).toBe(false);
});
});
describe('resolveGrantRoot — symlink-escape-mid-walk guard (Sam 2026-05-22)', () => {
it('refuses a symlinked input whose realpath sits outside the whitelist', async () => {
// The symlink lives nominally inside the whitelist, but its target
// (realpath) is outside. The guard's first realpath() call normalizes
// and the up-front whitelist check refuses immediately.
const link = join(whitelist, 'escape-link');
try {
await symlink(outside, link);
const result = await resolveGrantRoot(
makeSql([]),
join(link, 'secret.txt'),
project,
whitelist,
);
expect(result.ok).toBe(false);
if (!result.ok) expect(result.reason).toMatch(/outside permitted scope/);
} finally {
await rm(link, { force: true });
}
});
it('walk loop terminates at the whitelist root, not at filesystem /', async () => {
// Construct a deep tree with NO repo markers anywhere. Without a bound,
// the walk would chase parents up to "/". The bound flips the loop into
// a refusal once the cursor equals the realpath'd whitelist root.
const deep = join(whitelist, 'a', 'b', 'c', 'd');
await mkdir(deep, { recursive: true });
await writeFile(join(deep, 'leaf.txt'), 'x');
const result = await resolveGrantRoot(makeSql([]), join(deep, 'leaf.txt'), project, whitelist);
expect(result.ok).toBe(false);
if (!result.ok) expect(result.reason).toMatch(/no repo-shaped ancestor/);
});
});
describe('resolveGrantRoot — nearest-project disambiguation', () => {
it('prefers the longest matching project path over a shorter ancestor', async () => {
const outer = whitelist;
const inner = fork; // /whitelist/forks/codecontext, deeper than outer
const result = await resolveGrantRoot(
makeSql([{ path: outer }, { path: inner }]),
join(fork, 'main.go'),
project,
whitelist,
);
expect(result.ok).toBe(true);
if (result.ok) expect(result.root).toBe(inner);
});
});
// Belt-and-suspenders: silence a known dynamic-import warning that vitest
// occasionally emits on transient fs operations in CI but never in dev.
vi.spyOn(console, 'warn').mockImplementation(() => {});

View File

@@ -0,0 +1,93 @@
// v1.13.17-cross-repo-reads: pathGuard now accepts an optional extraRoots
// list. Validates the primary-root path stays the source of truth and that
// extra roots are consulted when (and only when) the primary rejects.
import { describe, it, expect, beforeAll, afterAll } from 'vitest';
import { mkdtemp, rm, mkdir, writeFile, symlink } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { realpath } from 'node:fs/promises';
import { pathGuard, PathScopeError } from '../path_guard.js';
let tmp: string;
let projectRoot: string;
let altRoot: string;
let outsideDir: string;
beforeAll(async () => {
tmp = await realpath(await mkdtemp(join(tmpdir(), 'boocode-pg-')));
projectRoot = join(tmp, 'project');
altRoot = join(tmp, 'alt');
outsideDir = join(tmp, 'outside');
await mkdir(projectRoot, { recursive: true });
await mkdir(altRoot, { recursive: true });
await mkdir(outsideDir, { recursive: true });
await writeFile(join(projectRoot, 'inside.txt'), 'p');
await writeFile(join(altRoot, 'cross.txt'), 'a');
await writeFile(join(outsideDir, 'forbidden.txt'), 'x');
});
afterAll(async () => {
await rm(tmp, { recursive: true, force: true });
});
describe('pathGuard (v1.13.17 extraRoots)', () => {
it('accepts paths inside the primary projectRoot', async () => {
const real = await pathGuard(projectRoot, 'inside.txt');
expect(real).toBe(join(projectRoot, 'inside.txt'));
});
it('rejects paths outside the primary root when no extra roots given', async () => {
await expect(pathGuard(projectRoot, join(outsideDir, 'forbidden.txt'))).rejects.toBeInstanceOf(
PathScopeError,
);
});
it('accepts cross-root paths when the matching extra root is provided', async () => {
const real = await pathGuard(projectRoot, join(altRoot, 'cross.txt'), [altRoot]);
expect(real).toBe(join(altRoot, 'cross.txt'));
});
it('rejects cross-root paths even with extra roots when no root matches', async () => {
await expect(
pathGuard(projectRoot, join(outsideDir, 'forbidden.txt'), [altRoot]),
).rejects.toBeInstanceOf(PathScopeError);
});
it('ignores empty-string extra roots silently', async () => {
const real = await pathGuard(projectRoot, join(altRoot, 'cross.txt'), ['', altRoot]);
expect(real).toBe(join(altRoot, 'cross.txt'));
});
it('error message contains the request_read_access hint when scope rejects', async () => {
try {
await pathGuard(projectRoot, join(outsideDir, 'forbidden.txt'));
throw new Error('should have thrown');
} catch (err) {
expect(err).toBeInstanceOf(PathScopeError);
expect((err as Error).message).toContain('request_read_access');
}
});
it('still resolves symlinks before the scope check', async () => {
const linkPath = join(projectRoot, 'link-to-outside');
await symlink(join(outsideDir, 'forbidden.txt'), linkPath);
// Symlink target escapes both primary and the single extra root, so
// even though the surface path "looks" inside projectRoot, the real
// path resolves outside and the guard rejects.
await expect(pathGuard(projectRoot, linkPath, [altRoot])).rejects.toBeInstanceOf(
PathScopeError,
);
// But adding outsideDir as an extra root accepts (realpath inside it).
const real = await pathGuard(projectRoot, linkPath, [altRoot, outsideDir]);
expect(real).toBe(join(outsideDir, 'forbidden.txt'));
});
it('tries extra roots in order until one accepts', async () => {
const real = await pathGuard(projectRoot, join(altRoot, 'cross.txt'), [
outsideDir, // rejects
altRoot, // accepts
]);
expect(real).toBe(join(altRoot, 'cross.txt'));
});
});

View File

@@ -0,0 +1,96 @@
import { describe, it, expect, beforeEach } from 'vitest';
import {
selectPruneTargets,
PROTECTED_TOKENS,
PRUNE_TRIGGER_TOKENS,
type PartForPrune,
} from '../inference/prune.js';
// Test fixture: build a tool_result part whose payload size yields a known
// token estimate (chars/4). The decision logic only cares about
// JSON.stringify(payload).length, so a string payload of `4n` chars
// produces exactly `n` tokens.
let seq = 0;
function part(tokens: number, createdAt: Date): PartForPrune {
seq += 1;
// JSON.stringify("xxx...") wraps in quotes (adds 2 chars), so subtract 2
// before multiplying. Math.ceil((len+2)/4) needs len ≈ 4*tokens - 2 so the
// total stringified length is 4*tokens. Approximate by padding 4 chars per
// token; the off-by-one from quotes is small and tests check totals, not
// exact per-part counts.
const text = 'x'.repeat(tokens * 4 - 2);
return { id: `p${seq}`, payload: text, created_at: createdAt };
}
const T_NOW = new Date('2026-05-22T12:00:00Z');
function ago(secondsBack: number): Date {
return new Date(T_NOW.getTime() - secondsBack * 1000);
}
describe('selectPruneTargets', () => {
beforeEach(() => {
seq = 0;
});
it('returns nothing when there are no parts', () => {
expect(selectPruneTargets([], null)).toEqual({ ids: [], freedTokens: 0 });
});
it('returns nothing when total tokens are under the protection window', () => {
const parts: PartForPrune[] = [
part(10_000, ago(10)),
part(10_000, ago(20)),
]; // 20k total, all protected
expect(selectPruneTargets(parts, null)).toEqual({ ids: [], freedTokens: 0 });
});
it('returns nothing when candidate total is below the prune trigger', () => {
// Protection fills with ~40k newest, candidates only ~5k. Below 20k trigger.
const parts: PartForPrune[] = [
part(20_000, ago(10)),
part(20_000, ago(20)),
// Past protection; total ~5k won't trigger.
part(5_000, ago(30)),
];
const result = selectPruneTargets(parts, null);
expect(result.ids).toEqual([]);
expect(result.freedTokens).toBe(0);
});
it('hides candidates past protection when their total clears the trigger', () => {
// Newest 40k protected; older 30k cleanly above the 20k trigger.
const parts: PartForPrune[] = [
part(20_000, ago(10)),
part(20_000, ago(20)),
// Past protection, total ~30k freed.
part(15_000, ago(30)),
part(15_000, ago(40)),
];
const result = selectPruneTargets(parts, null);
expect(result.ids).toEqual(['p3', 'p4']);
expect(result.freedTokens).toBeGreaterThanOrEqual(PRUNE_TRIGGER_TOKENS);
});
it('stops at the compaction summary boundary', () => {
// Newest 30k protected (just under PROTECTED_TOKENS=40k); then 30k of
// older parts. Boundary sits at ago(35), so the ago(40) part is
// beyond it and gets skipped.
const parts: PartForPrune[] = [
part(15_000, ago(10)),
part(15_000, ago(20)),
part(15_000, ago(30)), // crosses protection threshold; candidate
part(15_000, ago(40)), // beyond summary boundary; skipped
];
const tailStart = ago(35);
const result = selectPruneTargets(parts, tailStart);
// ago(30) is the only candidate inside the window; 15k is below the
// 20k trigger so we expect no hides.
expect(result.ids).toEqual([]);
});
it('does not prune when only protected parts exist (no candidates)', () => {
// Exactly PROTECTED_TOKENS of newest parts; no older candidates.
const parts: PartForPrune[] = [part(PROTECTED_TOKENS, ago(10))];
expect(selectPruneTargets(parts, null)).toEqual({ ids: [], freedTokens: 0 });
});
});

View File

@@ -6,7 +6,9 @@ import {
loadContainerGuidance,
getContainerGuidance,
buildSystemPrompt,
buildSystemPromptWithFingerprint,
_resetContainerGuidanceCacheForTests,
_resetPrefixObserverForTests,
} from '../system-prompt.js';
import type { Agent, Project, Session } from '../../types/api.js';
@@ -17,12 +19,14 @@ let tmpDir: string;
beforeEach(async () => {
tmpDir = await mkdtemp(join(tmpdir(), 'system-prompt-test-'));
_resetContainerGuidanceCacheForTests();
_resetPrefixObserverForTests();
delete process.env['CONTAINER_GUIDANCE_FILE'];
});
afterEach(async () => {
delete process.env['CONTAINER_GUIDANCE_FILE'];
_resetContainerGuidanceCacheForTests();
_resetPrefixObserverForTests();
await rm(tmpDir, { recursive: true, force: true });
});
@@ -176,3 +180,75 @@ describe('buildSystemPrompt', () => {
expect(prompt).not.toContain('--- end container guidance ---');
});
});
// v1.13.8: byte-stability instrumentation surface.
describe('buildSystemPromptWithFingerprint (v1.13.8)', () => {
it('returns byte-identical prompts for two consecutive calls with the same inputs', async () => {
const path = join(tmpDir, 'BOOCHAT.md');
await writeFile(path, 'stable guidance', 'utf8');
process.env['CONTAINER_GUIDANCE_FILE'] = path;
const session = makeSession();
const project = makeProject({ path: '/tmp/stable-proj' });
const agent = makeAgent({ system_prompt: 'be terse' });
const first = await buildSystemPromptWithFingerprint(project, session, agent);
const second = await buildSystemPromptWithFingerprint(project, session, agent);
expect(first.prompt).toBe(second.prompt);
expect(first.fingerprint.prefix_hash).toBe(second.fingerprint.prefix_hash);
expect(first.fingerprint.prefix_length).toBe(second.fingerprint.prefix_length);
});
it('emits drift=null on the first call for a fresh session, then null again when nothing changes', async () => {
process.env['CONTAINER_GUIDANCE_FILE'] = join(tmpDir, 'absent.md');
const session = makeSession();
const project = makeProject({ path: '/tmp/stable-proj' });
const first = await buildSystemPromptWithFingerprint(project, session, null);
expect(first.drift).toBeNull();
const second = await buildSystemPromptWithFingerprint(project, session, null);
expect(second.drift).toBeNull();
expect(second.fingerprint.prefix_hash).toBe(first.fingerprint.prefix_hash);
});
it('emits drift with prev/new hashes and a changed_inputs entry when an input mutates', async () => {
// Two BOOCHAT.md contents with different mtimes → guidance cache picks
// up the change → fingerprint hash flips → drift fires.
const path = join(tmpDir, 'BOOCHAT.md');
await writeFile(path, 'first', 'utf8');
process.env['CONTAINER_GUIDANCE_FILE'] = path;
const session = makeSession();
const project = makeProject({ path: '/tmp/stable-proj' });
const first = await buildSystemPromptWithFingerprint(project, session, null);
expect(first.drift).toBeNull();
await writeFile(path, 'second — different content', 'utf8');
const later = new Date(Date.now() + 60_000);
await utimes(path, later, later);
const second = await buildSystemPromptWithFingerprint(project, session, null);
expect(second.drift).not.toBeNull();
expect(second.drift!.prev_hash).toBe(first.fingerprint.prefix_hash);
expect(second.drift!.new_hash).toBe(second.fingerprint.prefix_hash);
expect(second.drift!.prev_hash).not.toBe(second.drift!.new_hash);
expect(second.drift!.changed_inputs).toContain('mtime_boochat');
});
it('does not fire drift across distinct sessions even if their hashes differ', async () => {
process.env['CONTAINER_GUIDANCE_FILE'] = join(tmpDir, 'absent.md');
const sessionA = makeSession({ id: 'sess-A' });
const sessionB = makeSession({ id: 'sess-B', system_prompt: 'B-only override' });
const project = makeProject({ path: '/tmp/stable-proj' });
const a = await buildSystemPromptWithFingerprint(project, sessionA, null);
const b = await buildSystemPromptWithFingerprint(project, sessionB, null);
expect(a.drift).toBeNull();
expect(b.drift).toBeNull();
expect(a.fingerprint.prefix_hash).not.toBe(b.fingerprint.prefix_hash);
});
});

View File

@@ -0,0 +1,228 @@
import { describe, it, expect, beforeAll, afterAll } from 'vitest';
import postgres from 'postgres';
import { readFileSync } from 'node:fs';
import { resolve } from 'node:path';
import { fileURLToPath } from 'node:url';
// v1.13.10: integration tests for the tool_cost_stats view. Skipped unless
// DATABASE_URL is set so they don't break `pnpm test` on a fresh checkout.
// Run with:
// DATABASE_URL=postgres://boocode:<pw>@localhost:5500/boocode pnpm -C apps/server test
//
// Isolation: each test uses a unique tool_name suffix derived from a per-test
// counter. The view aggregates globally across all chats, so without unique
// tool names parallel test runs would interfere. Cleanup deletes by tool_name
// suffix in afterAll.
const DB_URL = process.env.DATABASE_URL;
const describeFn = DB_URL ? describe : describe.skip;
const TEST_RUN_ID = `v13_10_${Date.now()}`;
const tname = (suffix: string) => `${TEST_RUN_ID}_${suffix}`;
describeFn('tool_cost_stats view (v1.13.10)', () => {
let sql: ReturnType<typeof postgres>;
let projectId: string;
let sessionId: string;
let chatId: string;
beforeAll(async () => {
if (!DB_URL) return;
sql = postgres(DB_URL, { max: 2, idle_timeout: 5, connect_timeout: 5, onnotice: () => {} });
// Apply the schema before fixtures so the view exists. Idempotent via
// CREATE OR REPLACE VIEW + CREATE TABLE IF NOT EXISTS; safe to run on a
// pre-populated DB. Mirrors apps/server/src/db.ts:applySchema.
const here = fileURLToPath(import.meta.url);
const schemaPath = resolve(here, '../../../schema.sql');
const ddl = readFileSync(schemaPath, 'utf8');
await sql.unsafe(ddl);
// Fixture project + session + chat for all inserts in this file.
const proj = await sql<{ id: string }[]>`
INSERT INTO projects (name, path)
VALUES (${`tool_cost_stats_test_${TEST_RUN_ID}`}, ${`/tmp/${TEST_RUN_ID}`})
RETURNING id
`;
projectId = proj[0]!.id;
const sess = await sql<{ id: string }[]>`
INSERT INTO sessions (project_id, name, model)
VALUES (${projectId}, ${'test'}, ${'test-model'})
RETURNING id
`;
sessionId = sess[0]!.id;
const chat = await sql<{ id: string }[]>`
INSERT INTO chats (session_id, name) VALUES (${sessionId}, ${'test'}) RETURNING id
`;
chatId = chat[0]!.id;
});
afterAll(async () => {
if (!DB_URL) return;
// Project FK CASCADE cleans sessions/chats/messages/parts in one shot.
await sql`DELETE FROM projects WHERE id = ${projectId}`;
await sql.end({ timeout: 5 });
});
async function insertAssistantTurn(opts: {
toolNames: string[];
tokensUsed: number | null;
ctxUsed: number | null;
status?: 'streaming' | 'complete' | 'failed' | 'cancelled';
metadata?: { kind: string } | null;
createdAt?: Date;
}): Promise<string> {
const toolCalls = opts.toolNames.map((name, i) => ({
id: `call_${TEST_RUN_ID}_${name}_${i}`,
name,
args: {},
}));
const created = opts.createdAt ?? new Date();
const rows = await sql<{ id: string }[]>`
INSERT INTO messages (
session_id, chat_id, role, content, kind, status,
tool_calls, tokens_used, ctx_used,
metadata, created_at
)
VALUES (
${sessionId}, ${chatId}, 'assistant', '', 'message',
${opts.status ?? 'complete'},
${sql.json(toolCalls as never)},
${opts.tokensUsed},
${opts.ctxUsed},
${opts.metadata ? sql.json(opts.metadata as never) : null},
${created}
)
RETURNING id
`;
return rows[0]!.id;
}
it('returns empty when no tool calls exist for a tool name', async () => {
const t = tname('absent');
const stats = await sql<{ tool_name: string }[]>`
SELECT * FROM tool_cost_stats WHERE tool_name = ${t}
`;
expect(stats).toEqual([]);
});
it('attributes single-tool turn fully to that tool', async () => {
const t = tname('single');
await insertAssistantTurn({ toolNames: [t], tokensUsed: 300, ctxUsed: 15000 });
const stats = await sql<{
tool_name: string;
prompt_tokens_sum: number;
completion_tokens_sum: number;
n_calls: number;
}[]>`SELECT * FROM tool_cost_stats WHERE tool_name = ${t}`;
expect(stats[0]).toMatchObject({
tool_name: t,
prompt_tokens_sum: 15000,
completion_tokens_sum: 300,
n_calls: 1,
});
});
it('splits multi-tool turn equally across tools', async () => {
const a = tname('multi_a');
const b = tname('multi_b');
const c = tname('multi_c');
// 3 tools, 300 completion / 15000 prompt → each gets 100 / 5000
await insertAssistantTurn({ toolNames: [a, b, c], tokensUsed: 300, ctxUsed: 15000 });
const stats = await sql<{
tool_name: string;
prompt_tokens_sum: number;
completion_tokens_sum: number;
n_calls: number;
}[]>`
SELECT * FROM tool_cost_stats
WHERE tool_name IN (${a}, ${b}, ${c})
ORDER BY tool_name
`;
expect(stats).toHaveLength(3);
for (const s of stats) {
expect(s.completion_tokens_sum).toBe(100);
expect(s.prompt_tokens_sum).toBe(5000);
expect(s.n_calls).toBe(1);
}
});
it('limits to last 100 calls per tool (FIFO window)', async () => {
const t = tname('window');
// Insert 110 turns with monotonically-increasing created_at and tokensUsed.
// Expect view to keep only the most recent 100.
const base = Date.now() + 1_000_000; // distant future to avoid colliding with other tests
for (let i = 1; i <= 110; i++) {
await insertAssistantTurn({
toolNames: [t],
tokensUsed: i, // 1..110
ctxUsed: i * 10,
createdAt: new Date(base + i),
});
}
const [stat] = await sql<{
n_calls: number;
completion_tokens_sum: number;
}[]>`SELECT n_calls, completion_tokens_sum FROM tool_cost_stats WHERE tool_name = ${t}`;
expect(stat!.n_calls).toBe(100);
// Last 100 are tokensUsed=11..110, sum = (11+110)*100/2 = 6050.
expect(stat!.completion_tokens_sum).toBe(6050);
});
it('excludes turns with NULL tokens_used (pre-v1.13.7 latent regression)', async () => {
const t = tname('null_tokens');
await insertAssistantTurn({ toolNames: [t], tokensUsed: null, ctxUsed: 1000 });
await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: null });
const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name = ${t}`;
expect(stats).toEqual([]);
});
it('excludes failed/cancelled turns and cap_hit/doom_loop sentinel rows', async () => {
const t = tname('filtered');
// A: status='failed' — excluded
// B: status='cancelled' — excluded
// C: status='complete', metadata={kind:'cap_hit'} — excluded
// D: status='complete', metadata={kind:'doom_loop'} — excluded
// E: status='complete', metadata=null — included
await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, status: 'failed' });
await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, status: 'cancelled' });
await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, metadata: { kind: 'cap_hit' } });
await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, metadata: { kind: 'doom_loop' } });
await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, metadata: null });
const [stat] = await sql<{ n_calls: number }[]>`
SELECT n_calls FROM tool_cost_stats WHERE tool_name = ${t}
`;
expect(stat!.n_calls).toBe(1);
});
it('reads tool_calls via messages_with_parts (parts-authoritative)', async () => {
const t = tname('parts');
// Insert an assistant row with messages.tool_calls=NULL but a
// message_parts row carrying the tool_call. The view reads via
// messages_with_parts, which COALESCEs the parts table over the legacy
// column — so this row should still aggregate.
const rows = await sql<{ id: string }[]>`
INSERT INTO messages (
session_id, chat_id, role, content, kind, status,
tool_calls, tokens_used, ctx_used
)
VALUES (
${sessionId}, ${chatId}, 'assistant', '', 'message', 'complete',
NULL, 200, 5000
)
RETURNING id
`;
const messageId = rows[0]!.id;
await sql`
INSERT INTO message_parts (message_id, sequence, kind, payload)
VALUES (
${messageId}, 0, 'tool_call',
${sql.json({ id: `tc_parts_${TEST_RUN_ID}`, name: t, args: {} } as never)}
)
`;
const [stat] = await sql<{ n_calls: number }[]>`
SELECT n_calls FROM tool_cost_stats WHERE tool_name = ${t}
`;
expect(stat!.n_calls).toBe(1);
});
});

View File

@@ -1,5 +1,11 @@
import { describe, it, expect } from 'vitest';
import { ALL_TOOLS } from '../tools.js';
import {
ALL_TOOLS,
CORE_TOOL_NAMES,
STANDARD_TOOL_NAMES,
TOOLS_BY_NAME,
resolveToolTier,
} from '../tools.js';
describe('ALL_TOOLS registry', () => {
// v1.13.3: tools must be alpha-sorted at module load. llama.cpp's prompt
@@ -12,3 +18,59 @@ describe('ALL_TOOLS registry', () => {
expect(names).toEqual([...names].sort((a, b) => a.localeCompare(b)));
});
});
describe('resolveToolTier (v1.13.15-tools)', () => {
it('returns CORE tools for tier=core', () => {
expect(resolveToolTier('core')).toEqual(CORE_TOOL_NAMES);
});
it('returns STANDARD tools for tier=standard', () => {
const result = resolveToolTier('standard');
expect(result.length).toBe(STANDARD_TOOL_NAMES.length);
expect(result.length).toBeGreaterThan(CORE_TOOL_NAMES.length);
// STANDARD is a strict superset of CORE.
expect(result).toEqual(expect.arrayContaining([...CORE_TOOL_NAMES]));
});
it('returns ALL tool names for tier=all', () => {
expect(resolveToolTier('all').length).toBe(ALL_TOOLS.length);
});
it('defaults to all when env var is undefined', () => {
expect(resolveToolTier(undefined).length).toBe(ALL_TOOLS.length);
});
it('is case-insensitive', () => {
expect(resolveToolTier('CORE')).toEqual(CORE_TOOL_NAMES);
expect(resolveToolTier('Standard').length).toBe(STANDARD_TOOL_NAMES.length);
});
it('falls back to all for unknown tier strings', () => {
expect(resolveToolTier('bogus').length).toBe(ALL_TOOLS.length);
});
});
describe('CORE_TOOL_NAMES + STANDARD_TOOL_NAMES validation', () => {
// The module-load validation in tools.ts throws if a tier references a
// tool that doesn't exist in TOOLS_BY_NAME. These tests double-check that
// invariant from the consumer side so a future tier-list edit can't smuggle
// in a typo without a test failure.
it('every CORE name exists in TOOLS_BY_NAME', () => {
for (const name of CORE_TOOL_NAMES) {
expect(TOOLS_BY_NAME[name], `CORE references unknown tool '${name}'`).toBeDefined();
}
});
it('every STANDARD name exists in TOOLS_BY_NAME', () => {
for (const name of STANDARD_TOOL_NAMES) {
expect(TOOLS_BY_NAME[name], `STANDARD references unknown tool '${name}'`).toBeDefined();
}
});
it('CORE is a subset of STANDARD', () => {
const standardSet = new Set<string>(STANDARD_TOOL_NAMES);
for (const name of CORE_TOOL_NAMES) {
expect(standardSet.has(name), `'${name}' is in CORE but not STANDARD`).toBe(true);
}
});
});

View File

@@ -0,0 +1,104 @@
// v1.13.5: truncate.ts unit coverage. Each test isolates TRUNCATION_DIR
// under os.tmpdir() so concurrent vitest runs don't collide and the suite
// stays self-cleaning. cleanupTruncations is covered by file-system half
// only; the orphan-reap branch needs a real Postgres and is tested via the
// smoke flow rather than vitest.
import { afterEach, beforeAll, describe, expect, it, vi } from 'vitest';
import { promises as fs } from 'fs';
import path from 'path';
import os from 'os';
// Set the env var BEFORE importing the module so its module-load constant
// reads the test directory rather than /tmp/boocode-truncations.
const testDir = path.join(os.tmpdir(), `boocode-truncate-test-${process.pid}-${Date.now()}`);
process.env.BOOCODE_TRUNCATION_DIR = testDir;
const mod = await import('../truncate.js');
const { storeTruncation, readTruncation, truncateIfNeeded, MAX_TRUNCATION_BYTES } = mod;
beforeAll(async () => {
await fs.mkdir(testDir, { recursive: true });
});
afterEach(async () => {
// Drop every file between tests so id-collision asserts and orphan-style
// counts start from zero.
const entries = await fs.readdir(testDir).catch(() => [] as string[]);
await Promise.all(entries.map((n) => fs.unlink(path.join(testDir, n)).catch(() => {})));
});
describe('storeTruncation / readTruncation roundtrip', () => {
it('writes and reads identical content', async () => {
const original = 'hello\nworld\n' + 'x'.repeat(500);
const id = await storeTruncation(original);
expect(id).toMatch(/^tr_[0-9a-v]{12}$/);
const got = await readTruncation(id);
expect(got).toBe(original);
});
it('readTruncation returns null for unknown ids', async () => {
const got = await readTruncation('tr_000000000000');
expect(got).toBeNull();
});
it('readTruncation rejects malformed ids (returns null, never escapes dir)', async () => {
// Path traversal attempt; readTruncation should not even try to open.
const got = await readTruncation('../../etc/passwd');
expect(got).toBeNull();
});
});
describe('truncateIfNeeded', () => {
it('returns sliced content with no outputPath when wasTruncated=false', async () => {
const out = await truncateIfNeeded({
fullContent: 'irrelevant',
slicedContent: 'visible',
wasTruncated: false,
});
expect(out).toEqual({ content: 'visible', truncated: false });
expect('outputPath' in out).toBe(false);
});
it('stashes full content and returns outputPath when wasTruncated=true', async () => {
const full = 'line1\nline2\nline3\nline4\n';
const sliced = 'line1\nline2\n[truncated]';
const out = await truncateIfNeeded({
fullContent: full,
slicedContent: sliced,
wasTruncated: true,
});
expect(out.content).toBe(sliced);
expect(out.truncated).toBe(true);
expect(out.outputPath).toMatch(/^tr_[0-9a-v]{12}$/);
const stashed = await readTruncation(out.outputPath!);
expect(stashed).toBe(full);
});
it('skips storage but still reports truncated when fullContent exceeds the cap', async () => {
// Build content larger than MAX_TRUNCATION_BYTES. Use a Buffer to size
// it without holding a literal that triggers the gigantic-string lint.
const oversized = Buffer.alloc(MAX_TRUNCATION_BYTES + 1, 'x').toString('utf8');
const sliced = 'preview...';
const out = await truncateIfNeeded({
fullContent: oversized,
slicedContent: sliced,
wasTruncated: true,
});
expect(out).toEqual({ content: sliced, truncated: true });
expect('outputPath' in out).toBe(false);
});
it('storage failure surfaces as truncated without outputPath', async () => {
// Force writeFile to throw. Spy at the fs module level since truncate.ts
// imports { promises as fs } and storeTruncation calls fs.writeFile.
const spy = vi.spyOn(fs, 'writeFile').mockRejectedValueOnce(new Error('disk full'));
const out = await truncateIfNeeded({
fullContent: 'short',
slicedContent: 'sliced',
wasTruncated: true,
});
expect(out).toEqual({ content: 'sliced', truncated: true });
expect('outputPath' in out).toBe(false);
spy.mockRestore();
});
});

View File

@@ -0,0 +1,218 @@
import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest';
import { readFileSync } from 'node:fs';
import { resolve } from 'node:path';
import { fileURLToPath } from 'node:url';
import {
WsFrameSchema,
KNOWN_FRAME_TYPES,
type WsFrame,
} from '../../types/ws-frames.js';
import { createBroker } from '../broker.js';
const VALID_UUID_A = '00000000-0000-0000-0000-000000000001';
const VALID_UUID_B = '00000000-0000-0000-0000-000000000002';
const VALID_UUID_C = '00000000-0000-0000-0000-000000000003';
const VALID_TIMESTAMP = '2026-05-22T14:30:00.000Z';
describe('WsFrameSchema (v1.13.11-a)', () => {
it('accepts a well-formed chat_status frame', () => {
const result = WsFrameSchema.safeParse({
type: 'chat_status',
chat_id: VALID_UUID_A,
status: 'streaming',
at: VALID_TIMESTAMP,
});
expect(result.success).toBe(true);
});
it('rejects an unknown frame type', () => {
const result = WsFrameSchema.safeParse({
type: 'cosmic_ray_strike',
chat_id: VALID_UUID_A,
});
expect(result.success).toBe(false);
});
it('rejects a chat_status frame with invalid status enum', () => {
// v1.12.1 dropped the legacy 'working' status. Any frame still emitting it
// should fail validation — that's a drift catcher.
const result = WsFrameSchema.safeParse({
type: 'chat_status',
chat_id: VALID_UUID_A,
status: 'working',
at: VALID_TIMESTAMP,
});
expect(result.success).toBe(false);
});
it('rejects a UUID field with a non-UUID string', () => {
const result = WsFrameSchema.safeParse({
type: 'chat_status',
chat_id: 'not-a-uuid',
status: 'idle',
at: VALID_TIMESTAMP,
});
expect(result.success).toBe(false);
});
it('rejects negative token counts in usage frame', () => {
const result = WsFrameSchema.safeParse({
type: 'usage',
message_id: VALID_UUID_A,
chat_id: VALID_UUID_B,
completion_tokens: -1,
ctx_used: 100,
ctx_max: 1000,
});
expect(result.success).toBe(false);
});
it('accepts a usage frame with nullable token counts (pre-v1.13.7 history)', () => {
const result = WsFrameSchema.safeParse({
type: 'usage',
message_id: VALID_UUID_A,
chat_id: VALID_UUID_B,
completion_tokens: null,
ctx_used: null,
ctx_max: null,
});
expect(result.success).toBe(true);
});
it('accepts a tool_result frame with non-UUID tool_call_id (model-emitted)', () => {
// Model-emitted tool_call_ids look like "call_abc123", not UUIDs.
const result = WsFrameSchema.safeParse({
type: 'tool_result',
tool_message_id: VALID_UUID_A,
chat_id: VALID_UUID_B,
tool_call_id: 'call_abc123',
output: { whatever: true },
truncated: false,
});
expect(result.success).toBe(true);
});
it('accepts a compacted frame', () => {
const result = WsFrameSchema.safeParse({
type: 'compacted',
session_id: VALID_UUID_A,
chat_id: VALID_UUID_B,
summary_message_id: VALID_UUID_C,
});
expect(result.success).toBe(true);
});
it('accepts a session_workspace_updated frame', () => {
const result = WsFrameSchema.safeParse({
type: 'session_workspace_updated',
session_id: VALID_UUID_A,
workspace_panes: [{ id: 'p1', kind: 'chat', chatIds: [], activeChatIdx: 0 }],
});
expect(result.success).toBe(true);
});
it('every KNOWN_FRAME_TYPES entry has a discriminated branch', () => {
// Probe each known type by attempting a minimal valid construction.
// Failure here means the union and the KNOWN_FRAME_TYPES list drifted.
for (const type of KNOWN_FRAME_TYPES) {
const probe = WsFrameSchema.safeParse({ type, __dummy__: true });
// We expect FAILURE on every type because we're missing required fields,
// but the failure must be ABOUT the missing fields, not about an unknown
// type. A "Invalid discriminator value" error means the type isn't in
// the union — that's a drift.
if (probe.success) continue;
const issues = probe.error.issues;
const hasInvalidDiscriminator = issues.some(
(i) => i.code === 'invalid_union_discriminator',
);
expect(hasInvalidDiscriminator, `frame type '${type}' is missing from the discriminated union`).toBe(false);
}
});
});
describe('ws-frames.ts file mirror parity', () => {
it('apps/server and apps/web copies are byte-identical', () => {
const here = fileURLToPath(import.meta.url);
const serverPath = resolve(here, '../../../types/ws-frames.ts');
const webPath = resolve(here, '../../../../../web/src/api/ws-frames.ts');
const serverContent = readFileSync(serverPath, 'utf8');
const webContent = readFileSync(webPath, 'utf8');
expect(webContent, 'apps/web/src/api/ws-frames.ts must be byte-identical to apps/server/src/types/ws-frames.ts').toBe(serverContent);
});
});
describe('broker.publishFrame / publishUserFrame fail-closed behavior', () => {
let logErrors: Array<{ obj: unknown; msg: string }>;
let mockLog: Parameters<typeof createBroker>[0];
beforeEach(() => {
logErrors = [];
mockLog = {
error: (obj: unknown, msg: string) => {
logErrors.push({ obj, msg });
},
info: () => {},
warn: () => {},
debug: () => {},
trace: () => {},
fatal: () => {},
child: () => mockLog as never,
level: 'info',
silent: () => {},
} as unknown as Parameters<typeof createBroker>[0];
});
afterEach(() => {
vi.restoreAllMocks();
});
it('publishFrame delivers a valid frame to subscribers', () => {
const broker = createBroker(mockLog);
const received: WsFrame[] = [];
broker.subscribe('sess-1', (f) => received.push(f as WsFrame));
broker.publishFrame('sess-1', {
type: 'delta',
message_id: VALID_UUID_A,
chat_id: VALID_UUID_B,
content: 'hello',
});
expect(received).toHaveLength(1);
expect((received[0] as { type: string }).type).toBe('delta');
expect(logErrors).toHaveLength(0);
});
it('publishFrame drops + logs an invalid frame instead of delivering it', () => {
const broker = createBroker(mockLog);
const received: WsFrame[] = [];
broker.subscribe('sess-1', (f) => received.push(f as WsFrame));
broker.publishFrame('sess-1', {
type: 'delta',
message_id: 'not-a-uuid',
content: 'hello',
} as never);
expect(received).toHaveLength(0);
expect(logErrors).toHaveLength(1);
expect(logErrors[0]!.msg).toMatch(/ws-frame-validation-failed/);
});
it('publishUserFrame drops + logs an invalid user-channel frame', () => {
const broker = createBroker(mockLog);
const received: WsFrame[] = [];
broker.subscribeUser('default', (f) => received.push(f as WsFrame));
broker.publishUserFrame('default', {
type: 'chat_status',
chat_id: VALID_UUID_A,
status: 'working', // v1.12.1 dropped this enum value
at: VALID_TIMESTAMP,
} as never);
expect(received).toHaveLength(0);
expect(logErrors).toHaveLength(1);
});
it('publishFrame validation failure does not throw (no cascade into stream-phase)', () => {
const broker = createBroker(mockLog);
expect(() =>
broker.publishFrame('sess-1', { type: 'unknown_type' } as never),
).not.toThrow();
});
});

View File

@@ -0,0 +1,357 @@
// v1.13.16: covers the Qwen/Hermes <tool_call> parser, the new Anthropic
// <invoke> parser, the partial-opener detector for both flavors, the unified
// extraction helper, and the unknown-tool error formatter that downstream
// dispatch uses to give the model a recovery hint when it drifts to a
// Claude Code tool name like read_file instead of BooCode's view_file.
import { describe, expect, it } from 'vitest';
import {
parseXmlToolCall,
parseInvokeToolCall,
partialXmlOpenerStart,
extractToolCallBlocks,
XML_TOOL_OPEN,
XML_TOOL_CLOSE,
INVOKE_TOOL_OPEN,
INVOKE_TOOL_CLOSE,
} from '../inference/xml-parser.js';
import {
levenshtein,
suggestToolName,
formatUnknownToolError,
} from '../inference/tool-suggestions.js';
describe('parseXmlToolCall (Qwen/Hermes <tool_call>)', () => {
it('parses a well-formed single-parameter call', () => {
const block = '<tool_call><function=view_file><parameter=path>/tmp/foo</parameter></function></tool_call>';
expect(parseXmlToolCall(block)).toEqual({
name: 'view_file',
args: { path: '/tmp/foo' },
});
});
it('parses multi-parameter call', () => {
const block = '<tool_call><function=grep><parameter=pattern>foo</parameter><parameter=path>src/</parameter></function></tool_call>';
expect(parseXmlToolCall(block)).toEqual({
name: 'grep',
args: { pattern: 'foo', path: 'src/' },
});
});
it('JSON-parses numeric parameter values', () => {
const block = '<tool_call><function=foo><parameter=count>42</parameter></function></tool_call>';
expect(parseXmlToolCall(block)).toEqual({ name: 'foo', args: { count: 42 } });
});
it('tolerates whitespace around = in function (v1.13.16 tightening)', () => {
const block = '<tool_call><function = view_file><parameter=path>/tmp/foo</parameter></function></tool_call>';
expect(parseXmlToolCall(block)).toEqual({
name: 'view_file',
args: { path: '/tmp/foo' },
});
});
it('tolerates whitespace around = in parameter (v1.13.16 tightening)', () => {
const block = '<tool_call><function=view_file><parameter = path>/tmp/foo</parameter></function></tool_call>';
expect(parseXmlToolCall(block)).toEqual({
name: 'view_file',
args: { path: '/tmp/foo' },
});
});
it('returns null when function name is missing', () => {
const block = '<tool_call><parameter=path>/tmp/foo</parameter></tool_call>';
expect(parseXmlToolCall(block)).toBeNull();
});
});
describe('parseInvokeToolCall (Anthropic <invoke>) — v1.13.16', () => {
// Spec case 1
it('parses a well-formed single-parameter call (spec case 1)', () => {
const block = '<invoke name="view_file"><parameter name="path">/tmp/foo</parameter></invoke>';
expect(parseInvokeToolCall(block)).toEqual({
name: 'view_file',
args: { path: '/tmp/foo' },
});
});
// Spec case 2
it('parses a multi-parameter call (spec case 2)', () => {
const block = '<invoke name="grep"><parameter name="pattern">foo</parameter><parameter name="path">src/</parameter></invoke>';
expect(parseInvokeToolCall(block)).toEqual({
name: 'grep',
args: { pattern: 'foo', path: 'src/' },
});
});
// Spec case 3
it('tolerates newlines and spaces in attributes (spec case 3)', () => {
const block = `<invoke
name="view_file"
>
<parameter
name="path"
>/tmp/foo</parameter>
</invoke>`;
expect(parseInvokeToolCall(block)).toEqual({
name: 'view_file',
args: { path: '/tmp/foo' },
});
});
// Spec case 4 (parser portion — the not-found enrichment is tested below)
it('parses a call whose name is not a registered BooCode tool (spec case 4)', () => {
const block = '<invoke name="read_file"><parameter name="path">/tmp/foo</parameter></invoke>';
expect(parseInvokeToolCall(block)).toEqual({
name: 'read_file',
args: { path: '/tmp/foo' },
});
});
it('supports single-quoted attribute values', () => {
const block = "<invoke name='view_file'><parameter name='path'>/tmp/foo</parameter></invoke>";
expect(parseInvokeToolCall(block)).toEqual({
name: 'view_file',
args: { path: '/tmp/foo' },
});
});
it('JSON-parses numeric parameter values', () => {
const block = '<invoke name="foo"><parameter name="count">42</parameter></invoke>';
expect(parseInvokeToolCall(block)).toEqual({ name: 'foo', args: { count: 42 } });
});
it('tolerates spaces around = inside name attribute', () => {
const block = '<invoke name = "view_file"><parameter name = "path">/tmp/foo</parameter></invoke>';
expect(parseInvokeToolCall(block)).toEqual({
name: 'view_file',
args: { path: '/tmp/foo' },
});
});
it('returns null when name attribute is missing', () => {
const block = '<invoke><parameter name="path">/tmp/foo</parameter></invoke>';
expect(parseInvokeToolCall(block)).toBeNull();
});
it('returns null when name attribute is empty', () => {
const block = '<invoke name=""><parameter name="path">/tmp/foo</parameter></invoke>';
expect(parseInvokeToolCall(block)).toBeNull();
});
it('exports the expected delimiters', () => {
expect(INVOKE_TOOL_OPEN).toBe('<invoke');
expect(INVOKE_TOOL_CLOSE).toBe('</invoke>');
expect(XML_TOOL_OPEN).toBe('<tool_call>');
expect(XML_TOOL_CLOSE).toBe('</tool_call>');
});
});
describe('partialXmlOpenerStart (v1.13.16 — both flavors)', () => {
it('returns -1 when the buffer is empty', () => {
expect(partialXmlOpenerStart('')).toBe(-1);
});
it('returns -1 when the buffer has no openers', () => {
expect(partialXmlOpenerStart('plain prose, no markup')).toBe(-1);
});
it('returns the index of a complete <tool_call> opener (existing)', () => {
expect(partialXmlOpenerStart('prose <tool_call>more')).toBe(6);
});
it('returns the index of a complete <invoke opener (v1.13.16)', () => {
expect(partialXmlOpenerStart('prose <invoke name=')).toBe(6);
});
it('holds a partial <tool_ prefix at end of buffer', () => {
expect(partialXmlOpenerStart('text <tool_')).toBe(5);
});
it('holds a partial <invo prefix at end of buffer (v1.13.16)', () => {
expect(partialXmlOpenerStart('text <invo')).toBe(5);
});
it('holds a bare < at end of buffer', () => {
expect(partialXmlOpenerStart('text <')).toBe(5);
});
it('returns -1 when < is followed by non-opener text', () => {
expect(partialXmlOpenerStart('text <unknown>')).toBe(-1);
});
it('returns the earliest opener when both flavors are present', () => {
expect(partialXmlOpenerStart('xxx <tool_call>YYY <invoke>')).toBe(4);
expect(partialXmlOpenerStart('xxx <invoke>YYY <tool_call>')).toBe(4);
});
});
describe('extractToolCallBlocks (v1.13.16 — unified extraction)', () => {
// Spec case 1 (extraction-level)
it('extracts a single <invoke> block (spec case 1)', () => {
const input = '<invoke name="view_file"><parameter name="path">/tmp/foo</parameter></invoke>';
const result = extractToolCallBlocks(input);
expect(result.calls).toEqual([{ name: 'view_file', args: { path: '/tmp/foo' } }]);
expect(result.flushed).toBe('');
expect(result.remaining).toBe('');
});
// Spec case 5: opener arrives in one chunk, closer in the next.
it('holds the partial <invoke> chunk when the closer has not arrived (spec case 5, first chunk)', () => {
const firstChunk = '<invoke name="view_file"><parameter name="path">/tmp/foo</parameter>';
const result = extractToolCallBlocks(firstChunk);
expect(result.calls).toEqual([]);
expect(result.flushed).toBe('');
expect(result.remaining).toBe(firstChunk);
});
it('extracts the block once the closer arrives in a later chunk (spec case 5, completion)', () => {
const firstChunk = '<invoke name="view_file"><parameter name="path">/tmp/foo</parameter>';
const r1 = extractToolCallBlocks(firstChunk);
const combined = r1.remaining + '</invoke>';
const r2 = extractToolCallBlocks(combined);
expect(r2.calls).toEqual([{ name: 'view_file', args: { path: '/tmp/foo' } }]);
expect(r2.flushed).toBe('');
expect(r2.remaining).toBe('');
});
// Spec case 6: prose interleaving
it('flushes prose around a recognized block but not the markup itself (spec case 6)', () => {
const input = 'I will read the file.\n<invoke name="view_file"><parameter name="path">/tmp/foo</parameter></invoke>\nThanks.';
const result = extractToolCallBlocks(input);
expect(result.calls).toEqual([{ name: 'view_file', args: { path: '/tmp/foo' } }]);
expect(result.flushed).toBe('I will read the file.\n\nThanks.');
expect(result.remaining).toBe('');
});
// Spec case 7 regression
it('extracts a <tool_call> Qwen block alongside the new code path (spec case 7 regression)', () => {
const input = '<tool_call><function=view_file><parameter=path>/tmp/foo</parameter></function></tool_call>';
const result = extractToolCallBlocks(input);
expect(result.calls).toEqual([{ name: 'view_file', args: { path: '/tmp/foo' } }]);
expect(result.flushed).toBe('');
expect(result.remaining).toBe('');
});
it('extracts mixed-format blocks in source order (hand-back: shared counter)', () => {
const input =
'<invoke name="view_file"><parameter name="path">/a</parameter></invoke>' +
' middle ' +
'<tool_call><function=grep><parameter=pattern>foo</parameter></function></tool_call>';
const result = extractToolCallBlocks(input);
expect(result.calls).toEqual([
{ name: 'view_file', args: { path: '/a' } },
{ name: 'grep', args: { pattern: 'foo' } },
]);
expect(result.flushed).toBe(' middle ');
expect(result.remaining).toBe('');
});
it('drops a malformed <invoke> block silently (matches existing <tool_call> behavior)', () => {
const input = 'prose <invoke><parameter name="path">/a</parameter></invoke> trailing';
const result = extractToolCallBlocks(input);
expect(result.calls).toEqual([]);
expect(result.flushed).toBe('prose trailing');
expect(result.remaining).toBe('');
});
it('holds a tail with a fresh partial opener after extracting earlier complete blocks', () => {
const input = '<invoke name="view_file"><parameter name="path">/a</parameter></invoke> next: <tool_';
const result = extractToolCallBlocks(input);
expect(result.calls).toEqual([{ name: 'view_file', args: { path: '/a' } }]);
expect(result.flushed).toBe(' next: ');
expect(result.remaining).toBe('<tool_');
});
it('passes plain prose straight through when no markup is present', () => {
const input = 'just some text with a < character but no opener';
const result = extractToolCallBlocks(input);
expect(result.calls).toEqual([]);
expect(result.flushed).toBe(input);
expect(result.remaining).toBe('');
});
});
describe('levenshtein', () => {
it('returns 0 for identical strings', () => {
expect(levenshtein('view_file', 'view_file')).toBe(0);
});
it('returns the length when one string is empty', () => {
expect(levenshtein('', 'view_file')).toBe(9);
expect(levenshtein('view_file', '')).toBe(9);
});
it('computes a small distance for a single-character substitution', () => {
expect(levenshtein('cat', 'bat')).toBe(1);
});
it('computes a known case: read_file → view_file is 4', () => {
// r→v, e→i, a→e, d→w → 4 substitutions, same length
expect(levenshtein('read_file', 'view_file')).toBe(4);
});
});
describe('suggestToolName (v1.13.16)', () => {
const tools = [
'view_file',
'list_dir',
'grep',
'find_files',
'view_truncated_output',
'ask_user_input',
'web_search',
];
it('suggests the closest match when distance is small', () => {
expect(suggestToolName('view_files', tools)).toBe('view_file');
});
it('suggests via substring match when distance alone would miss', () => {
// 'file' is a substring of multiple tools; closest by distance wins.
expect(suggestToolName('file', tools)).toBe('view_file');
});
it('returns null when nothing is close', () => {
expect(suggestToolName('xxxx_yyyy_zzzz', tools)).toBeNull();
});
it('is case-insensitive in the distance check', () => {
expect(suggestToolName('VIEW_FILE', tools)).toBe('view_file');
});
});
describe('formatUnknownToolError (v1.13.16)', () => {
const tools = ['view_file', 'list_dir', 'grep', 'find_files'];
it('includes the wrong name and the available tools list', () => {
const msg = formatUnknownToolError('read_file', tools);
expect(msg).toContain("Tool 'read_file' not found");
expect(msg).toContain('Available tools:');
expect(msg).toContain('view_file');
expect(msg).toContain('find_files');
});
it('includes a suggestion when the drifted name is within threshold', () => {
// distance(view_files, view_file) = 1 (one extra char)
const msg = formatUnknownToolError('view_files', tools);
expect(msg).toContain('Did you mean: view_file?');
});
it('omits the suggestion clause when no tool is close enough', () => {
const msg = formatUnknownToolError('zzzzzzz', tools);
expect(msg).toContain("Tool 'zzzzzzz' not found");
expect(msg).toContain('Available tools:');
expect(msg).not.toContain('Did you mean');
});
// The drift incident in the recon (chat 30d8…1be7167, msg 7ff558f4) had the
// model emit <invoke name="read_file">. lev(read_file, view_file) = 4, so
// the spec's threshold (<=3) doesn't suggest view_file — the model still
// gets the available-tools list to pick from. This pins that behavior so a
// future loosening of the threshold is a deliberate choice.
it('does not suggest view_file for the read_file drift case (distance is 4, over threshold)', () => {
const msg = formatUnknownToolError('read_file', tools);
expect(msg).not.toContain('Did you mean');
});
});

View File

@@ -1,7 +1,7 @@
import { promises as fs } from 'node:fs';
import { join } from 'node:path';
import type { Agent, AgentsResponse, AgentParseError } from '../types/api.js';
import { ALL_TOOLS } from './tools.js';
import { ALL_TOOLS, resolveToolTier } from './tools.js';
// v1.8.1: global agents live at /data/AGENTS.md inside the container
// (./data:/data:ro mount on the host). Per-project AGENTS.md at the project
@@ -186,11 +186,14 @@ function parseAgentSection(section: RawSection): Omit<Agent, 'source'> {
throw new Error(fmErrors.join('; '));
}
// v1.13.15-tools: intersect with BOOCODE_TOOLS tier (ceiling, not expansion).
// Unset → resolveToolTier returns ALL tool names → no narrowing.
const tierAllowed = new Set(resolveToolTier(process.env.BOOCODE_TOOLS));
const filteredTools = Array.isArray(fm.tools)
? fm.tools.filter((t): t is string =>
(ALL_TOOL_NAMES as readonly string[]).includes(t),
(ALL_TOOL_NAMES as readonly string[]).includes(t) && tierAllowed.has(t),
)
: DEFAULT_TOOLS;
: DEFAULT_TOOLS.filter((t) => tierAllowed.has(t));
return {
id: slugify(section.name),
@@ -252,6 +255,22 @@ export function invalidateAgentsCache(projectPath?: string): void {
}
}
// v1.13.8: cache-read accessor for the system-prompt prefix-fingerprint log.
// Returns the AGENTS.md mtimes that getAgentsForProject() observed on its
// last cache fill for this projectPath. Both fields are null when the cache
// is cold (e.g. tests, fresh boot before the first inference turn). Does no
// I/O — a fresh stat would race the cache and isn't what the fingerprint
// wants anyway (we want what was actually used to resolve the agent).
export function getAgentsMtimes(projectPath: string): {
global: number | null;
project: number | null;
} {
const key = projectPath || '__none__';
const entry = cache.get(key);
if (!entry) return { global: null, project: null };
return { global: entry.globalMtime, project: entry.projectMtime };
}
async function safeStat(path: string): Promise<number | null> {
try {
const s = await fs.stat(path);

View File

@@ -0,0 +1,255 @@
// v1.14.x-html-artifact-panes: artifact writer + slug derivation.
//
// Writes Markdown and HTML artifacts to `<projectRoot>/.boocode/artifacts/`
// as plain files. Returns `{path, url}` where:
// - path is the absolute on-disk path
// - url is a project-scoped REST URL pointing at the GET download route
// registered in routes/artifacts.ts. The route streams the file with
// Content-Disposition: attachment.
//
// Path safety: we do NOT use path_guard.ts (it realpaths and throws ENOENT
// for files that don't exist yet, which artifact creation requires).
// Instead we mirror the v1.13.18 codecontext_client.ts pattern: resolve
// the candidate path against the realpath'd projectRoot, then verify the
// result starts with projectRoot + sep (or equals projectRoot).
import { mkdir, realpath, writeFile } from 'node:fs/promises';
import { resolve, sep } from 'node:path';
import { PathScopeError } from './path_guard.js';
import type { Message } from '../types/api.js';
export interface HtmlArtifactPayload {
html_content: string;
char_count: number;
title: string | null;
}
export interface ArtifactWriteResult {
path: string;
url: string;
}
const ARTIFACT_SUBDIR = '.boocode/artifacts';
// ---- slug helpers ----
// Lowercase, replace non-alnum runs with '-', trim leading/trailing '-',
// collapse repeated '-', cap at 60 chars. Empty → 'artifact'.
function slugify(input: string): string {
const cleaned = input
.toLowerCase()
.replace(/[^a-z0-9]+/g, '-')
.replace(/^-+|-+$/g, '')
.replace(/-{2,}/g, '-')
.slice(0, 60)
.replace(/^-+|-+$/g, '');
return cleaned || 'artifact';
}
function firstHeading(md: string): string | null {
// Match the first `# ` ATX heading at the start of a line.
const m = md.match(/^[ \t]*#[ \t]+(.+?)\s*$/m);
if (!m) return null;
const text = m[1]?.trim() ?? '';
return text.length > 0 ? text : null;
}
function firstNWords(s: string, n: number): string {
const words = s.trim().split(/\s+/).filter(Boolean).slice(0, n);
return words.join(' ');
}
export function deriveMarkdownSlug(messageContent: string): string {
const heading = firstHeading(messageContent);
if (heading) return slugify(heading);
const sixWords = firstNWords(messageContent, 6);
return slugify(sixWords);
}
// Strip HTML tags for inner-text extraction. Crude but sufficient for slug
// derivation — we're not rendering, just finding readable words.
function stripTags(html: string): string {
return html
.replace(/<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi, ' ')
.replace(/<style\b[^<]*(?:(?!<\/style>)<[^<]*)*<\/style>/gi, ' ')
.replace(/<[^>]+>/g, ' ')
.replace(/\s+/g, ' ')
.trim();
}
function extractTitleTag(html: string): string | null {
const m = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
if (!m) return null;
const text = stripTags(m[1] ?? '').trim();
return text.length > 0 ? text : null;
}
function extractH1(html: string): string | null {
const m = html.match(/<h1[^>]*>([\s\S]*?)<\/h1>/i);
if (!m) return null;
const text = stripTags(m[1] ?? '').trim();
return text.length > 0 ? text : null;
}
export function deriveHtmlSlug(payload: {
html_content: string;
title: string | null;
}): string {
if (payload.title && payload.title.trim().length > 0) {
return slugify(payload.title);
}
const title = extractTitleTag(payload.html_content);
if (title) return slugify(title);
const h1 = extractH1(payload.html_content);
if (h1) return slugify(h1);
const inner = stripTags(payload.html_content);
return slugify(firstNWords(inner, 6));
}
// Derive title for the html_artifact part payload: <title> → first <h1> →
// first 80 chars of inner text. Returns null if nothing useful is found.
export function deriveHtmlTitle(html: string): string | null {
const t = extractTitleTag(html);
if (t) return t;
const h1 = extractH1(html);
if (h1) return h1;
const inner = stripTags(html);
if (inner.length === 0) return null;
return inner.slice(0, 80);
}
// ---- HTML detection (B4) ----
// Returns the inner HTML content if `text` is a recognised HTML artifact:
// - starts with <!DOCTYPE html> (case-insensitive, whitespace-trimmed), OR
// - wrapped entirely in a fenced ```html ... ``` block.
// Returns null if neither matches.
export function detectHtmlArtifact(text: string): string | null {
const trimmed = text.trim();
if (trimmed.length === 0) return null;
if (/^<!doctype\s+html/i.test(trimmed)) {
return trimmed;
}
// Fenced ```html block consuming the entire (trimmed) message. Allow an
// optional trailing newline before the closing fence.
const fence = trimmed.match(/^```html\s*\n([\s\S]*?)\n?```\s*$/i);
if (fence) {
const inner = fence[1] ?? '';
if (/^\s*<!doctype\s+html/i.test(inner) || /<html[\s>]/i.test(inner)) {
return inner.trim();
}
}
return null;
}
// ---- path resolution ----
// Resolve `<projectRoot>/.boocode/artifacts/<filename>` and verify the
// result stays under projectRoot. Mirrors the v1.13.18 codecontext_client.ts
// approach: realpath projectRoot first, then prefix-check the candidate.
// Throws on escape.
async function resolveArtifactPath(
projectRoot: string,
filename: string,
): Promise<{ resolvedRoot: string; artifactsDir: string; absPath: string }> {
const resolvedRoot = await realpath(projectRoot);
const artifactsDir = resolve(resolvedRoot, ARTIFACT_SUBDIR);
const absPath = resolve(artifactsDir, filename);
// Lexical prefix check on the resolved candidates. (The `!== resolvedRoot`
// branch was dead — ARTIFACT_SUBDIR is non-empty so artifactsDir always
// differs from resolvedRoot.)
if (!artifactsDir.startsWith(resolvedRoot + sep)) {
throw new PathScopeError(
`artifacts dir escapes project root: ${artifactsDir}`,
);
}
if (!absPath.startsWith(artifactsDir + sep)) {
throw new PathScopeError(
`artifact filename escapes artifacts dir: ${filename}`,
);
}
return { resolvedRoot, artifactsDir, absPath };
}
// After mkdir, realpath the artifacts dir and re-verify it stays under
// resolvedRoot. Closes the symlink-escape gap: if `.boocode/artifacts` (or
// any ancestor below resolvedRoot) is a symlink pointing outside the
// project, the lexical check in resolveArtifactPath passes but the actual
// write lands outside the sandbox. Throws PathScopeError on escape.
async function assertArtifactsDirSafe(
artifactsDir: string,
resolvedRoot: string,
): Promise<void> {
const realDir = await realpath(artifactsDir);
if (realDir !== resolvedRoot && !realDir.startsWith(resolvedRoot + sep)) {
throw new PathScopeError(
`artifacts dir resolves outside project root: ${realDir}`,
);
}
}
// Pure decision helper for whether finalizeCompletion should write the
// `html_artifact` part. Exported for unit testing the cap-skip branch.
// Returns `{write: true, byteLen}` when the payload is under the cap, or
// `{write: false, byteLen, reason: 'cap_exceeded'}` when oversize.
export type HtmlArtifactDecision =
| { write: true; byteLen: number }
| { write: false; byteLen: number; reason: 'cap_exceeded' };
export function decideHtmlArtifactWrite(
htmlContent: string,
): HtmlArtifactDecision {
const byteLen = Buffer.byteLength(htmlContent, 'utf8');
if (byteLen > HTML_ARTIFACT_MAX_BYTES) {
return { write: false, byteLen, reason: 'cap_exceeded' };
}
return { write: true, byteLen };
}
function buildUrl(projectId: string, filename: string): string {
return `/api/projects/${projectId}/artifacts/${encodeURIComponent(filename)}`;
}
export interface WriteContext {
projectId: string;
projectRoot: string;
}
export async function writeMarkdownArtifact(
message: Pick<Message, 'content'>,
ctx: WriteContext,
): Promise<ArtifactWriteResult> {
const slug = deriveMarkdownSlug(message.content);
const filename = `${slug}-${Date.now()}.md`;
const { resolvedRoot, artifactsDir, absPath } = await resolveArtifactPath(
ctx.projectRoot,
filename,
);
await mkdir(artifactsDir, { recursive: true });
await assertArtifactsDirSafe(artifactsDir, resolvedRoot);
await writeFile(absPath, message.content, 'utf8');
return { path: absPath, url: buildUrl(ctx.projectId, filename) };
}
export async function writeHtmlArtifact(
payload: HtmlArtifactPayload,
ctx: WriteContext,
): Promise<ArtifactWriteResult> {
const slug = deriveHtmlSlug(payload);
const filename = `${slug}-${Date.now()}.html`;
const { resolvedRoot, artifactsDir, absPath } = await resolveArtifactPath(
ctx.projectRoot,
filename,
);
await mkdir(artifactsDir, { recursive: true });
await assertArtifactsDirSafe(artifactsDir, resolvedRoot);
await writeFile(absPath, payload.html_content, 'utf8');
return { path: absPath, url: buildUrl(ctx.projectId, filename) };
}
// 1MB cap on HTML artifacts (proposal S6). Larger payloads are not written
// to the `html_artifact` part — the assistant text lands as plain content
// and a warning is logged. Streaming abort was considered but the graceful
// "no artifact, plain text falls back" path is simpler and lossless from
// the user's perspective.
export const HTML_ARTIFACT_MAX_BYTES = 1_048_576;

View File

@@ -1,3 +1,6 @@
import type { FastifyBaseLogger } from 'fastify';
import { WsFrameSchema, type WsFrame } from '../types/ws-frames.js';
export type Frame = Record<string, unknown> & { type: string };
export type Listener = (frame: Frame) => void;
@@ -6,9 +9,15 @@ export interface Broker {
subscribe(sessionId: string, listener: Listener): () => void;
publishUser(user: string, frame: Frame): void;
subscribeUser(user: string, listener: Listener): () => void;
// v1.13.11-a: typed publish wrappers. Validate against WsFrameSchema and
// delegate to publish / publishUser on success; log + drop on failure
// (fail-closed). Existing publish / publishUser callers stay legal — they
// get converted to the typed variant in v1.13.11-b.
publishFrame(sessionId: string, frame: WsFrame): void;
publishUserFrame(user: string, frame: WsFrame): void;
}
export function createBroker(): Broker {
export function createBroker(log?: FastifyBaseLogger): Broker {
const topics = new Map<string, Set<Listener>>();
const userTopics = new Map<string, Set<Listener>>();
@@ -39,6 +48,28 @@ export function createBroker(): Broker {
};
}
// v1.13.11-a: shared validation guard. Returns the parsed/typed frame on
// success, or null on failure (after logging). Brief mandates fail-closed
// semantics: invalid frames don't reach subscribers; throwing here could
// cascade into stream-phase aborts which v1.13.7 already had to defend
// against, so log + drop is the right shape.
function validate(channel: 'session' | 'user', key: string, frame: WsFrame): WsFrame | null {
const parsed = WsFrameSchema.safeParse(frame);
if (parsed.success) return parsed.data;
const frameType = (frame as { type?: unknown })?.type;
const errors = parsed.error.flatten();
if (log) {
log.error(
{ channel, key, frame_type: frameType, errors },
'ws-frame-validation-failed: dropping invalid frame',
);
} else {
// Fallback for callers that didn't pass a logger (e.g. unit tests).
console.error('ws-frame-validation-failed', { channel, key, frame_type: frameType, errors });
}
return null;
}
return {
publish(sessionId, frame) {
publishTo(topics, sessionId, frame);
@@ -52,5 +83,15 @@ export function createBroker(): Broker {
subscribeUser(user, listener) {
return subscribeTo(userTopics, user, listener);
},
publishFrame(sessionId, frame) {
const valid = validate('session', sessionId, frame);
if (!valid) return;
publishTo(topics, sessionId, valid as Frame);
},
publishUserFrame(user, frame) {
const valid = validate('user', user, frame);
if (!valid) return;
publishTo(userTopics, user, valid as Frame);
},
};
}

View File

@@ -16,7 +16,79 @@
// file parser bug (upstream issue #37) returns a generic error string,
// which we re-surface with a hint to add the file to .codecontextignore.
import { realpath } from 'node:fs/promises';
import { access, copyFile, realpath } from 'node:fs/promises';
import { isAbsolute, join, resolve, sep } from 'node:path';
import { truncateIfNeeded } from './truncate.js';
// v1.13.12 fix: codecontext crashes on empty source files (upstream issue #37)
// when it can't ignore them. The .codecontextignore.template ships with the
// project at /opt/boocode/codecontext/.codecontextignore.template (path inside
// the container; the host's /opt is bind-mounted). On the first call to any
// project, copy the template in if no per-project ignore exists yet. The user
// can subsequently edit the file to customize. Idempotent — once any file is
// at the project root we never overwrite.
const IGNORE_TEMPLATE_PATH = '/opt/boocode/codecontext/.codecontextignore.template';
const ensuredIgnoreProjects = new Set<string>();
async function ensureIgnoreFile(projectRoot: string): Promise<void> {
if (ensuredIgnoreProjects.has(projectRoot)) return;
const ignorePath = join(projectRoot, '.codecontextignore');
try {
await access(ignorePath);
ensuredIgnoreProjects.add(projectRoot);
return;
} catch {
// missing — install the default
}
try {
await copyFile(IGNORE_TEMPLATE_PATH, ignorePath);
ensuredIgnoreProjects.add(projectRoot);
} catch {
// Template missing or project root read-only — proceed without it. The
// codecontext call may still crash on empty source files; the model gets
// the existing hint-message via the catch below telling it to add to
// .codecontextignore manually.
}
}
// v1.13.18: resolve a `file_path` arg to an absolute path anchored within
// the (already realpath'd) projectRoot. Contract:
// - empty/whitespace-only → INVALID_FILE_PATH error
// - relative path → resolve(projectRoot, rawPath) (normalises dot-segments)
// - absolute path → resolve(rawPath) (also normalises — e.g. /root/../etc
// becomes /etc so the prefix-check below rejects it even in the ENOENT
// fallthrough where realpath couldn't canonicalise)
// - try realpath; on ENOENT fall through with the (normalised) absolute
// (the sidecar issues its own "File not found in graph" that the model
// can self-correct on; re-implementing the check here would diverge)
// - if the final path doesn't sit inside projectRoot → escape error
// (same shape as target_dir escape, only the field name differs)
async function resolveProjectPath(
projectRoot: string,
rawPath: string,
): Promise<string> {
if (rawPath.trim() === '') {
throw new Error('INVALID_FILE_PATH: file_path must not be empty');
}
const candidate = isAbsolute(rawPath) ? resolve(rawPath) : resolve(projectRoot, rawPath);
let resolved: string;
try {
resolved = await realpath(candidate);
} catch (err: unknown) {
if ((err as NodeJS.ErrnoException).code === 'ENOENT') {
// File doesn't exist yet (or was deleted). Forward the absolute path;
// codecontext will return "File not found in graph" which the model
// can self-correct on.
resolved = candidate;
} else {
throw err;
}
}
if (resolved !== projectRoot && !resolved.startsWith(projectRoot + sep)) {
throw new Error(`file_path ${rawPath} escapes project root ${projectRoot}`);
}
return resolved;
}
export interface CodecontextRequest {
toolName: string;
@@ -27,6 +99,9 @@ export interface CodecontextRequest {
export interface CodecontextResponse {
result: string;
truncated: boolean;
// v1.13.5: optional opaque id pointing at the full pre-slice content on
// tmpfs. Set when truncated=true and storage succeeded.
outputPath?: string;
}
const CODECONTEXT_BASE_URL = process.env['CODECONTEXT_URL'] ?? 'http://codecontext:8080';
@@ -42,6 +117,10 @@ export async function callCodecontext(
// never pass target_dir; tests can override). A non-existent target_dir
// throws before we hit the network so the model gets a sharp error.
const resolvedProject = await realpath(req.projectPath);
// v1.13.12 fix: install the default .codecontextignore on first call to any
// project so codecontext doesn't crash on empty node_modules files. One file
// written per project, idempotent (set-membership check inside).
await ensureIgnoreFile(resolvedProject);
const requestedTarget = req.args['target_dir'];
const targetDir = typeof requestedTarget === 'string' && requestedTarget.length > 0
? requestedTarget
@@ -56,7 +135,14 @@ export async function callCodecontext(
// Step 2: re-build args with the resolved target_dir so codecontext sees
// the real absolute path, not a symlink or relative form.
const argsToSend = { ...req.args, target_dir: resolvedTarget };
// v1.13.18: also resolve file_path when present — the sidecar index is keyed
// on absolute paths, so a relative path from the model yields "File not found
// in graph". Same escape check as target_dir; ENOENT falls through so the
// sidecar produces the canonical "File not found in graph" the model can fix.
const argsToSend: Record<string, unknown> = { ...req.args, target_dir: resolvedTarget };
if (typeof req.args['file_path'] === 'string' && req.args['file_path'].trim() !== '') {
argsToSend['file_path'] = await resolveProjectPath(resolvedProject, req.args['file_path']);
}
// Step 3: POST with a hard timeout. AbortController + setTimeout pattern
// matches web_fetch.ts; nothing fancier needed.
@@ -105,13 +191,22 @@ export async function callCodecontext(
// Step 4: inline truncation. The model gets a clear hint about how to
// narrow the next call rather than a silent cut. Mirrors web_fetch.ts.
// v1.13.5: stash the full body on tmpfs when truncating so the model can
// retrieve more via view_truncated_output(id).
if (body.result.length > TRUNCATION_LIMIT) {
const truncated = body.result.slice(0, TRUNCATION_LIMIT);
const omitted = body.result.length - TRUNCATION_LIMIT;
const slicedWithMarker =
`${truncated}\n\n[truncated, ${omitted} chars omitted; narrow with file_path, file_type, or limit]`;
const wrapped = await truncateIfNeeded({
fullContent: body.result,
slicedContent: slicedWithMarker,
wasTruncated: true,
});
return {
result:
`${truncated}\n\n[truncated, ${omitted} chars omitted; narrow with file_path, file_type, or limit]`,
truncated: true,
result: wrapped.content,
truncated: wrapped.truncated,
...(wrapped.outputPath ? { outputPath: wrapped.outputPath } : {}),
};
}
return { result: body.result, truncated: false };

View File

@@ -23,7 +23,13 @@ import type { Broker } from './broker.js';
import { SUMMARY_TEMPLATE } from './compaction-prompt.js';
import * as modelContextLookup from './model-context.js';
const COMPACTION_BUFFER = 20_000;
// v1.13.9: ratio-only overflow trigger. Fires compaction at 85% of ctx_max
// (opencode session/overflow.ts pattern). Replaces the v1.11.0-era
// `ctx_max - 20_000` formula which degenerated to 0 for contexts ≤20k and
// gave only 7-8% headroom to the summarizer at 262k. Ratio gives consistent
// 15% headroom at any scale, and small-ctx models no longer get an
// effectively-disabled trigger.
const EARLY_TRIGGER_RATIO = 0.85;
const MIN_PRESERVE_RECENT_TOKENS = 2_000;
const MAX_PRESERVE_RECENT_TOKENS = 8_000;
const DEFAULT_TAIL_TURNS = 2;
@@ -39,19 +45,24 @@ export interface CompactionMessage {
status: 'streaming' | 'complete' | 'failed' | 'cancelled';
tool_calls: Array<{ id: string; name: string; args: Record<string, unknown> }> | null;
tool_results: { tool_call_id: string; output: unknown; truncated: boolean; error?: string } | null;
// v1.13.6: reasoning_parts captured by v1.13.1-C and read back through
// messages_with_parts. Embedded into the head-assembly payload as prose so
// the summarizer LLM sees what the model was reasoning through when it
// chose its tool calls.
reasoning_parts: Array<{ text: string }> | null;
metadata: { kind?: string } | null;
created_at: string;
}
// === overflow ===
// Tokens we hold in reserve for the model's response so a near-full context
// can still produce a useful turn. Mirrors opencode's COMPACTION_BUFFER.
// Returns 0 when the context limit is unknown (caller treats 0 as "do not
// trigger overflow"); avoids dividing-by-zero downstream.
// Returns the token budget at which overflow fires. Triggers compaction at
// 85% of contextLimit (opencode session/overflow.ts pattern). Returns 0 when
// the context limit is unknown caller treats 0 as "do not trigger overflow",
// keeping inference flowing rather than compacting a turn we can't size.
export function usable(contextLimit: number): number {
if (!contextLimit || contextLimit <= 0) return 0;
return Math.max(0, contextLimit - COMPACTION_BUFFER);
return Math.floor(EARLY_TRIGGER_RATIO * contextLimit);
}
export interface Usage {
@@ -197,7 +208,8 @@ export function buildPrompt(
// would silently drop pre-legacy-compact history before the LLM sees it.
// Compaction wants to send the entire head, full stop.) ===
interface OpenAiMessage {
// v1.13.6: exported for unit-test access (reasoning render coverage).
export interface OpenAiMessage {
role: 'system' | 'user' | 'assistant' | 'tool';
content: string | null;
tool_calls?: Array<{
@@ -212,7 +224,8 @@ function isCapHitSentinel(m: CompactionMessage): boolean {
return m.role === 'system' && m.metadata != null && m.metadata.kind === 'cap_hit';
}
function buildHeadPayload(head: CompactionMessage[]): OpenAiMessage[] {
// v1.13.6: exported for unit-test access (reasoning render coverage).
export function buildHeadPayload(head: CompactionMessage[]): OpenAiMessage[] {
const out: OpenAiMessage[] = [];
for (const m of head) {
if (isCapHitSentinel(m)) continue;
@@ -243,9 +256,22 @@ function buildHeadPayload(head: CompactionMessage[]): OpenAiMessage[] {
continue;
}
if (m.role === 'assistant') {
// v1.13.6: embed reasoning text as prose prefixed onto the assistant
// content. OpenAI wire shape doesn't carry reasoning as a structured
// field, but the summarizer is reading text — a tagged prose block
// gives it the same signal. We mirror the AI SDK ReasoningPart shape
// by using a <reasoning>...</reasoning> wrapper so the summarizer can
// distinguish reasoning from user-visible answer.
let body = m.content && m.content.length > 0 ? m.content : '';
if (m.reasoning_parts && m.reasoning_parts.length > 0) {
const reasoning = m.reasoning_parts.map((r) => r.text).join('');
body = body.length > 0
? `<reasoning>${reasoning}</reasoning>\n\n${body}`
: `<reasoning>${reasoning}</reasoning>`;
}
const msg: OpenAiMessage = {
role: 'assistant',
content: m.content && m.content.length > 0 ? m.content : null,
content: body.length > 0 ? body : null,
};
if (m.tool_calls && m.tool_calls.length > 0) {
msg.tool_calls = m.tool_calls.map((tc) => ({
@@ -344,8 +370,11 @@ export async function process(input: ProcessInput): Promise<void> {
// turns() boundary logic sees the same sequence the LLM will.
// v1.13.1-B: reads tool_calls/tool_results via the parts-merged view so
// the compaction payload matches what the LLM saw on the original turn.
// v1.13.6: also pulls reasoning_parts (added in v1.13.1-C) so summaries
// capture what the model was working through before each tool call.
const messages = await sql<CompactionMessage[]>`
SELECT id, role, content, kind, summary, status, tool_calls, tool_results, metadata, created_at
SELECT id, role, content, kind, summary, status, tool_calls, tool_results,
reasoning_parts, metadata, created_at
FROM messages_with_parts
WHERE chat_id = ${chatId} AND compacted_at IS NULL
ORDER BY created_at ASC, id ASC
@@ -402,15 +431,16 @@ export async function process(input: ProcessInput): Promise<void> {
'compaction: invoking model',
);
// 6a. Flip the chat dot amber for the duration of the LLM call + DB writes.
// Same { type: 'chat_status', status: 'working', at } shape inference.ts
// emits at runner enqueue. publishUser → broadcasts on the per-user channel
// (all devices / tabs see it) since chat_status is a user-channel frame in
// BooCode (see useChatStatus.ts, which is the consumer).
broker.publishUser('default', {
// 6a. Flip the chat dot for the duration of the LLM call + DB writes.
// v1.13.11-b: publish status='streaming' (the v1.12.1-widened replacement
// for the dropped 'working' value). Compaction's LLM call has the same
// semantic as an inference turn for dot-state purposes. The v1.12.1
// chat_status widening missed this site; v1.13.11's WsFrame Zod schema
// surfaced the drift via the unknown-enum-value check.
broker.publishUserFrame('default', {
type: 'chat_status',
chat_id: chatId,
status: 'working',
status: 'streaming',
at: new Date().toISOString(),
});
@@ -479,7 +509,7 @@ export async function process(input: ProcessInput): Promise<void> {
// Always restore the dot. Status='idle' (not 'error') even on failure —
// the caller logs/re-surfaces the error separately; the dot doesn't
// need to stay red across reloads for a transient compaction blip.
broker.publishUser('default', {
broker.publishUserFrame('default', {
type: 'chat_status',
chat_id: chatId,
status: 'idle',
@@ -493,7 +523,7 @@ export async function process(input: ProcessInput): Promise<void> {
// toast. Order matters: idle must precede 'compacted' so the dot is
// already green by the time the refetch toast appears.
if (succeeded) {
broker.publish(sessionId, {
broker.publishFrame(sessionId, {
type: 'compacted',
session_id: sessionId,
chat_id: chatId,

View File

@@ -47,8 +47,12 @@ export interface FindFilesResult {
truncated: boolean;
}
export async function listDir(projectRoot: string, relPath: string): Promise<ListDirResult> {
const real = await pathGuard(projectRoot, relPath);
export async function listDir(
projectRoot: string,
relPath: string,
opts?: { extra_roots?: readonly string[] },
): Promise<ListDirResult> {
const real = await pathGuard(projectRoot, relPath, opts?.extra_roots);
const s = await stat(real);
if (!s.isDirectory()) {
throw new PathScopeError(`not a directory: ${relPath}`);
@@ -82,8 +86,12 @@ export async function listDir(projectRoot: string, relPath: string): Promise<Lis
};
}
export async function viewFile(projectRoot: string, relPath: string): Promise<ViewFileResult> {
const real = await pathGuard(projectRoot, relPath);
export async function viewFile(
projectRoot: string,
relPath: string,
opts?: { extra_roots?: readonly string[] },
): Promise<ViewFileResult> {
const real = await pathGuard(projectRoot, relPath, opts?.extra_roots);
const s = await stat(real);
if (!s.isFile()) {
throw new PathScopeError(`not a file: ${relPath}`);
@@ -119,10 +127,10 @@ interface RipgrepMatch {
export async function grep(
projectRoot: string,
pattern: string,
opts?: { path?: string; max_matches?: number; case_sensitive?: boolean; hidden?: boolean }
opts?: { path?: string; max_matches?: number; case_sensitive?: boolean; hidden?: boolean; extra_roots?: readonly string[] }
): Promise<GrepResult> {
const targetPath = opts?.path ?? projectRoot;
const target = await pathGuard(projectRoot, targetPath);
const target = await pathGuard(projectRoot, targetPath, opts?.extra_roots);
const limit = Math.min(
Math.max(opts?.max_matches ?? DEFAULT_GREP_RESULTS, 1),
MAX_GREP_RESULTS
@@ -192,14 +200,14 @@ export async function grep(
export async function findFiles(
projectRoot: string,
pattern?: string,
opts?: { type?: 'file' | 'dir'; max_results?: number; path?: string }
opts?: { type?: 'file' | 'dir'; max_results?: number; path?: string; extra_roots?: readonly string[] }
): Promise<FindFilesResult> {
const limit = Math.min(
Math.max(opts?.max_results ?? DEFAULT_FIND_RESULTS, 1),
MAX_FIND_RESULTS
);
const target = opts?.path != null
? await pathGuard(projectRoot, opts.path)
? await pathGuard(projectRoot, opts.path, opts?.extra_roots)
: projectRoot;
const args = ['--files'];
if (pattern) args.push('--glob', pattern);

View File

@@ -0,0 +1,161 @@
// v1.13.17-cross-repo-reads: derives the grant root for a path the user is
// being asked to approve cross-repo read access to.
//
// Per design decision D1: grant unit = nearest registered project root,
// then nearest path-whitelist ancestor that looks like a repo root, then
// refuse. Granting the literal file path is too narrow (next file in the
// same repo re-prompts). Granting an arbitrary parent dir over-scopes.
//
// The resolver runs in two contexts:
// 1. request_read_access.execute — pre-prompt validation (cheap; bails
// early if the path can't plausibly be granted so the user is never
// asked about /etc/passwd)
// 2. POST /api/chats/:id/grant_read_access — at decision time, re-derives
// the root and persists it on sessions.allowed_read_paths
//
// Sam (2026-05-22 dispatch confirmation): "in the project-root resolver
// ancestor walk, stop the moment parent exits PROJECT_ROOT_WHITELIST or hits
// filesystem root — check on every iteration, not just final parent.
// Symlinked input must not be able to escape the whitelist during the
// walk." Hence the loop here checks both the walk bound AND the still-
// inside-whitelist invariant every step.
import { access, realpath } from 'node:fs/promises';
import { constants } from 'node:fs';
import { dirname, isAbsolute, sep } from 'node:path';
import type { Sql } from '../db.js';
// Files whose presence in a directory marks it as a repo root for grant
// purposes. Kept narrow on purpose; broader heuristics (e.g. ".project",
// "pyproject.toml") can be added with measured intent. Each entry is a
// literal basename — no globs.
const REPO_MARKERS: ReadonlyArray<string> = [
'.git',
'package.json',
'go.mod',
'Cargo.toml',
];
export type GrantResolution =
| { ok: true; root: string; source: 'project' | 'whitelist' }
| { ok: false; reason: string };
function isUnder(child: string, parent: string): boolean {
return child === parent || child.startsWith(parent + sep);
}
async function exists(path: string): Promise<boolean> {
try {
await access(path, constants.F_OK);
return true;
} catch {
return false;
}
}
async function isRepoShaped(dir: string): Promise<boolean> {
for (const marker of REPO_MARKERS) {
if (await exists(`${dir}${sep}${marker}`)) return true;
}
return false;
}
// Resolves an absolute path to its grant root or refuses with a reason
// string suitable for surfacing to the model. Pure helper — no DB writes,
// no broker publishes. Caller persists the root on session.allowed_read_paths
// if it wants the grant to stick.
//
// Arguments:
// sql — used only to read projects.path (no writes)
// requestedPath — absolute path the model wants to read
// projectRoot — the session's primary project root (already
// realpath'd by caller). Used to short-circuit
// "already in scope".
// whitelistRoot — PROJECT_ROOT_WHITELIST from config (default /opt).
// Walk bound for the repo-shape fallback.
//
// Returns { ok: true, root, source } on success; { ok: false, reason } else.
export async function resolveGrantRoot(
sql: Sql,
requestedPath: string,
projectRoot: string,
whitelistRoot: string,
): Promise<GrantResolution> {
if (typeof requestedPath !== 'string' || requestedPath.length === 0) {
return { ok: false, reason: 'path is required' };
}
if (!isAbsolute(requestedPath)) {
return { ok: false, reason: 'path must be absolute' };
}
// Resolve symlinks so subsequent ancestor checks compare apples-to-apples
// with realpath'd projectRoot. If the path doesn't exist at all, bail
// before bothering the user — the model is asking about a phantom.
let real: string;
try {
real = await realpath(requestedPath);
} catch {
return { ok: false, reason: `path does not exist: ${requestedPath}` };
}
// Whitelist guard. Symlinked inputs can resolve outside the whitelist
// even when the surface-form path looks inside it; that's why we test
// the *real* path here, not the requested one.
let realWhitelist: string;
try {
realWhitelist = await realpath(whitelistRoot);
} catch {
return { ok: false, reason: `whitelist root does not exist: ${whitelistRoot}` };
}
if (!isUnder(real, realWhitelist)) {
return { ok: false, reason: 'path outside permitted scope' };
}
// Already in scope? No prompt needed; the tool's caller should retry.
if (isUnder(real, projectRoot)) {
return { ok: false, reason: 'path already accessible without a grant' };
}
// Look for a registered project whose root is an ancestor of the
// requested path. Pick the LONGEST match (nearest ancestor wins) so
// sub-projects don't get over-broadened.
const projectRows = await sql<{ path: string }[]>`
SELECT path FROM projects WHERE status = 'open'
`;
let bestProject: string | null = null;
for (const row of projectRows) {
if (!row.path) continue;
if (!isUnder(real, row.path)) continue;
if (bestProject === null || row.path.length > bestProject.length) {
bestProject = row.path;
}
}
if (bestProject !== null) {
return { ok: true, root: bestProject, source: 'project' };
}
// Repo-shape fallback. Walk from the requested path upward toward the
// whitelist root. At every iteration: confirm we're still inside the
// whitelist (so a symlinked component can't slip the bound mid-walk)
// and confirm we haven't hit the filesystem root. The first dir with a
// REPO_MARKER child is the grant root.
let cursor = real;
while (true) {
// Don't grant the whitelist root itself — that would be far too broad.
if (cursor === realWhitelist) {
return { ok: false, reason: 'no repo-shaped ancestor found under whitelist' };
}
if (!isUnder(cursor, realWhitelist)) {
return { ok: false, reason: 'path outside permitted scope' };
}
const parent = dirname(cursor);
if (parent === cursor) {
// Hit filesystem root without finding a repo marker.
return { ok: false, reason: 'no repo-shaped ancestor found under whitelist' };
}
if (await isRepoShaped(cursor)) {
return { ok: true, root: cursor, source: 'whitelist' };
}
cursor = parent;
}
}

View File

@@ -3,12 +3,24 @@ import { READ_ONLY_TOOL_NAMES } from '../tools.js';
// v1.8.2: tool-call budget defaults. Resolved per-turn by resolveToolBudget.
// - Agent with explicit max_tool_calls: that value.
// - Agent with read-only-only tools: BUDGET_READ_ONLY (30).
// - Agent with read-only-only tools: BUDGET_READ_ONLY (50).
// - Agent with any non-read-only tool: BUDGET_NON_READ_ONLY (10).
// - No agent (raw chat): BUDGET_NO_AGENT (15).
export const BUDGET_READ_ONLY = 30;
// - No agent (raw chat): BUDGET_NO_AGENT (50).
// v1.13.7: bumped BUDGET_NO_AGENT 15→30 to match BUDGET_READ_ONLY. Every tool
// in ALL_TOOLS today is read-only (see services/tools.ts comment at
// READ_ONLY_TOOL_NAMES); the cautious 15-cap was a forward-looking guard for
// write tools that haven't landed yet. No-agent mode gets the same toolset as
// an all-read-only agent at runtime, so they should share the same budget.
// v1.13.12: bumped read-only caps 30→50. Real recon sessions were hitting 30
// with ~3 turns wasted on codecontext parse failures (empty node_modules
// files); legitimate need was ~27, and Architect-class system overviews want
// deeper recon than a 30-cap permits. Headroom of 20 absorbs failure-retry
// turns + deeper exploration without changing the safety floor materially —
// the doom-loop guard (3 identical calls → abort) catches the actual failure
// mode this cap was guarding against.
export const BUDGET_READ_ONLY = 50;
export const BUDGET_NON_READ_ONLY = 10;
export const BUDGET_NO_AGENT = 15;
export const BUDGET_NO_AGENT = 50;
const READ_ONLY_SET: ReadonlySet<string> = new Set(READ_ONLY_TOOL_NAMES);

View File

@@ -1,7 +1,14 @@
import type { MessageMetadata, Session } from '../../types/api.js';
import {
decideHtmlArtifactWrite,
detectHtmlArtifact,
deriveHtmlTitle,
HTML_ARTIFACT_MAX_BYTES,
} from '../artifacts.js';
import * as modelContext from '../model-context.js';
import { maybeFlagForCompaction } from './payload.js';
import { insertParts, partsFromAssistantMessage } from './parts.js';
import type { PartInsert } from './parts.js';
import type { InferenceContext, StreamResult, TurnArgs } from './turn.js';
export async function handleAbortOrError(
@@ -120,17 +127,42 @@ export async function finalizeCompletion(
// a kind='reasoning' part alongside the text.
// TODO(v1.13.1): wrap the UPDATE above and this insertParts in a single
// sql.begin before flipping read authority to message_parts.
await insertParts(
ctx.sql,
partsFromAssistantMessage({
content,
tool_calls: null,
reasoning: result.reasoning,
}).map((p) => ({
...p,
message_id: assistantMessageId,
})),
);
const baseParts: PartInsert[] = partsFromAssistantMessage({
content,
tool_calls: null,
reasoning: result.reasoning,
}).map((p) => ({
...p,
message_id: assistantMessageId,
}));
// v1.14.x-html-artifact-panes: opportunistic HTML detection. Adds a
// SIBLING html_artifact part — never replaces the text part. 1MB cap is
// graceful: oversized payloads are skipped and the assistant message
// lands as plain content (warn logged).
const htmlContent = detectHtmlArtifact(content);
if (htmlContent !== null) {
const decision = decideHtmlArtifactWrite(htmlContent);
if (!decision.write) {
ctx.log.warn(
{ assistantMessageId, byteLen: decision.byteLen, cap: HTML_ARTIFACT_MAX_BYTES },
'html_artifact exceeded 1MB cap; skipping artifact part',
);
} else {
const title = deriveHtmlTitle(htmlContent);
const nextSeq = baseParts.reduce((m, p) => Math.max(m, p.sequence), -1) + 1;
baseParts.push({
message_id: assistantMessageId,
sequence: nextSeq,
kind: 'html_artifact',
payload: {
html_content: htmlContent,
char_count: htmlContent.length,
title,
},
});
}
}
await insertParts(ctx.sql, baseParts);
// v1.11: flag for compaction on the terminal turn too. Catches the common
// case of a turn that hit the limit without invoking tools.
await maybeFlagForCompaction(ctx, chatId, updated);

View File

@@ -7,7 +7,20 @@ import type { ToolCall, ToolResult } from '../../types/api.js';
// JSON columns; the swap to parts-as-source-of-truth happens in a later
// v1.13 dispatch alongside the AI SDK streamText migration.
export type PartKind = 'text' | 'tool_call' | 'tool_result' | 'reasoning' | 'step_start';
// v1.13.13: 'synthesis' added. Schema CHECK constraint is updated in lockstep
// (schema.sql adds 'synthesis' to message_parts_kind_chk on startup). The
// dispatch's claim that no schema migration was needed assumed kind was a
// bare text column — it isn't; the constraint enumerates allowed values.
// v1.14.x-html-artifact-panes: 'html_artifact' added. Schema CHECK constraint
// in schema.sql updated in lockstep.
export type PartKind =
| 'text'
| 'tool_call'
| 'tool_result'
| 'reasoning'
| 'step_start'
| 'synthesis'
| 'html_artifact';
export interface PartInsert {
message_id: string;

View File

@@ -1,3 +1,4 @@
import type { FastifyBaseLogger } from 'fastify';
import type { Sql } from '../../db.js';
import type {
Agent,
@@ -6,8 +7,9 @@ import type {
Session,
} from '../../types/api.js';
import * as compaction from '../compaction.js';
import { buildSystemPrompt } from '../system-prompt.js';
import { buildSystemPromptWithFingerprint } from '../system-prompt.js';
import { isAnySentinel } from './sentinels.js';
import { PRUNE_TRIGGER_TOKENS, prune } from './prune.js';
import type { InferenceContext } from './turn.js';
export interface OpenAiMessage {
@@ -30,14 +32,25 @@ export interface OpenAiMessage {
// v1.12: buildSystemPrompt lives in services/system-prompt.ts. It awaits the
// container-guidance loader, so this function is async too and every call
// site in inference.ts awaits the result.
// v1.13.8: optional log argument. When provided, emit prefix-fingerprint
// per call + prefix-drift when the same session sees a hash change. Tests
// omit it and exercise the byte-stability surface directly through
// buildSystemPromptWithFingerprint. The observer Map in system-prompt.ts
// updates regardless of whether log is passed.
export async function buildMessagesPayload(
session: Session,
project: Project,
history: Message[],
agent: Agent | null = null
agent: Agent | null = null,
log?: FastifyBaseLogger,
): Promise<OpenAiMessage[]> {
const out: OpenAiMessage[] = [];
const systemPrompt = await buildSystemPrompt(project, session, agent);
const { prompt: systemPrompt, fingerprint, drift } =
await buildSystemPromptWithFingerprint(project, session, agent);
if (log) {
log.info(fingerprint);
if (drift) log.warn(drift);
}
out.push({ role: 'system', content: systemPrompt });
// Find the latest compact marker — only send messages from that point onwards
@@ -62,6 +75,25 @@ export async function buildMessagesPayload(
if (isAnySentinel(m)) continue;
if (m.role === 'assistant' && m.status === 'streaming') continue;
if (m.role === 'assistant' && m.status === 'cancelled') continue;
// v1.13.7: skip failed assistant turns. A failed row carries no usable
// content for the model, and leaving it in the payload alongside any
// following assistant message produces "Cannot have 2 or more assistant
// messages at the end of the list" from the OpenAI-compatible upstream.
if (m.role === 'assistant' && m.status === 'failed') continue;
// v1.13.7: skip "empty" completed assistants — clen=0 + no tool_calls.
// These can land when an upstream stream returns finishReason='stop' with
// no text/tool output (network blip, rate limit recovery, model quirk).
// Same risk as the failed-status case: a trailing empty assistant plus
// the next attempt's assistant placeholder = two trailing assistants and
// the API rejects the whole payload.
if (
m.role === 'assistant' &&
m.status === 'complete' &&
(m.content == null || m.content.trim().length === 0) &&
(m.tool_calls == null || m.tool_calls.length === 0)
) {
continue;
}
if (m.role === 'tool') {
const tr = m.tool_results;
if (!tr) continue;
@@ -166,6 +198,29 @@ export async function maybeFlagForCompaction(
contextLimit,
);
if (!overflow) return;
// v1.13.4: try the cheap prune first. If it freed at least
// PRUNE_TRIGGER_TOKENS (20k) worth of context, we're below the threshold
// again — skip flagging summarize for the next turn. The next turn's
// overflow check will re-evaluate from scratch.
// v1.13.9: the overflow trigger above is now 85% of ctx_max (was
// ctx_max - 20k). PRUNE_TRIGGER_TOKENS stays at 20k as the prune-freed
// threshold — independent of the overflow formula.
// Prune failures (DB errors etc.) propagate so the surrounding inference
// path sees them; the catch in finalizeCompletion / executeToolPhase
// doesn't shield this — by design, we want to know if prune is broken.
const pruned = await prune({ sql: ctx.sql, chatId });
if (pruned.hidden > 0) {
ctx.log.info(
{ chatId, hidden: pruned.hidden, freedTokens: pruned.freedTokens },
'inference: prune freed context budget',
);
}
if (pruned.freedTokens >= PRUNE_TRIGGER_TOKENS) {
// Prune handled it; skip the (expensive) summarize path.
return;
}
await ctx.sql`UPDATE chats SET needs_compaction = true WHERE id = ${chatId}`;
ctx.log.info({ chatId, promptTokens, completionTokens, contextLimit }, 'inference: flagged for compaction');
}

View File

@@ -15,6 +15,14 @@ function getProvider(baseURL: string): ReturnType<typeof createOpenAICompatible>
provider = createOpenAICompatible({
name: 'llama-swap',
baseURL: baseURL.endsWith('/v1') ? baseURL : `${baseURL}/v1`,
// v1.13.7: @ai-sdk/openai-compatible defaults includeUsage=false, which
// omits `stream_options.include_usage` from the request body. Without
// it, llama.cpp / llama-swap never emits the trailing usage block, so
// `result.usage` resolves with inputTokens=outputTokens=undefined and
// tokens_used / ctx_used land as NULL in every messages row. Setting
// true here re-enables the per-stream usage payload across all models
// served via the llama-swap provider.
includeUsage: true,
});
cache.set(baseURL, provider);
}

View File

@@ -0,0 +1,127 @@
import type { Sql } from '../../db.js';
// v1.13.4: two-tier compaction prune. Opencode's prune half (the cheap one);
// summarize half shipped in v1.11.0 as services/compaction.ts.
//
// Algorithm: scan tool_result parts newest-first. Protect the last
// PROTECTED_TOKENS of content (the model recently saw these — pruning them
// kills coherence). Older parts are candidates. Mark them hidden_at only
// if the candidate pool would free at least PRUNE_TRIGGER_TOKENS — pruning
// 3 small tool_results to recover 500 tokens isn't worth the loss of
// fidelity for the model's next turn.
//
// Stops at the last compaction summary boundary (chats.tail_start_id). The
// v1.11.0 summary already encodes everything before that point; pruning
// across the boundary would double-erase.
export const PROTECTED_TOKENS = 40_000;
export const PRUNE_TRIGGER_TOKENS = 20_000;
// Rough char-to-token estimate. Same heuristic compaction's usable() uses
// implicitly via the buffer constant.
function estimateTokens(text: string): number {
return Math.ceil(text.length / 4);
}
function payloadTokens(payload: unknown): number {
return estimateTokens(JSON.stringify(payload ?? ''));
}
export interface PruneResult {
hidden: number;
freedTokens: number;
}
// Pure algorithmic core, exported for unit-test access. Takes parts already
// ordered newest-first, plus an optional cutoff (last compaction summary
// boundary). Returns the part ids to hide and the total token estimate of
// the candidates. Caller does the DB UPDATE.
export interface PartForPrune {
id: string;
payload: unknown;
created_at: Date;
}
export function selectPruneTargets(
partsNewestFirst: ReadonlyArray<PartForPrune>,
tailStartCreatedAt: Date | null,
): { ids: string[]; freedTokens: number } {
let protectedTokens = 0;
const candidates: { id: string; tokens: number }[] = [];
let crossedProtection = false;
for (const part of partsNewestFirst) {
if (tailStartCreatedAt && part.created_at < tailStartCreatedAt) {
// Past the last summary boundary; the v1.11.0 anchored summary already
// covers everything older. Bail rather than double-erase.
break;
}
const tokens = payloadTokens(part.payload);
if (!crossedProtection) {
protectedTokens += tokens;
if (protectedTokens >= PROTECTED_TOKENS) {
crossedProtection = true;
}
continue;
}
candidates.push({ id: part.id, tokens });
}
const candidateTokens = candidates.reduce((s, c) => s + c.tokens, 0);
if (candidates.length === 0 || candidateTokens < PRUNE_TRIGGER_TOKENS) {
return { ids: [], freedTokens: 0 };
}
return { ids: candidates.map((c) => c.id), freedTokens: candidateTokens };
}
export async function prune(args: {
sql: Sql;
chatId: string;
}): Promise<PruneResult> {
const { sql, chatId } = args;
// Newest-first scan of visible tool_result parts in this chat. Pull
// chats.tail_start_id alongside so we know where the last summary boundary
// sits (don't prune across it).
const parts = await sql<{
id: string;
payload: unknown;
created_at: Date;
tail_start_id: string | null;
}[]>`
SELECT p.id, p.payload, m.created_at,
(SELECT c.tail_start_id FROM chats c WHERE c.id = ${chatId}) AS tail_start_id
FROM message_parts p
JOIN messages m ON m.id = p.message_id
WHERE m.chat_id = ${chatId}
AND p.kind = 'tool_result'
AND p.hidden_at IS NULL
ORDER BY m.created_at DESC, p.sequence DESC
`;
if (parts.length === 0) {
return { hidden: 0, freedTokens: 0 };
}
// Read the boundary cutoff timestamp once. Older messages are off-limits.
let tailStartCreatedAt: Date | null = null;
const firstTailId = parts[0]?.tail_start_id ?? null;
if (firstTailId) {
const tailRow = await sql<{ created_at: Date }[]>`
SELECT created_at FROM messages WHERE id = ${firstTailId}
`;
tailStartCreatedAt = tailRow[0]?.created_at ?? null;
}
const decision = selectPruneTargets(parts, tailStartCreatedAt);
if (decision.ids.length === 0) {
return { hidden: 0, freedTokens: 0 };
}
await sql`
UPDATE message_parts
SET hidden_at = clock_timestamp()
WHERE id = ANY(${decision.ids})
`;
return { hidden: decision.ids.length, freedTokens: decision.freedTokens };
}

View File

@@ -36,7 +36,7 @@ export async function runCapHitSummary(
): Promise<void> {
const { sessionId, chatId, assistantMessageId, signal } = args;
const messages = await buildMessagesPayload(session, project, history, agent);
const messages = await buildMessagesPayload(session, project, history, agent, ctx.log);
messages.push({ role: 'system', content: CAP_HIT_SUMMARY_NOTE(budget) });
const startedRow = await ctx.sql<{ started_at: string }[]>`
@@ -298,7 +298,7 @@ export async function runDoomLoopSummary(
): Promise<void> {
const { sessionId, chatId, assistantMessageId, signal } = args;
const messages = await buildMessagesPayload(session, project, history, agent);
const messages = await buildMessagesPayload(session, project, history, agent, ctx.log);
messages.push({ role: 'system', content: DOOM_LOOP_NOTE(loop.name) });
const startedRow = await ctx.sql<{ started_at: string }[]>`

View File

@@ -6,12 +6,9 @@ import type {
import * as modelContext from '../model-context.js';
import { toolJsonSchemas, type ToolJsonSchema } from '../tools.js';
import type { OpenAiMessage } from './payload.js';
import {
XML_TOOL_CLOSE,
XML_TOOL_OPEN,
parseXmlToolCall,
partialXmlOpenerStart,
} from './xml-parser.js';
// v1.13.16: extractToolCallBlocks replaces the inline opener-search loop and
// recognizes both Qwen <tool_call> and Anthropic <invoke> markup in one pass.
import { extractToolCallBlocks } from './xml-parser.js';
import { DB_FLUSH_INTERVAL_MS, type StreamPhaseState } from './types.js';
import type {
InferenceContext,
@@ -132,16 +129,24 @@ function buildAiTools(schemas: ToolJsonSchema[]): Record<string, ReturnType<type
// v1.10.5 Qwen-coder XML fallback. Some local models (notably qwen3-coder via
// llama-swap) emit tool calls as inline XML inside delta.content rather than
// the structured tool_calls field. We extract them out of the streamed text
// before flushing it to the client, mirroring the pre-AI-SDK behavior.
// before flushing it to the client.
//
// XML shape:
// Qwen shape:
// <tool_call>
// <function=NAME>
// <parameter=KEY>VALUE</parameter>
// ...
// </function>
// </tool_call>
// Multiple <tool_call> blocks may appear back-to-back; they never nest.
//
// v1.13.16: also recognize Anthropic <invoke> markup that qwen3.6-35b-a3b-mxfp4
// drifts to (training-data residue from Claude Code documentation):
// <invoke name="NAME">
// <parameter name="KEY">VALUE</parameter>
// </invoke>
// Both formats share the synthetic xml_call_${idx} ID space; the counter
// increments across whichever opener appears first. Multiple blocks may
// appear back-to-back in either format and they never nest.
export async function streamCompletion(
ctx: InferenceContext,
model: string,
@@ -209,47 +214,24 @@ export async function streamCompletion(
switch (part.type) {
case 'text-delta': {
pendingBuffer += part.text;
// Extract any complete <tool_call>...</tool_call> blocks before
// flushing visible text.
while (true) {
const startIdx = pendingBuffer.indexOf(XML_TOOL_OPEN);
if (startIdx === -1) break;
const closeIdx = pendingBuffer.indexOf(XML_TOOL_CLOSE, startIdx);
if (closeIdx === -1) break;
const blockEnd = closeIdx + XML_TOOL_CLOSE.length;
const block = pendingBuffer.slice(startIdx, blockEnd);
if (startIdx > 0) {
const before = pendingBuffer.slice(0, startIdx);
content += before;
onDelta(before);
}
const parsedCall = parseXmlToolCall(block);
if (parsedCall) {
const synthIdx = toolCalls.length;
toolCalls.push({
id: `xml_call_${synthIdx}`,
name: parsedCall.name,
args: parsedCall.args,
});
}
// Parse failures still drop the block — leaking <tool_call> XML to
// the chat would look worse than silently swallowing the bad block.
pendingBuffer = pendingBuffer.slice(blockEnd);
// v1.13.16: unified extraction. The helper finds the earliest-opening
// complete <tool_call> or <invoke> block, flushes prose between/around
// them, holds any partial opener for the next chunk, and silently
// drops blocks that fail to parse (matches pre-v1.13.16 behavior).
const extracted = extractToolCallBlocks(pendingBuffer);
if (extracted.flushed.length > 0) {
content += extracted.flushed;
onDelta(extracted.flushed);
}
// Hold back any (partial or full) unclosed opener; flush the rest.
const partialIdx = partialXmlOpenerStart(pendingBuffer);
if (partialIdx >= 0) {
if (partialIdx > 0) {
const flush = pendingBuffer.slice(0, partialIdx);
content += flush;
onDelta(flush);
}
pendingBuffer = pendingBuffer.slice(partialIdx);
} else if (pendingBuffer.length > 0) {
content += pendingBuffer;
onDelta(pendingBuffer);
pendingBuffer = '';
for (const call of extracted.calls) {
const synthIdx = toolCalls.length;
toolCalls.push({
id: `xml_call_${synthIdx}`,
name: call.name,
args: call.args,
});
}
pendingBuffer = extracted.remaining;
break;
}
case 'tool-call': {

View File

@@ -4,6 +4,16 @@ import { PathScopeError } from '../path_guard.js';
import { TOOLS_BY_NAME } from '../tools.js';
import { maybeFlagForCompaction } from './payload.js';
import { insertParts, partsFromAssistantMessage, partsFromToolMessage } from './parts.js';
// v1.13.16: richer unknown-tool error so the model can self-correct when it
// drifts to a Claude Code tool name (e.g. read_file → suggest view_file).
// Applies to all unknown tool names, not just <invoke>-derived ones — at the
// dispatch layer we no longer know which format produced the call, and the
// extra signal is harmless for Qwen-derived calls.
import { formatUnknownToolError } from './tool-suggestions.js';
// v1.13.17-cross-repo-reads: pre-prompt validation for request_read_access.
// Resolves the grant root before pausing the loop so the user is never
// prompted about paths we couldn't grant anyway (e.g. /etc/passwd).
import { resolveGrantRoot } from '../grant_resolver.js';
import type {
InferenceContext,
StreamResult,
@@ -14,14 +24,24 @@ import type {
// the reference is read at call time (inside an async function body), not
// at module top-level. Node + tsc resolve this cleanly.
import { runAssistantTurn } from './turn.js';
// v1.13.13: synthesis pipeline — replaces the immediate recursive turn when
// any of this batch's tool calls is in SYNTHESIS_TOOLS. Falls through to
// recursion on synthesis failure (timeout / model error). See module header
// in synthesisPipeline.ts for the auto-fetch + token-budget rules.
import { SYNTHESIS_TOOLS, runSynthesisPass } from '../synthesisPipeline.js';
async function executeToolCall(
projectRoot: string,
toolCall: ToolCall
toolCall: ToolCall,
extraRoots: readonly string[],
): Promise<{ output: unknown; truncated: boolean; error?: string }> {
const tool = TOOLS_BY_NAME[toolCall.name];
if (!tool) {
return { output: null, truncated: false, error: `unknown tool: ${toolCall.name}` };
return {
output: null,
truncated: false,
error: formatUnknownToolError(toolCall.name, Object.keys(TOOLS_BY_NAME)),
};
}
const parsed = tool.inputSchema.safeParse(toolCall.args);
if (!parsed.success) {
@@ -48,7 +68,7 @@ async function executeToolCall(
};
}
try {
const output = await tool.execute(parsed.data, projectRoot);
const output = await tool.execute(parsed.data, projectRoot, extraRoots);
const truncated =
typeof output === 'object' && output !== null && 'truncated' in output
? Boolean((output as { truncated: unknown }).truncated)
@@ -155,6 +175,12 @@ export async function executeToolPhase(
// batches still execute the other tools normally.
ctx.publishUser({ type: 'chat_status', chat_id: chatId, status: 'tool_running', at: new Date().toISOString() });
let pausingForUserInput = false;
// v1.13.13: capture synth-tool result text so the synthesis pipeline below
// doesn't have to re-fetch from DB. Array (not single) because a batch
// could theoretically include multiple synthesis tools — we take the first
// for the synthesis input. Race-free under Promise.all because each
// callback pushes its own captured value.
const synthEntries: Array<{ tc: ToolCall; output: unknown; error?: string }> = [];
await Promise.all(
toolCalls.map(async (tc) => {
const [toolRow] = await ctx.sql<{ id: string }[]>`
@@ -185,7 +211,74 @@ export async function executeToolPhase(
);
return;
}
const tres = await executeToolCall(projectRoot, tc);
// v1.13.17-cross-repo-reads: request_read_access pauses identically to
// ask_user_input EXCEPT for an up-front validation pass — if the path
// can't be granted under the whitelist / repo-shape rules, surface an
// immediate denial without prompting the user. Per design D1, we never
// ask the user about /etc/passwd or paths outside PROJECT_ROOT_WHITELIST.
if (tc.name === 'request_read_access') {
const tcArgs = tc.args as { path?: unknown; reason?: unknown };
const requested =
typeof tcArgs.path === 'string' ? tcArgs.path : '';
const resolution = await resolveGrantRoot(
ctx.sql,
requested,
projectRoot,
ctx.config.PROJECT_ROOT_WHITELIST,
);
if (!resolution.ok) {
// Auto-deny without pausing. The model sees the reason on its
// next turn and decides what to do.
const stored = {
tool_call_id: tc.id,
output: `denied: ${resolution.reason}`,
truncated: false,
};
await ctx.sql`
UPDATE messages
SET tool_results = ${ctx.sql.json(stored as never)}
WHERE id = ${toolMessageId}
`;
await insertParts(
ctx.sql,
partsFromToolMessage({ tool_results: stored }).map((p) => ({
...p,
message_id: toolMessageId,
})),
);
ctx.publish(sessionId, {
type: 'tool_result',
tool_message_id: toolMessageId,
chat_id: chatId,
tool_call_id: tc.id,
output: stored.output,
truncated: false,
});
return;
}
// Path is plausibly grantable — install the pending sentinel and
// pause. The grant endpoint re-derives the root at decision time
// (state may have changed in the meantime) so we don't stash it here.
pausingForUserInput = true;
const sentinel = { tool_call_id: tc.id, output: null, truncated: false };
await ctx.sql`
UPDATE messages
SET tool_results = ${ctx.sql.json(sentinel as never)}
WHERE id = ${toolMessageId}
`;
await insertParts(
ctx.sql,
partsFromToolMessage({ tool_results: sentinel }).map((p) => ({
...p,
message_id: toolMessageId,
})),
);
return;
}
const tres = await executeToolCall(projectRoot, tc, session.allowed_read_paths);
if (SYNTHESIS_TOOLS.has(tc.name)) {
synthEntries.push({ tc, output: tres.output, ...(tres.error ? { error: tres.error } : {}) });
}
const stored = {
tool_call_id: tc.id,
output: tres.output,
@@ -233,6 +326,41 @@ export async function executeToolPhase(
return;
}
// v1.13.13: synthesis-pipeline branch. When any of this batch's tool calls
// is a codecontext overview/analysis tool that produced a non-error result,
// run a forced second-inference synthesis pass with auto-fetched files +
// project docs instead of the normal recursive runAssistantTurn. Falls
// through to the recursive call on synthesis failure (timeout, model
// error). User-abort re-throws so the outer handler runs.
const synthEntry = synthEntries.find((e) => !e.error && e.output != null);
if (synthEntry) {
// codecontext wrappers return { result: string, truncated: boolean, ... }.
// Defensive: stringify the output if it isn't the expected shape so the
// synthesis still has something to chew on rather than crashing on
// missing `.result`.
const out = synthEntry.output as { result?: unknown; truncated?: boolean; outputPath?: string };
const toolResultText =
typeof out?.result === 'string'
? out.result
: JSON.stringify(synthEntry.output);
// v1.13.15-b: forward the wrapper's truncation flag + opaque tmpfs id so
// synthesisPipeline can re-read the full content for reference extraction.
const ran = await runSynthesisPass({
ctx,
args,
session,
projectRoot,
toolName: synthEntry.tc.name,
toolResultText,
...(typeof out?.truncated === 'boolean' ? { truncated: out.truncated } : {}),
...(typeof out?.outputPath === 'string' ? { outputPath: out.outputPath } : {}),
});
if (ran) return;
// ran === false → synthesis failed (timeout / model error) → fall through
// to the standard recursive turn below. The synth message (if created)
// was already marked status='failed' inside runSynthesisPass.
}
const [nextAssistant] = await ctx.sql<{ id: string }[]>`
INSERT INTO messages (session_id, chat_id, role, content, status, created_at)
VALUES (${sessionId}, ${chatId}, 'assistant', '', 'streaming', clock_timestamp())

View File

@@ -0,0 +1,63 @@
// v1.13.16: Levenshtein + suggestion + formatter for the unknown-tool error
// returned to the model when an XML-extracted tool call references a name
// that isn't in TOOLS_BY_NAME. The drift incident this targets: qwen3.6
// emitting <invoke name="read_file"> from its Claude Code training residue
// when BooCode's actual file-read tool is view_file. Hand-rolled distance
// function — no new dep.
export function levenshtein(a: string, b: string): number {
if (a.length === 0) return b.length;
if (b.length === 0) return a.length;
const dp: number[][] = Array.from(
{ length: a.length + 1 },
() => new Array<number>(b.length + 1).fill(0),
);
for (let i = 0; i <= a.length; i++) dp[i]![0] = i;
for (let j = 0; j <= b.length; j++) dp[0]![j] = j;
for (let i = 1; i <= a.length; i++) {
for (let j = 1; j <= b.length; j++) {
const cost = a[i - 1] === b[j - 1] ? 0 : 1;
dp[i]![j] = Math.min(
dp[i - 1]![j]! + 1,
dp[i]![j - 1]! + 1,
dp[i - 1]![j - 1]! + cost,
);
}
}
return dp[a.length]![b.length]!;
}
// Threshold per the v1.13.16 dispatch: distance <= 3 OR substring match
// (either direction). Ties broken by smallest distance, then alphabetical.
export function suggestToolName(
name: string,
available: readonly string[],
): string | null {
const lower = name.toLowerCase();
let best: { name: string; dist: number } | null = null;
for (const tool of available) {
const tlower = tool.toLowerCase();
const dist = levenshtein(lower, tlower);
const isSubstr = tlower.includes(lower) || lower.includes(tlower);
if (dist > 3 && !isSubstr) continue;
if (
best === null ||
dist < best.dist ||
(dist === best.dist && tool.localeCompare(best.name) < 0)
) {
best = { name: tool, dist };
}
}
return best?.name ?? null;
}
export function formatUnknownToolError(
name: string,
available: readonly string[],
): string {
const sorted = [...available].sort();
const suggestion = suggestToolName(name, sorted);
const list = sorted.join(', ');
const tail = suggestion ? ` Did you mean: ${suggestion}?` : '';
return `Tool '${name}' not found. Available tools: [${list}].${tail}`;
}

View File

@@ -205,7 +205,7 @@ export async function runAssistantTurn(
return;
}
const messages = await buildMessagesPayload(session, project, history, agent);
const messages = await buildMessagesPayload(session, project, history, agent, ctx.log);
// v1.11.8: resolve per-chat web-tools opt-in. Tri-state on the wire:
// - session.web_search_enabled = null → inherit project default

View File

@@ -1,23 +1,42 @@
// v1.10.5: XML-tag tool-call fallback. Some models emit
// <tool_call><function=foo><parameter=key>value</parameter></function></tool_call>
// in plain content instead of using the OpenAI tool_calls JSON channel.
// The streaming loop in inference.ts extracts these blocks via these helpers.
// The streaming loop in stream-phase.ts extracts these blocks via these helpers.
//
// v1.13.16: also recognize Anthropic <invoke name="..."><parameter name="...">
// markup. qwen3.6-35b-a3b-mxfp4 drifts to this format when prompted as an
// "Architect"-style agent because Claude Code documentation in its
// pre-training data uses this shape. Both formats route through the same
// synthetic ToolCall path with shared xml_call_${idx} IDs; downstream
// dispatch handles unknown tool names with a richer error (see
// tool-suggestions.ts + tool-phase.ts).
export const XML_TOOL_OPEN = '<tool_call>';
export const XML_TOOL_CLOSE = '</tool_call>';
export function parseXmlToolCall(
block: string,
): { name: string; args: Record<string, unknown> } | null {
const nameMatch = block.match(/<function=([^>]+)>/);
// v1.13.16: Anthropic <invoke> opener is matched by prefix (not the full
// `<invoke ...>` tag) because attributes follow. Closer is the literal tag.
export const INVOKE_TOOL_OPEN = '<invoke';
export const INVOKE_TOOL_CLOSE = '</invoke>';
export interface ParsedCall {
name: string;
args: Record<string, unknown>;
}
// v1.10.5: Qwen-flavor parser. Tightened in v1.13.16 to tolerate whitespace
// around `=` (e.g. `<function = view_file>`). Name capture is non-whitespace,
// non-`>` so a stray space doesn't get absorbed into the function name.
const QWEN_FUNCTION_RE = /<function\s*=\s*([^>\s]+)\s*>/;
const QWEN_PARAM_RE = /<parameter\s*=\s*([^>\s]+)\s*>([\s\S]*?)<\/parameter>/g;
export function parseXmlToolCall(block: string): ParsedCall | null {
const nameMatch = block.match(QWEN_FUNCTION_RE);
if (!nameMatch || !nameMatch[1]) return null;
const name = nameMatch[1].trim();
if (!name) return null;
const args: Record<string, unknown> = {};
// Non-greedy body so each <parameter=…>…</parameter> pair is matched
// independently even when multiple appear in the same block.
const paramRe = /<parameter=([^>]+)>([\s\S]*?)<\/parameter>/g;
for (const m of block.matchAll(paramRe)) {
for (const m of block.matchAll(QWEN_PARAM_RE)) {
const key = (m[1] ?? '').trim();
if (!key) continue;
const raw = (m[2] ?? '').trim();
@@ -30,24 +49,121 @@ export function parseXmlToolCall(
return { name, args };
}
// v1.13.16: Anthropic-flavor parser. Same JSON-parse-with-string-fallback
// shape as parseXmlToolCall so the dispatch layer doesn't need to care which
// flavor produced the call.
const INVOKE_NAME_RE =
/<invoke\s+name\s*=\s*("([^"]*)"|'([^']*)')\s*>/;
const INVOKE_PARAM_RE =
/<parameter\s+name\s*=\s*("([^"]*)"|'([^']*)')\s*>([\s\S]*?)<\/parameter>/g;
export function parseInvokeToolCall(block: string): ParsedCall | null {
const nameMatch = block.match(INVOKE_NAME_RE);
if (!nameMatch) return null;
const name = (nameMatch[2] ?? nameMatch[3] ?? '').trim();
if (!name) return null;
const args: Record<string, unknown> = {};
for (const m of block.matchAll(INVOKE_PARAM_RE)) {
const key = ((m[2] ?? m[3] ?? '') as string).trim();
if (!key) continue;
const raw = (m[4] ?? '').trim();
try {
args[key] = JSON.parse(raw);
} catch {
args[key] = raw;
}
}
return { name, args };
}
// Locate the first character that begins (or completely contains) an
// unfinished <tool_call> opener in `s`. Returns -1 when `s` can be flushed
// to the client in full without risking a partial tag leak.
// Case 1: a full `<tool_call>` opener with no matching closer — caller
// must keep everything from that index forward until the next
// chunk arrives with the closer.
// Case 2: `s` ends with a strict prefix of `<tool_call>` (e.g. `<tool_c`).
// Caller must keep just that suffix in the buffer.
// unfinished opener (either flavor) in `s`. Returns -1 when `s` can be
// flushed to the client in full without risking a partial tag leak.
// Case 1: a full opener (`<tool_call>` or `<invoke`) with no matching
// closer — caller must keep everything from that index forward
// until the next chunk arrives with the closer.
// Case 2: `s` ends with a strict prefix of either opener (e.g. `<tool_c`
// or `<invo`). Caller must keep just that suffix in the buffer.
// Note: case 1 assumes the calling loop already extracted every complete
// <tool_call>…</tool_call> pair before reaching this check.
// block before reaching this check.
const ALL_OPENERS = [XML_TOOL_OPEN, INVOKE_TOOL_OPEN] as const;
export function partialXmlOpenerStart(s: string): number {
const fullOpener = s.indexOf(XML_TOOL_OPEN);
if (fullOpener !== -1) return fullOpener;
let earliest = -1;
for (const op of ALL_OPENERS) {
const idx = s.indexOf(op);
if (idx === -1) continue;
if (earliest === -1 || idx < earliest) earliest = idx;
}
if (earliest !== -1) return earliest;
const lastLt = s.lastIndexOf('<');
if (lastLt === -1) return -1;
const suffix = s.slice(lastLt);
if (XML_TOOL_OPEN.startsWith(suffix) && suffix.length < XML_TOOL_OPEN.length) {
return lastLt;
for (const op of ALL_OPENERS) {
if (op.startsWith(suffix) && suffix.length < op.length) return lastLt;
}
return -1;
}
// v1.13.16: unified extraction. Replaces the inline loop that used to live
// in stream-phase.ts. Pure function — returns the visible text to flush,
// the parsed tool-call payloads in source order, and the buffer remainder
// to retain for the next streaming chunk. Parse failures are silently
// dropped (matches the pre-v1.13.16 behavior — leaking partial XML to the
// chat looks worse than swallowing a bad block).
export interface ToolCallExtraction {
flushed: string;
calls: ParsedCall[];
remaining: string;
}
interface OpenerSpec {
open: string;
close: string;
parse: (block: string) => ParsedCall | null;
}
const OPENER_SPECS: ReadonlyArray<OpenerSpec> = [
{ open: XML_TOOL_OPEN, close: XML_TOOL_CLOSE, parse: parseXmlToolCall },
{ open: INVOKE_TOOL_OPEN, close: INVOKE_TOOL_CLOSE, parse: parseInvokeToolCall },
];
export function extractToolCallBlocks(buffer: string): ToolCallExtraction {
let flushed = '';
const calls: ParsedCall[] = [];
let pos = 0;
while (pos < buffer.length) {
let next: { spec: OpenerSpec; openIdx: number; closeIdx: number } | null = null;
for (const spec of OPENER_SPECS) {
const openIdx = buffer.indexOf(spec.open, pos);
if (openIdx === -1) continue;
const closeIdx = buffer.indexOf(spec.close, openIdx);
if (closeIdx === -1) continue;
if (next === null || openIdx < next.openIdx) {
next = { spec, openIdx, closeIdx };
}
}
if (next === null) break;
if (next.openIdx > pos) {
flushed += buffer.slice(pos, next.openIdx);
}
const blockEnd = next.closeIdx + next.spec.close.length;
const block = buffer.slice(next.openIdx, blockEnd);
const parsed = next.spec.parse(block);
if (parsed) calls.push(parsed);
pos = blockEnd;
}
const tail = buffer.slice(pos);
const partialIdx = partialXmlOpenerStart(tail);
if (partialIdx === -1) {
flushed += tail;
return { flushed, calls, remaining: '' };
}
if (partialIdx > 0) {
flushed += tail.slice(0, partialIdx);
}
return { flushed, calls, remaining: tail.slice(partialIdx) };
}

View File

@@ -16,9 +16,22 @@ export async function resolveProjectRoot(projectPath: string): Promise<string> {
}
}
function isUnder(real: string, root: string): boolean {
return real === root || real.startsWith(root + sep);
}
// v1.13.17-cross-repo-reads: pathGuard now accepts an optional extraRoots
// list (typically session.allowed_read_paths). The primary projectRoot is
// tried first; if the resolved path doesn't sit under it, each extraRoot is
// tried in turn. Throws PathScopeError if no root accepts. The error message
// includes a hint pointing the model at the request_read_access tool so it
// can self-correct on the next turn — extraRoots IS the persistence
// mechanism for those grants, so we only suggest it when there's a missing
// grant to ask for (i.e. the path isn't already under any allowed root).
export async function pathGuard(
projectRoot: string,
requested: string
requested: string,
extraRoots: readonly string[] = [],
): Promise<string> {
if (typeof requested !== 'string' || requested.length === 0) {
throw new PathScopeError('path is required');
@@ -30,10 +43,13 @@ export async function pathGuard(
} catch {
throw new PathScopeError(`path does not exist: ${requested}`);
}
if (real !== projectRoot && !real.startsWith(projectRoot + sep)) {
throw new PathScopeError(
`path escapes project root: ${requested} -> ${real}`
);
if (isUnder(real, projectRoot)) return real;
for (const extra of extraRoots) {
if (extra.length === 0) continue;
if (isUnder(real, extra)) return real;
}
return real;
throw new PathScopeError(
`path escapes project root: ${requested} -> ${real}. ` +
`Use request_read_access(path, reason) to ask the user for permission.`,
);
}

View File

@@ -0,0 +1,82 @@
// v1.13.17-cross-repo-reads: tool the model uses to request read access to
// a path outside its session's primary project root. When the model emits
// view_file("/opt/forks/foo/go.mod") under a session scoped to /opt/boocode,
// pathGuard's error message hints at this tool. The model then emits
// request_read_access(path="/opt/forks/foo/go.mod",
// reason="investigating foo to write the design doc")
// The tool's execute does cheap up-front validation: if the requested path
// can't possibly be granted under the current whitelist + repo-shape rules,
// it returns a denial immediately without prompting the user. Otherwise, the
// tool-phase pause branch (parallel of ask_user_input) stores a pending
// sentinel and waits for the user's allow/deny via the grant_read_access
// endpoint.
//
// The execute body never directly mutates state; the grant endpoint owns
// the persistence path. This keeps the tool-side logic side-effect-free
// (it's just a request) and matches ask_user_input's "server-side no-op
// fallback, pause happens in tool-phase" shape.
import { z } from 'zod';
import type { ToolDef } from './tools.js';
const RequestReadAccessInput = z.object({
path: z.string().min(1),
reason: z.string().min(1).max(500),
});
type RequestReadAccessInputT = z.infer<typeof RequestReadAccessInput>;
export const requestReadAccess: ToolDef<RequestReadAccessInputT> = {
name: 'request_read_access',
description:
"Ask the user for read-only access to a path outside the current " +
"session's project scope. Use when a previous read tool (view_file, " +
'list_dir, grep, find_files) was refused with a path-escapes-project ' +
'error and the path is plausibly under another known repository (e.g. ' +
'/opt/forks/foo). Provide a short reason describing why you need the ' +
"access. Pauses the conversation until the user picks Allow or Deny; " +
'the next assistant turn sees the result. On Allow, the tool result ' +
'is "granted: <root>" — subsequent reads under that root succeed for ' +
'the rest of the session. On Deny, the tool result is "denied". Do ' +
'not call this for paths that are already inside the project root.',
inputSchema: RequestReadAccessInput,
jsonSchema: {
type: 'function',
function: {
name: 'request_read_access',
description:
"Ask the user for read-only access to a path outside the session's " +
'project scope. Pauses the conversation until the user picks Allow ' +
'or Deny. Subsequent reads under the granted root succeed for the ' +
'rest of the session.',
parameters: {
type: 'object',
properties: {
path: {
type: 'string',
description:
'Absolute path the model wants to read. Must be under the ' +
"server's PROJECT_ROOT_WHITELIST (default /opt) and outside " +
"the session's primary project root.",
},
reason: {
type: 'string',
description:
'Short rationale (<=500 chars) shown to the user explaining ' +
'why the access is needed. The user uses this to decide.',
},
},
required: ['path', 'reason'],
additionalProperties: false,
},
},
},
// Server-side no-op. The "execution" of request_read_access is the
// pause-and-resume cycle managed by tool-phase.ts + the grant endpoint.
// The inference loop catches this tool name BEFORE executeToolCall fires
// and inserts a pending sentinel instead — this fallback only runs if
// something bypasses that branch, in which case we surface the pending
// shape so downstream code can still detect it. Mirrors ask_user_input.
async execute(input) {
return { _pending: true, path: input.path, reason: input.reason };
},
};

View File

@@ -0,0 +1,493 @@
// v1.13.13: forced second-inference synthesis pass for codecontext
// overview/analysis tools. Triggered from tool-phase.ts after a codecontext
// tool call lands and BEFORE the normal recursive runAssistantTurn fires.
//
// Inputs to the synthesis stream:
// 1. The codecontext tool's result text.
// 2. Top-N source files referenced in that text, fetched via view_file.
// 3. Project documentation auto-fetched from the repo root.
// 4. The original user message that triggered the turn.
//
// Output: a NEW assistant message whose sole part is kind='synthesis'.
// Streams to the client as deltas exactly like a normal assistant turn.
//
// Failure modes (all fall through to recursive runAssistantTurn):
// - SYNTHESIS_TOOLS membership check fails -> return false immediately.
// - File-fetch / doc-fetch errors -> silent skip, continue with what we have.
// - Stream error / timeout -> mark synth message status='failed', return false.
// - User-abort -> mark cancelled and re-throw so the outer abort handler runs.
import { promises as fs } from 'node:fs';
import { join } from 'node:path';
import { TOOLS_BY_NAME } from './tools.js';
import { streamCompletion } from './inference/stream-phase.js';
import { SYNTHESIS_SYSTEM_PROMPT } from './synthesisPrompt.js';
import { insertParts } from './inference/parts.js';
import * as modelContext from './model-context.js';
import { readTruncation } from './truncate.js';
import type { Session } from '../types/api.js';
import type { OpenAiMessage } from './inference/payload.js';
import type { InferenceContext, TurnArgs } from './inference/turn.js';
export const SYNTHESIS_TOOLS: ReadonlySet<string> = new Set([
'get_codebase_overview',
'get_framework_analysis',
'get_semantic_neighborhoods',
]);
const TOP_N_FILES = 5;
const FILE_LINE_CAP = 200;
const DOC_LINE_CAP = 500;
// Token budget for the auto-fetched content (files + docs combined). Estimated
// via chars/4 — a rough but stable proxy that doesn't require a tokenizer dep.
const TOKEN_BUDGET = 32_000;
const CHARS_PER_TOKEN = 4;
// 90s per synthesis call. Long enough for a thoughtful overview against a
// large auto-fetched payload; short enough that a hung upstream falls through
// to the normal recursive turn within a typical user attention window.
const SYNTH_TIMEOUT_MS = 90_000;
// File-extension regex for referenced-file extraction. Limited to source-
// language extensions so we don't pull in lockfiles, images, etc.
const FILE_PATH_RE =
/(?:^|[`'"<\s\(\[])([A-Za-z0-9_./@-]+\.(?:ts|tsx|js|jsx|py|go|rs|java|kt|c|cpp|h|hpp|md|json|yaml|yml|sql|sh|html|css))(?=[`'"<\)\]\s,;:]|$)/gm;
export interface SynthesisParams {
ctx: InferenceContext;
args: TurnArgs;
session: Session;
projectRoot: string;
toolName: string;
toolResultText: string;
// v1.13.15-b: when codecontext's wrapper hit its 32k inline-truncation
// limit, we expand the full content via readTruncation for reference-file
// extraction only. toolResultText (the truncated head) still ships to the
// synth model — preserves the 32k payload-budget contract.
truncated?: boolean;
// opaque id (tr_<…>), not a filesystem path — see truncate.ts naming note
outputPath?: string;
}
interface FetchedFile {
path: string;
content: string;
}
interface DocsCollection {
boochat?: string;
agents?: string;
context?: string;
roadmap?: string;
}
export async function runSynthesisPass(p: SynthesisParams): Promise<boolean> {
if (!SYNTHESIS_TOOLS.has(p.toolName)) return false;
let synthMessageId: string | null = null;
let accumulated = '';
let timedOut = false;
const synthCtrl = new AbortController();
const timer = setTimeout(() => {
timedOut = true;
synthCtrl.abort();
}, SYNTH_TIMEOUT_MS);
try {
const userMessage = await fetchOriginalUserMessage(p.ctx, p.args.chatId);
if (!userMessage) {
p.ctx.log.warn({ chatId: p.args.chatId }, 'synthesis: no user message found; falling through');
return false;
}
// v1.13.15-b: when the tool result was inline-truncated by the wrapper
// (32k cap, see codecontext_client.ts:114), expand the full content from
// tmpfs for reference-file extraction. The synth payload still ships the
// truncated head (see buildPayload call below) so the token-budget
// contract holds. Graceful degradation: if readTruncation returns null
// (missing id, ENOENT) or throws, fall back to the truncated head.
let extractionSource = p.toolResultText;
if (p.truncated && p.outputPath) {
try {
const full = await readTruncation(p.outputPath);
if (full !== null) {
extractionSource = full;
p.ctx.log.info(
{
chatId: p.args.chatId,
toolName: p.toolName,
originalChars: p.toolResultText.length,
fullChars: full.length,
},
'synthesis: expanded truncated tool output',
);
}
} catch (err) {
p.ctx.log.warn(
{ chatId: p.args.chatId, toolName: p.toolName, err: String(err) },
'synthesis: readTruncation failed, using truncated output',
);
}
}
const refFiles = extractReferencedFiles(extractionSource);
const files = await fetchTopFiles(refFiles, p.projectRoot);
const docs = await fetchProjectDocs(p.projectRoot);
const { files: budgetedFiles, docs: budgetedDocs } = applyTokenBudget(files, docs);
const synthMessages = buildPayload(
p.toolName,
// Truncated head only — full content was used for reference extraction above
p.toolResultText,
budgetedFiles,
budgetedDocs,
userMessage,
);
// Insert + announce the synthesis assistant message. From here on, any
// exception must clean up via the catch block so the row doesn't linger
// in 'streaming' status (the 5min stale-streaming sweeper catches it
// eventually, but explicit cleanup is better).
const [synthRow] = await p.ctx.sql<
{ id: string; started_at: string }[]
>`
INSERT INTO messages (session_id, chat_id, role, content, status, started_at, created_at)
VALUES (${p.args.sessionId}, ${p.args.chatId}, 'assistant', '', 'streaming', clock_timestamp(), clock_timestamp())
RETURNING id, started_at
`;
synthMessageId = synthRow!.id;
const startedAt = synthRow!.started_at;
p.ctx.publish(p.args.sessionId, {
type: 'message_started',
message_id: synthMessageId,
chat_id: p.args.chatId,
role: 'assistant',
});
// Combine the user-abort signal with our synthesis-specific timeout so
// either fires correctly. The `timedOut` flag in scope tells us which one
// tripped after streamCompletion throws.
const combinedSignal: AbortSignal | undefined = p.args.signal
? AbortSignal.any([p.args.signal, synthCtrl.signal])
: synthCtrl.signal;
const onDelta = (delta: string): void => {
accumulated += delta;
p.ctx.publish(p.args.sessionId, {
type: 'delta',
message_id: synthMessageId!,
chat_id: p.args.chatId,
content: delta,
});
};
const streamResult = await streamCompletion(
p.ctx,
p.session.model,
synthMessages,
{ tools: null },
onDelta,
undefined,
combinedSignal,
);
const mctx = await modelContext.getModelContext(p.session.model);
const nCtx = mctx?.n_ctx ?? null;
const [updated] = await p.ctx.sql<
{
tokens_used: number | null;
ctx_used: number | null;
ctx_max: number | null;
finished_at: string | null;
}[]
>`
UPDATE messages
SET content = ${streamResult.content},
status = 'complete',
tokens_used = ${streamResult.completionTokens},
ctx_used = ${streamResult.promptTokens},
ctx_max = ${nCtx},
finished_at = clock_timestamp()
WHERE id = ${synthMessageId}
RETURNING tokens_used, ctx_used, ctx_max, finished_at
`;
await insertParts(p.ctx.sql, [
{
message_id: synthMessageId,
sequence: 0,
kind: 'synthesis',
payload: { text: streamResult.content },
},
]);
p.ctx.publish(p.args.sessionId, {
type: 'message_complete',
message_id: synthMessageId,
chat_id: p.args.chatId,
tokens_used: updated?.tokens_used ?? null,
ctx_used: updated?.ctx_used ?? null,
ctx_max: updated?.ctx_max ?? null,
started_at: startedAt,
finished_at: updated?.finished_at ?? null,
model: p.session.model,
});
p.ctx.publishUser({
type: 'chat_status',
chat_id: p.args.chatId,
status: 'idle',
at: new Date().toISOString(),
});
p.ctx.log.info(
{
chatId: p.args.chatId,
synthMessageId,
toolName: p.toolName,
chars: streamResult.content.length,
files: budgetedFiles.length,
},
'synthesis pass complete',
);
return true;
} catch (err) {
await markSynthFailed(p, synthMessageId, accumulated).catch((cleanupErr) => {
p.ctx.log.warn({ cleanupErr: String(cleanupErr) }, 'synthesis cleanup UPDATE failed');
});
if (err instanceof Error && err.name === 'AbortError') {
if (timedOut) {
p.ctx.log.warn(
{ toolName: p.toolName, chatId: p.args.chatId },
'synthesis pass timed out; falling through to recursive turn',
);
return false;
}
// User-initiated abort: propagate so the outer error handler marks the
// parent turn cancelled. The synth message is already marked failed by
// markSynthFailed above.
throw err;
}
p.ctx.log.warn(
{ err: String(err), toolName: p.toolName, chatId: p.args.chatId },
'synthesis pass failed; falling through to recursive turn',
);
return false;
} finally {
clearTimeout(timer);
}
}
async function markSynthFailed(
p: SynthesisParams,
synthMessageId: string | null,
accumulated: string,
): Promise<void> {
if (synthMessageId === null) return;
await p.ctx.sql`
UPDATE messages
SET content = ${accumulated},
status = 'failed',
finished_at = clock_timestamp()
WHERE id = ${synthMessageId}
`;
// Republish so the frontend's live state flips from 'streaming' to
// terminal. message_complete carries no error reason — the row's status
// column is the truth. The 5-state chat_status dot has 'error' but we
// don't fire that here because the broader inference is about to retry
// via recursion; flipping the user-channel status to 'error' would race
// the recursive turn's 'streaming' announcement.
p.ctx.publish(p.args.sessionId, {
type: 'message_complete',
message_id: synthMessageId,
chat_id: p.args.chatId,
model: p.session.model,
});
}
async function fetchOriginalUserMessage(
ctx: InferenceContext,
chatId: string,
): Promise<string | null> {
const rows = await ctx.sql<{ content: string }[]>`
SELECT content FROM messages
WHERE chat_id = ${chatId} AND role = 'user'
ORDER BY created_at DESC
LIMIT 1
`;
return rows[0]?.content ?? null;
}
function extractReferencedFiles(text: string): string[] {
const seen = new Set<string>();
const order: string[] = [];
let m: RegExpExecArray | null;
while ((m = FILE_PATH_RE.exec(text)) !== null) {
const candidate = m[1]!;
if (seen.has(candidate)) continue;
if (
candidate.includes('node_modules') ||
candidate.includes('/dist/') ||
candidate.includes('/test/') ||
candidate.includes('/tests/') ||
/\.(test|spec)\.[a-z]+$/.test(candidate)
) {
continue;
}
seen.add(candidate);
order.push(candidate);
}
return order;
}
async function fetchTopFiles(refs: string[], projectRoot: string): Promise<FetchedFile[]> {
const tool = TOOLS_BY_NAME['view_file'];
if (!tool) return [];
const out: FetchedFile[] = [];
for (const p of refs.slice(0, TOP_N_FILES)) {
const absPath = p.startsWith('/') ? p : join(projectRoot, p);
try {
const r = await tool.execute({ path: absPath, end_line: FILE_LINE_CAP }, projectRoot);
const content = (r as { content?: string }).content ?? '';
if (content) out.push({ path: p, content });
} catch {
// path-scope blocked, secret-filtered, file too large, or missing —
// skip silently. The remaining files (or none) still produce a
// meaningful synthesis input.
}
}
return out;
}
async function fetchProjectDocs(projectRoot: string): Promise<DocsCollection> {
const tool = TOOLS_BY_NAME['view_file'];
if (!tool) return {};
const docs: DocsCollection = {};
for (const [filename, key] of [
['BOOCHAT.md', 'boochat'],
['AGENTS.md', 'agents'],
['CONTEXT.md', 'context'],
] as const) {
try {
const r = await tool.execute(
{ path: join(projectRoot, filename), end_line: DOC_LINE_CAP },
projectRoot,
);
const content = (r as { content?: string }).content;
if (content) docs[key] = content;
} catch {
// missing doc — skip
}
}
// Case-insensitive *roadmap*.md glob. Picks the first match (alphabetical
// by readdir() order); typical projects have at most one roadmap doc.
try {
const entries = await fs.readdir(projectRoot);
const roadmap = entries.find(
(e) => /roadmap/i.test(e) && e.toLowerCase().endsWith('.md'),
);
if (roadmap) {
const r = await tool.execute(
{ path: join(projectRoot, roadmap), end_line: DOC_LINE_CAP },
projectRoot,
);
const content = (r as { content?: string }).content;
if (content) docs.roadmap = content;
}
} catch {
// unreadable project root — skip
}
return docs;
}
function estTokens(s: string | undefined): number {
return s ? Math.ceil(s.length / CHARS_PER_TOKEN) : 0;
}
function applyTokenBudget(
files: FetchedFile[],
docs: DocsCollection,
): { files: FetchedFile[]; docs: DocsCollection } {
let total = 0;
for (const f of files) total += estTokens(f.content);
total += estTokens(docs.boochat) + estTokens(docs.agents) + estTokens(docs.context) + estTokens(docs.roadmap);
if (total <= TOKEN_BUDGET) return { files, docs };
// Drop priority (lowest priority dropped first):
// 1. top-2..N files (keep top-1)
// 2. top-1 file
// 3. roadmap (+ CONTEXT.md grouped here — dispatch listed roadmap above
// AGENTS.md, CONTEXT.md was not in the priority list)
// 4. AGENTS.md
// 5. BOOCHAT.md (never dropped — truncate to budget if alone exceeds)
let outFiles = files.slice();
const outDocs: DocsCollection = { ...docs };
while (total > TOKEN_BUDGET && outFiles.length > 1) {
const last = outFiles.pop()!;
total -= estTokens(last.content);
}
if (total <= TOKEN_BUDGET) return { files: outFiles, docs: outDocs };
if (outFiles[0]) {
total -= estTokens(outFiles[0].content);
outFiles = [];
}
if (total <= TOKEN_BUDGET) return { files: outFiles, docs: outDocs };
if (outDocs.roadmap) {
total -= estTokens(outDocs.roadmap);
delete outDocs.roadmap;
}
if (outDocs.context) {
total -= estTokens(outDocs.context);
delete outDocs.context;
}
if (total <= TOKEN_BUDGET) return { files: outFiles, docs: outDocs };
if (outDocs.agents) {
total -= estTokens(outDocs.agents);
delete outDocs.agents;
}
if (total <= TOKEN_BUDGET) return { files: outFiles, docs: outDocs };
if (outDocs.boochat) {
const maxChars = TOKEN_BUDGET * CHARS_PER_TOKEN;
if (outDocs.boochat.length > maxChars) {
outDocs.boochat = outDocs.boochat.slice(0, maxChars);
}
}
return { files: outFiles, docs: outDocs };
}
function buildPayload(
toolName: string,
toolResultText: string,
files: FetchedFile[],
docs: DocsCollection,
userMessage: string,
): OpenAiMessage[] {
const sections: string[] = [];
sections.push(`## Codecontext tool output (${toolName})\n\n${toolResultText}`);
if (files.length > 0) {
sections.push(`---\n\n## Auto-fetched source files`);
for (const f of files) {
sections.push(`### ${f.path}\n\n\`\`\`\n${f.content}\n\`\`\``);
}
}
const docEntries: Array<[string, string | undefined]> = [
['BOOCHAT.md', docs.boochat],
['AGENTS.md', docs.agents],
['CONTEXT.md', docs.context],
['roadmap', docs.roadmap],
];
const presentDocs = docEntries.filter(([, v]) => Boolean(v));
if (presentDocs.length > 0) {
sections.push(`---\n\n## Project documentation`);
for (const [name, v] of presentDocs) {
sections.push(`### ${name}\n\n${v!}`);
}
}
sections.push(`---\n\n## Original user question\n\n${userMessage}`);
return [
{ role: 'system', content: SYNTHESIS_SYSTEM_PROMPT },
{ role: 'user', content: sections.join('\n\n') },
];
}

View File

@@ -0,0 +1,20 @@
// v1.13.13: synthesis pipeline system prompt. Verbatim from the v1.13.13
// dispatch — do not paraphrase. The synthesis pass loads this as its sole
// system message, followed by a user message that concatenates the
// codecontext tool result, auto-fetched top files, auto-fetched project
// docs, and the original user message.
export const SYNTHESIS_SYSTEM_PROMPT = `You are synthesizing structural data into an accurate, detailed answer about the user's codebase.
Inputs you have been given:
1. The output of a codecontext analysis tool (raw structural data — file counts, symbols, dependencies, frameworks).
2. The contents of the top files referenced in that output.
3. Any project documentation found in the repo root (BOOCHAT.md, AGENTS.md, roadmap docs, CONTEXT.md).
Rules:
- Cite specific files and line numbers when making claims about code.
- If project docs contradict the code, docs win for questions about state, version, status, or roadmap. Code wins for questions about runtime behavior or implementation.
- If the codecontext output looks sparse (low symbol count for a TypeScript project, missing dependency edges, empty framework list), explicitly say so — codecontext falls back to the JavaScript grammar for TypeScript and loses interfaces, generics, decorators, and type aliases.
- Do not invent symbols, files, or relationships that are not present in the inputs.
- Do not respond with a generic "this looks like a [framework] project" summary. The user has the framework analysis already. Add specifics: what is actually in this codebase, what is shipped, what is planned, what is load-bearing.
- Length: match the depth the user asked for. Overview questions get structured multi-section answers. Specific questions get focused answers.
`;

View File

@@ -8,9 +8,19 @@
// + container guidance (this layer, NEW in v1.12)
// + agent.system_prompt (resolved from data/AGENTS.md by getAgentById)
// + session.system_prompt OR project.default_system_prompt
//
// v1.13.8: byte-stability instrumentation. buildSystemPromptWithFingerprint
// returns the assembled string plus a SHA-256 fingerprint and a per-session
// drift signal. buildSystemPrompt stays a string→string shim for backward
// compat (tests use it). No cache added — recon proved input-layer mtime
// caches (this file + agents.ts) already deliver byte-stable inputs in
// steady state. v1.13.8 measures that claim against production traffic
// before any cache infrastructure earns its place.
import { createHash } from 'node:crypto';
import { readFile, stat } from 'node:fs/promises';
import type { Agent, Project, Session } from '../types/api.js';
import { getAgentsMtimes } from './agents.js';
const BASE_SYSTEM_PROMPT = (projectPath: string) =>
`You are BooCode Chat, a code investigation assistant. The user is working on a project located at ${projectPath}. Use the file-read tools (view_file, list_dir, grep, find_files) to investigate code when needed. Be concise. Cite file paths and line numbers when discussing code. Do not hallucinate file contents — read the file first. Tool results may be truncated; if so, narrow your query rather than guessing.`;
@@ -60,11 +70,94 @@ export function _resetContainerGuidanceCacheForTests(): void {
cachedGuidance = null;
}
export async function buildSystemPrompt(
// v1.13.8: expose the mtime currently held in the BOOCHAT cache so the
// fingerprint log can stamp it without re-statting (no I/O race against
// getContainerGuidance, which is the canonical mtime source).
function getCachedGuidanceMtime(): number | null {
if (!cachedGuidance) return null;
// mtime=0 is the sentinel for "file is missing" (set in the catch above).
// Surface it as null so the log/diff doesn't treat absence as a number.
return cachedGuidance.mtime > 0 ? cachedGuidance.mtime : null;
}
// v1.13.8: fingerprint emitted per turn, observer state keyed by session.
// Field set is intentionally small — we want the diff between two
// fingerprints to point at the exact input that drifted, not bury the
// signal in noise.
export interface PrefixFingerprint {
msg: 'prefix-fingerprint';
project_id: string;
agent_id: string | null;
agent_name: string | null;
session_id: string;
prefix_hash: string;
prefix_length: number;
mtime_boochat: number | null;
mtime_agents_global: number | null;
mtime_agents_project: number | null;
has_agent_system_prompt: boolean;
has_session_override: boolean;
has_project_override: boolean;
}
export interface PrefixDrift {
msg: 'prefix-drift';
session_id: string;
prev_hash: string;
new_hash: string;
prev_length: number;
new_length: number;
// Names of fields in PrefixFingerprint (excluding the hash + length pair
// and the session_id key itself) whose values differ between the previous
// observation and this one. The bug case is `changed_inputs: []` — hash
// differs but no tracked input moved, which means assembly is
// nondeterministic somewhere.
changed_inputs: string[];
}
// Fields tracked per-session for the drift diff. Stored alongside the hash
// so we can recompute changed_inputs without re-running buildSystemPrompt.
interface ObservedInputs {
agent_id: string | null;
mtime_boochat: number | null;
mtime_agents_global: number | null;
mtime_agents_project: number | null;
has_agent_system_prompt: boolean;
has_session_override: boolean;
has_project_override: boolean;
}
interface ObserverEntry {
hash: string;
length: number;
inputs: ObservedInputs;
}
// Unbounded by design for v1.13.8 (instrumentation, short-lived sessions in
// the smoke test). TODO(v1.13.x follow-up if v1.13.8 surfaces stable):
// LRU-bound this Map at 1000 sessions when the in-process surface lives long
// enough to matter.
const prefixObserver = new Map<string, ObserverEntry>();
// Test-only: clear the observer so consecutive tests don't share state.
export function _resetPrefixObserverForTests(): void {
prefixObserver.clear();
}
function computeChangedInputs(prev: ObservedInputs, curr: ObservedInputs): string[] {
const out: string[] = [];
const keys = Object.keys(curr) as (keyof ObservedInputs)[];
for (const k of keys) {
if (prev[k] !== curr[k]) out.push(k);
}
return out;
}
export async function buildSystemPromptWithFingerprint(
project: Project,
session: Session,
agent: Agent | null
): Promise<string> {
agent: Agent | null,
): Promise<{ prompt: string; fingerprint: PrefixFingerprint; drift: PrefixDrift | null }> {
let out = BASE_SYSTEM_PROMPT(project.path);
const guidance = await getContainerGuidance();
if (guidance) {
@@ -79,5 +172,60 @@ export async function buildSystemPrompt(
if (userPrompt.length > 0) {
out += '\n\n' + userPrompt;
}
return out;
const hash = createHash('sha256').update(out, 'utf8').digest('hex');
const agentsMtimes = getAgentsMtimes(project.path);
const inputs: ObservedInputs = {
agent_id: agent?.id ?? null,
mtime_boochat: getCachedGuidanceMtime(),
mtime_agents_global: agentsMtimes.global,
mtime_agents_project: agentsMtimes.project,
has_agent_system_prompt: !!(agent && agent.system_prompt.trim().length > 0),
has_session_override: sessionPrompt.length > 0,
has_project_override: projectPrompt.length > 0,
};
const fingerprint: PrefixFingerprint = {
msg: 'prefix-fingerprint',
project_id: project.id,
agent_id: agent?.id ?? null,
agent_name: agent?.name ?? null,
session_id: session.id,
prefix_hash: hash,
prefix_length: out.length,
mtime_boochat: inputs.mtime_boochat,
mtime_agents_global: inputs.mtime_agents_global,
mtime_agents_project: inputs.mtime_agents_project,
has_agent_system_prompt: inputs.has_agent_system_prompt,
has_session_override: inputs.has_session_override,
has_project_override: inputs.has_project_override,
};
let drift: PrefixDrift | null = null;
const prev = prefixObserver.get(session.id);
if (prev && prev.hash !== hash) {
drift = {
msg: 'prefix-drift',
session_id: session.id,
prev_hash: prev.hash,
new_hash: hash,
prev_length: prev.length,
new_length: out.length,
changed_inputs: computeChangedInputs(prev.inputs, inputs),
};
}
prefixObserver.set(session.id, { hash, length: out.length, inputs });
return { prompt: out, fingerprint, drift };
}
// Backward-compatible string-returning shim. Kept so existing callers
// (tests, future code paths that don't want to log) work unchanged.
export async function buildSystemPrompt(
project: Project,
session: Session,
agent: Agent | null,
): Promise<string> {
const { prompt } = await buildSystemPromptWithFingerprint(project, session, agent);
return prompt;
}

View File

@@ -8,6 +8,7 @@ import { getGitMeta } from './git_meta.js';
import { findSkills, getSkillBody, getSkillResource } from './skills.js';
import { webSearch } from './web_search.js';
import { webFetch } from './web_fetch.js';
import { readTruncation, truncateIfNeeded } from './truncate.js';
// v1.12 Track B.2: codecontext tools. 8 wrappers re-exported from
// tools/codecontext/index.ts. Each calls into services/codecontext_client.ts
// which talks to the codecontext sidecar at http://codecontext:8080.
@@ -21,6 +22,10 @@ import {
getSemanticNeighborhoods,
getFrameworkAnalysis,
} from './tools/codecontext/index.js';
// v1.13.17-cross-repo-reads: cross-repo read grant request tool. Paired
// with the pause-on-pending-grant branch in inference/tool-phase.ts and the
// POST /api/chats/:id/grant_read_access endpoint in routes/messages.ts.
import { requestReadAccess } from './request_read_access.js';
const MAX_FILE_BYTES = 5 * 1024 * 1024;
const DEFAULT_VIEW_LINES = 200;
@@ -44,7 +49,13 @@ export interface ToolDef<TInput> {
description: string;
inputSchema: z.ZodType<TInput>;
jsonSchema: ToolJsonSchema;
execute(input: TInput, projectRoot: string): Promise<unknown>;
// v1.13.17-cross-repo-reads: extraRoots is the session's
// allowed_read_paths, threaded through executeToolCall in tool-phase.ts.
// Only the filesystem tools (view_file, list_dir, grep, find_files,
// view_truncated_output) forward it to pathGuard; other tools accept the
// arg and ignore it. The execute signature stays compatible with
// pre-v1.13.17 callsites because the parameter is optional.
execute(input: TInput, projectRoot: string, extraRoots?: readonly string[]): Promise<unknown>;
}
const ViewFileInput = z.object({
@@ -77,14 +88,19 @@ export const viewFile: ToolDef<ViewFileInputT> = {
},
},
},
async execute(input, projectRoot) {
const real = await pathGuard(projectRoot, input.path);
async execute(input, projectRoot, extraRoots) {
const real = await pathGuard(projectRoot, input.path, extraRoots);
// v1.11.7: secret-file deny check. Test the project-relative path
// (matches the form continue.dev's patterns expect: basenames + dir
// segments). Throw a typed error so executeToolCall in inference.ts
// surfaces a clear "blocked" message to the LLM instead of silently
// returning content the user wanted hidden.
const relPath = relative(projectRoot, real) || basename(real);
// v1.13.17: when the resolved path is outside the primary projectRoot
// (i.e. via an allowed_read_paths grant), `relative()` returns "../…"
// which won't match secret-file basename patterns. Re-anchor on the
// file's basename so the secret deny still fires across all grant roots.
const rel = relative(projectRoot, real);
const relPath = rel && !rel.startsWith('..') ? rel : basename(real);
if (isSecretPath(relPath)) {
throw new SecretBlockedError(relPath);
}
@@ -109,12 +125,22 @@ export const viewFile: ToolDef<ViewFileInputT> = {
const slice = lines.slice(start - 1, end);
const content = slice.join('\n');
const truncated = total > end || start > 1;
// v1.13.5: stash the full file on tmpfs so the model can retrieve more
// via view_truncated_output(id) without re-reading the file (which it
// may not have project-relative-path access to in future agent setups).
// raw is bounded by MAX_FILE_BYTES (5MB), within truncateIfNeeded's cap.
const wrapped = await truncateIfNeeded({
fullContent: raw,
slicedContent: content,
wasTruncated: truncated,
});
return {
path: relative(projectRoot, real) || basename(real),
content,
content: wrapped.content,
total_lines: total,
returned_lines: [start, end],
truncated,
truncated: wrapped.truncated,
...(wrapped.outputPath ? { outputPath: wrapped.outputPath } : {}),
};
},
};
@@ -146,8 +172,8 @@ export const listDir: ToolDef<ListDirInputT> = {
},
},
},
async execute(input, projectRoot) {
const real = await pathGuard(projectRoot, input.path);
async execute(input, projectRoot, extraRoots) {
const real = await pathGuard(projectRoot, input.path, extraRoots);
const s = await stat(real);
if (!s.isDirectory()) {
throw new PathScopeError(`not a directory: ${input.path}`);
@@ -157,41 +183,64 @@ export const listDir: ToolDef<ListDirInputT> = {
? entries
: entries.filter((e) => !e.name.startsWith('.'));
const total = filtered.length;
const slice = filtered.slice(0, MAX_DIR_ENTRIES);
const out = await Promise.all(
slice.map(async (e) => {
const child = resolve(real, e.name);
let size: number | undefined;
if (e.isFile()) {
try {
const cs = await stat(child);
size = cs.size;
} catch {
/* ignore */
}
}
return {
name: e.name,
type: e.isDirectory() ? ('dir' as const) : ('file' as const),
...(size != null ? { size } : {}),
};
})
);
// v1.11.7: filter entries whose project-relative path matches a secret
// pattern. Each entry is tested using the project-rel dir + its name
// so the pattern's path/segment semantics work for nested dirs like
// `.aws/`. The count is surfaced via `pathguard_note` — we never list
// the hidden paths (defeats the purpose).
const wasTruncated = total > MAX_DIR_ENTRIES;
const relDir = relative(projectRoot, real) || '.';
// v1.13.5: when we'd truncate, render the FULL list to tmpfs so
// view_truncated_output can serve it. Stat sizes for all entries when
// truncating so the stored view matches the visible shape; this is the
// one extra cost for big directories, bounded by total entries (which
// is itself bounded by filesystem behavior).
const processOne = async (e: typeof filtered[number]) => {
const child = resolve(real, e.name);
let size: number | undefined;
if (e.isFile()) {
try {
const cs = await stat(child);
size = cs.size;
} catch { /* ignore */ }
}
return {
name: e.name,
type: e.isDirectory() ? ('dir' as const) : ('file' as const),
...(size != null ? { size } : {}),
};
};
const slice = filtered.slice(0, MAX_DIR_ENTRIES);
const out = await Promise.all(slice.map(processOne));
// v1.11.7: filter entries whose project-relative path matches a secret
// pattern. The same filter applies to the full-list snapshot below so
// the stashed file never holds entries the slice would have hidden.
const secretFilter = filterSecretEntries(out, (e) =>
relDir === '.' ? e.name : `${relDir}/${e.name}`,
);
let outputPath: string | undefined;
if (wasTruncated) {
const fullProcessed = await Promise.all(filtered.map(processOne));
const fullFiltered = filterSecretEntries(fullProcessed, (e) =>
relDir === '.' ? e.name : `${relDir}/${e.name}`,
);
// One line per entry, view_truncated_output's line slicing semantics
// map cleanly. Format: "<type>\t<name>[\tsize=N]". Header documents
// the shape so the model can grep / regex without prior schema lookup.
const header = `# list_dir ${relDir}${fullFiltered.kept.length} entries`;
const lines = [header, ...fullFiltered.kept.map((e) => {
const sz = 'size' in e && e.size != null ? `\tsize=${e.size}` : '';
return `${e.type}\t${e.name}${sz}`;
})];
const wrapped = await truncateIfNeeded({
fullContent: lines.join('\n'),
slicedContent: '',
wasTruncated: true,
});
outputPath = wrapped.outputPath;
}
return {
path: relDir,
entries: secretFilter.kept,
total: secretFilter.kept.length,
truncated: total > MAX_DIR_ENTRIES,
truncated: wasTruncated,
...(secretFilter.note ? { pathguard_note: secretFilter.note } : {}),
...(outputPath ? { outputPath } : {}),
};
},
};
@@ -230,7 +279,7 @@ export const grep: ToolDef<GrepInputT> = {
},
},
},
async execute(input, projectRoot) {
async execute(input, projectRoot, extraRoots) {
const limit = Math.min(
Math.max(input.max_results ?? DEFAULT_GREP_RESULTS, 1),
MAX_GREP_RESULTS
@@ -242,6 +291,7 @@ export const grep: ToolDef<GrepInputT> = {
max_matches: limit,
case_sensitive: input.case_sensitive,
hidden: input.hidden,
extra_roots: extraRoots,
});
const reshaped = result.matches.map((m) => ({
path: m.path,
@@ -291,7 +341,7 @@ export const findFiles: ToolDef<FindFilesInputT> = {
},
},
},
async execute(input, projectRoot) {
async execute(input, projectRoot, extraRoots) {
const limit = Math.min(
Math.max(input.max_results ?? DEFAULT_FIND_RESULTS, 1),
MAX_FIND_RESULTS
@@ -301,6 +351,7 @@ export const findFiles: ToolDef<FindFilesInputT> = {
const result = await fileOpsFindFiles(projectRoot, input.pattern, {
path: input.path,
max_results: limit,
extra_roots: extraRoots,
});
// v1.11.7: drop paths matching secret patterns. The original `total`
// from file_ops includes pre-truncation count; we report the visible
@@ -315,6 +366,74 @@ export const findFiles: ToolDef<FindFilesInputT> = {
},
};
// v1.13.5: retrieves the full content of a previously-truncated tool output
// via the opaque id stamped on the original tool_result. Line-based slicing
// matches view_file's mental model so the model uses the same affordances.
// Tmpfs-backed, 7-day TTL (see services/truncate.ts).
const VIEW_TRUNCATED_DEFAULT_LINES = 200;
const ViewTruncatedOutputInput = z.object({
id: z.string().regex(/^tr_[0-9a-v]{12}$/),
start_line: z.number().int().positive().optional(),
end_line: z.number().int().positive().optional(),
});
type ViewTruncatedOutputInputT = z.infer<typeof ViewTruncatedOutputInput>;
export const viewTruncatedOutput: ToolDef<ViewTruncatedOutputInputT> = {
name: 'view_truncated_output',
description: `Retrieve the full content of a previously-truncated tool output by its outputPath id. When a tool returns { truncated: true, outputPath: "tr_..." }, call this to view the full content. Defaults to the first ${VIEW_TRUNCATED_DEFAULT_LINES} lines. Use start_line and end_line (1-indexed, inclusive) to slice. Stored for 7 days.`,
inputSchema: ViewTruncatedOutputInput,
jsonSchema: {
type: 'function',
function: {
name: 'view_truncated_output',
description: `Retrieve the full content of a previously-truncated tool output by its outputPath id. Returns the first ${VIEW_TRUNCATED_DEFAULT_LINES} lines by default; use start_line/end_line to slice. Stored for 7 days.`,
parameters: {
type: 'object',
properties: {
id: { type: 'string', description: 'The outputPath value from an earlier truncated tool result (e.g. "tr_abc123def456").' },
start_line: { type: 'integer', description: 'First line (1-indexed). Default 1.' },
end_line: { type: 'integer', description: `Last line (1-indexed, inclusive). Default ${VIEW_TRUNCATED_DEFAULT_LINES} lines past start.` },
},
required: ['id'],
additionalProperties: false,
},
},
},
// view_truncated_output doesn't touch the filesystem — it pulls from tmpfs
// by opaque id. extraRoots is irrelevant here; declared for signature parity
// with the v1.13.17 ToolDef contract.
async execute(input, _projectRoot, _extraRoots) {
const content = await readTruncation(input.id);
if (content === null) {
return {
id: input.id,
content: '',
truncated: false,
error: `No truncation found for id "${input.id}". It may have been pruned (7-day TTL) or never existed.`,
};
}
const lines = content.split('\n');
const total = lines.length;
let start = input.start_line ?? 1;
let end = input.end_line ?? Math.min(total, start + VIEW_TRUNCATED_DEFAULT_LINES - 1);
if (start < 1) start = 1;
if (end > total) end = total;
if (end < start) end = start;
const slice = lines.slice(start - 1, end).join('\n');
// Re-slicing this view isn't truncation in the dual-write sense — the
// model already has the id; no point stashing the slice again.
const truncated = total > end || start > 1;
return {
id: input.id,
content: slice,
total_lines: total,
returned_lines: [start, end],
truncated,
};
},
};
// v1.8 Level 1 branch awareness: gives the model a read-only view of the
// project's git state. No path input — operates on the inference-resolved
// project root via getGitMeta. Subprocess runs with a 2s timeout (see git_meta).
@@ -534,6 +653,7 @@ export const askUserInput: ToolDef<AskUserInputInputT> = {
// and TOOLS_BY_NAME inherit it.
export const ALL_TOOLS: ReadonlyArray<ToolDef<unknown>> = [
viewFile as ToolDef<unknown>,
viewTruncatedOutput as ToolDef<unknown>,
listDir as ToolDef<unknown>,
grep as ToolDef<unknown>,
findFiles as ToolDef<unknown>,
@@ -558,6 +678,11 @@ export const ALL_TOOLS: ReadonlyArray<ToolDef<unknown>> = [
watchChanges as ToolDef<unknown>,
getSemanticNeighborhoods as ToolDef<unknown>,
getFrameworkAnalysis as ToolDef<unknown>,
// v1.13.17-cross-repo-reads: paired with the pause-on-pending-grant
// branch in tool-phase.ts. Read-only — only ever READS files; the only
// state change is appending to sessions.allowed_read_paths via the
// grant endpoint, gated by user consent.
requestReadAccess as ToolDef<unknown>,
].sort((a, b) => a.name.localeCompare(b.name));
// v1.8.2: forward-compatible read-only whitelist. An agent whose `tools` is
@@ -570,6 +695,7 @@ export const ALL_TOOLS: ReadonlyArray<ToolDef<unknown>> = [
// project state, so it belongs in the read-only set for budget purposes.
export const READ_ONLY_TOOL_NAMES = [
'view_file',
'view_truncated_output',
'list_dir',
'grep',
'find_files',
@@ -593,12 +719,74 @@ export const READ_ONLY_TOOL_NAMES = [
'watch_changes',
'get_semantic_neighborhoods',
'get_framework_analysis',
// v1.13.17-cross-repo-reads: pauses execution but doesn't mutate project
// state directly (the grant endpoint appends to sessions.allowed_read_paths
// only with user consent). Belongs in the read-only budget tier.
'request_read_access',
] as const;
export const TOOLS_BY_NAME: Record<string, ToolDef<unknown>> = Object.fromEntries(
ALL_TOOLS.map((t) => [t.name, t])
);
// v1.13.15-tools: tiered tool loading. BOOCODE_TOOLS env var (`core` |
// `standard` | `all`) filters the agent's tool whitelist before LLM dispatch.
// Daily-driver token win on qwen3.6-35b-a3b — the 35B-A3B MoE benefits from
// any prompt-cache stability win (fewer tools = shorter, more stable tool
// schemas in the system prompt). Pattern lift from eyaltoledano/claude-task-
// master (MIT + Commons Clause — pattern only, no code lift).
//
// The env var is a CEILING. It only narrows; never expands an agent's
// declared whitelist. Default behavior (var unset) is unchanged: all tools.
export const CORE_TOOL_NAMES = [
'view_file',
'list_dir',
'grep',
'find_files',
] as const;
export const STANDARD_TOOL_NAMES = [
...CORE_TOOL_NAMES,
'web_search',
'web_fetch',
'git_status',
'get_codebase_overview',
'get_file_analysis',
'get_symbol_info',
'search_symbols',
'get_dependencies',
'watch_changes',
'get_semantic_neighborhoods',
'get_framework_analysis',
] as const;
// Module-load validation: every name in CORE / STANDARD must exist in
// TOOLS_BY_NAME. Catches typos and stale tier definitions before they reach
// production; server boot fails loudly rather than silently filtering valid
// tools out of agent whitelists.
for (const name of CORE_TOOL_NAMES) {
if (!TOOLS_BY_NAME[name]) {
throw new Error(`CORE_TOOL_NAMES references unknown tool: '${name}'`);
}
}
for (const name of STANDARD_TOOL_NAMES) {
if (!TOOLS_BY_NAME[name]) {
throw new Error(`STANDARD_TOOL_NAMES references unknown tool: '${name}'`);
}
}
export function resolveToolTier(tier: string | undefined): readonly string[] {
switch ((tier ?? 'all').toLowerCase()) {
case 'core':
return CORE_TOOL_NAMES;
case 'standard':
return STANDARD_TOOL_NAMES;
case 'all':
default:
return ALL_TOOLS.map((t) => t.name);
}
}
export function toolJsonSchemas(): ToolJsonSchema[] {
return ALL_TOOLS.map((t) => t.jsonSchema);
}

View File

@@ -5,7 +5,7 @@ import type { ToolDef } from '../../tools.js';
import { callCodecontext, type CodecontextResponse } from '../../codecontext_client.js';
export const GetDependenciesInput = z.object({
file_path: z.string().optional(),
file_path: z.string().trim().optional(),
direction: z.enum(['incoming', 'outgoing', 'both']).optional(),
});
export type GetDependenciesInputT = z.infer<typeof GetDependenciesInput>;

View File

@@ -5,7 +5,7 @@ import type { ToolDef } from '../../tools.js';
import { callCodecontext, type CodecontextResponse } from '../../codecontext_client.js';
export const GetFileAnalysisInput = z.object({
file_path: z.string().min(1),
file_path: z.string().trim().min(1),
});
export type GetFileAnalysisInputT = z.infer<typeof GetFileAnalysisInput>;

View File

@@ -5,7 +5,7 @@ import type { ToolDef } from '../../tools.js';
import { callCodecontext, type CodecontextResponse } from '../../codecontext_client.js';
export const GetSemanticNeighborhoodsInput = z.object({
file_path: z.string().optional(),
file_path: z.string().trim().optional(),
include_basic: z.boolean().optional(),
include_quality: z.boolean().optional(),
max_results: z.number().int().positive().optional(),

View File

@@ -6,7 +6,7 @@ import { callCodecontext, type CodecontextResponse } from '../../codecontext_cli
export const GetSymbolInfoInput = z.object({
symbol_name: z.string().min(1),
file_path: z.string().optional(),
file_path: z.string().trim().optional(),
framework_type: z.string().optional(),
});
export type GetSymbolInfoInputT = z.infer<typeof GetSymbolInfoInput>;

View File

@@ -0,0 +1,170 @@
import { promises as fs } from 'fs';
import { randomBytes } from 'crypto';
import path from 'path';
import type { Sql } from '../db.js';
// v1.13.5: opencode-style truncation storage. When a tool slice would cut
// content the model might still want, we store the full text on tmpfs and
// hand the model an opaque id. view_truncated_output(id) retrieves it.
//
// Tmpfs path means full content vanishes on container restart; chats that
// outlive a restart lose retrieval (acceptable — the user has usually moved
// on or the data is stale). 7-day TTL + orphan reap bound disk growth via
// the periodic sweeper in index.ts.
export const TRUNCATION_DIR = process.env.BOOCODE_TRUNCATION_DIR ?? '/tmp/boocode-truncations';
export const TRUNCATION_TTL_MS = 7 * 24 * 60 * 60 * 1000;
// Matches view_file's MAX_FILE_BYTES — anything bigger was already refused
// at the source tool's size check, so we never see it here.
export const MAX_TRUNCATION_BYTES = 5 * 1024 * 1024;
const ID_RE = /^tr_[0-9a-v]{12}$/;
let dirEnsured = false;
async function ensureDir(): Promise<void> {
if (dirEnsured) return;
await fs.mkdir(TRUNCATION_DIR, { recursive: true, mode: 0o700 });
dirEnsured = true;
}
// 12 base32 chars ≈ 60 bits of entropy. Collision probability across a
// 7-day window with ~thousands of truncations is essentially zero.
function newId(): string {
const buf = randomBytes(8);
const alphabet = '0123456789abcdefghijklmnopqrstuv';
let out = 'tr_';
for (const byte of buf) {
out += alphabet[byte & 0x1f];
out += alphabet[(byte >> 3) & 0x1f];
}
return out.slice(0, 15);
}
function idToPath(id: string): string {
// Defense-in-depth: the model never supplies a path component (only ids),
// but a malformed id from anywhere else shouldn't escape TRUNCATION_DIR.
if (!ID_RE.test(id)) {
throw new Error(`Invalid truncation id: ${id}`);
}
return path.join(TRUNCATION_DIR, id);
}
export async function storeTruncation(fullContent: string): Promise<string> {
const bytes = Buffer.byteLength(fullContent, 'utf8');
if (bytes > MAX_TRUNCATION_BYTES) {
throw new Error(`Truncation content ${bytes}B exceeds ${MAX_TRUNCATION_BYTES}B cap`);
}
await ensureDir();
const id = newId();
await fs.writeFile(idToPath(id), fullContent, { encoding: 'utf8', mode: 0o600 });
return id;
}
export async function readTruncation(id: string): Promise<string | null> {
if (!ID_RE.test(id)) return null;
try {
return await fs.readFile(idToPath(id), { encoding: 'utf8' });
} catch (err) {
if ((err as NodeJS.ErrnoException).code === 'ENOENT') return null;
throw err;
}
}
// Wrap a tool's output. If wasTruncated, stash the full content on tmpfs
// and return its id alongside the sliced view the tool would have returned.
// Storage failure (disk full, permission denied) is non-fatal — the sliced
// view ships without an outputPath, which is exactly what the tool returned
// before v1.13.5. Same goes for content over MAX_TRUNCATION_BYTES.
export async function truncateIfNeeded(args: {
fullContent: string;
slicedContent: string;
wasTruncated: boolean;
}): Promise<{ content: string; truncated: boolean; outputPath?: string }> {
if (!args.wasTruncated) {
return { content: args.slicedContent, truncated: false };
}
const bytes = Buffer.byteLength(args.fullContent, 'utf8');
if (bytes > MAX_TRUNCATION_BYTES) {
return { content: args.slicedContent, truncated: true };
}
try {
const outputPath = await storeTruncation(args.fullContent);
return { content: args.slicedContent, truncated: true, outputPath };
} catch {
return { content: args.slicedContent, truncated: true };
}
}
// Periodic cleanup. Called from index.ts's sweep interval (v1.13.3 cadence).
// Pass 1: TTL — anything older than TRUNCATION_TTL_MS is gone.
// Pass 2: orphans — files with no live message_parts.payload->'output'->>'outputPath'
// reference. Catches the case where a part referencing an outputPath got
// hidden by prune (v1.13.4) and the file is now unreachable.
export async function cleanupTruncations(args: {
sql: Sql;
log: { warn: (obj: object, msg: string) => void; error: (obj: object, msg: string) => void };
}): Promise<{ ttlReaped: number; orphanReaped: number }> {
await ensureDir();
const cutoff = Date.now() - TRUNCATION_TTL_MS;
let ttlReaped = 0;
let orphanReaped = 0;
let entries: string[];
try {
entries = await fs.readdir(TRUNCATION_DIR);
} catch (err) {
args.log.error({ err }, 'cleanupTruncations readdir failed');
return { ttlReaped, orphanReaped };
}
if (entries.length === 0) return { ttlReaped, orphanReaped };
const survivors: string[] = [];
for (const name of entries) {
if (!ID_RE.test(name)) continue;
const full = path.join(TRUNCATION_DIR, name);
try {
const stat = await fs.stat(full);
if (stat.mtimeMs < cutoff) {
await fs.unlink(full);
ttlReaped += 1;
} else {
survivors.push(name);
}
} catch {
// File vanished between readdir and stat — fine.
}
}
if (survivors.length === 0) {
if (ttlReaped > 0) {
args.log.warn({ ttlReaped, orphanReaped: 0 }, 'cleanupTruncations reaped files');
}
return { ttlReaped, orphanReaped: 0 };
}
// outputPath rides inside the tool_result part's payload.output object
// (see partsFromToolMessage in inference/parts.ts), so the json path is
// payload->'output'->>'outputPath' rather than top-level.
const referenced = await args.sql<{ output_path: string }[]>`
SELECT DISTINCT p.payload->'output'->>'outputPath' AS output_path
FROM message_parts p
WHERE p.kind = 'tool_result'
AND p.payload->'output' ? 'outputPath'
AND p.payload->'output'->>'outputPath' = ANY(${survivors})
`;
const live = new Set(referenced.map((r) => r.output_path));
for (const name of survivors) {
if (live.has(name)) continue;
try {
await fs.unlink(path.join(TRUNCATION_DIR, name));
orphanReaped += 1;
} catch {
// ignore
}
}
if (ttlReaped > 0 || orphanReaped > 0) {
args.log.warn({ ttlReaped, orphanReaped }, 'cleanupTruncations reaped files');
}
return { ttlReaped, orphanReaped };
}

View File

@@ -11,6 +11,7 @@
import { z } from 'zod';
import { isPublicUrl } from './url_guard.js';
import type { ToolDef } from './tools.js';
import { truncateIfNeeded } from './truncate.js';
const WebFetchInput = z.object({
url: z.string().min(1).max(2048),
@@ -230,15 +231,24 @@ export async function executeWebFetch(
}
const truncated = truncate(textRaw, maxChars);
// v1.13.5: stash the full pre-slice body when truncation fires so the
// model can pull more via view_truncated_output(id) without re-fetching.
// textRaw is already bounded by MAX_BYTES (5MB), within truncate.ts's cap.
const wrapped = await truncateIfNeeded({
fullContent: textRaw,
slicedContent: truncated.content,
wasTruncated: truncated.truncated,
});
// Report the FINAL URL (post-redirects) so the LLM knows where the body
// came from — useful for citations and for the model to reason about
// domain trust.
return {
url: currentUrl,
title,
content: truncated.content,
content: wrapped.content,
content_type: contentType,
truncated: truncated.truncated,
truncated: wrapped.truncated,
...(wrapped.outputPath ? { outputPath: wrapped.outputPath } : {}),
};
}

View File

@@ -42,9 +42,40 @@ export interface Session {
// v1.12.1: server-side workspace pane layout. Replaces per-device
// localStorage so all devices viewing the session see the same panes.
workspace_panes: WorkspacePane[];
// v1.13.17: absolute paths the agent has been granted read access to via
// the request_read_access tool. Empty by default; populated only by the
// grant_read_access endpoint's allow branch. Revoked via PATCH session.
// path_guard's extraRoots check consults this list before refusing reads
// outside the primary project root.
allowed_read_paths: string[];
}
export type WorkspacePaneKind = 'chat' | 'terminal' | 'agent' | 'empty' | 'settings';
// v1.14.x-html-artifact-panes: 'markdown_artifact' + 'html_artifact' added.
// Optional payload state lives on the pane row itself so the jsonb survives
// a hard reload without needing a re-fetch.
export type WorkspacePaneKind =
| 'chat'
| 'terminal'
| 'agent'
| 'empty'
| 'settings'
| 'markdown_artifact'
| 'html_artifact';
// v1.14.x: reference-only — the actual artifact body lives in the message
// row (markdown) or message_parts.payload (html_artifact). Pane components
// fetch on mount.
export interface MarkdownArtifactState {
chat_id: string;
message_id: string;
title: string;
}
export interface HtmlArtifactState {
chat_id: string;
message_id: string;
title: string;
}
export interface WorkspacePane {
id: string;
@@ -52,6 +83,9 @@ export interface WorkspacePane {
chatId?: string;
chatIds: string[];
activeChatIdx: number;
// v1.14.x: populated only when kind === 'markdown_artifact' / 'html_artifact'.
markdown_artifact_state?: MarkdownArtifactState;
html_artifact_state?: HtmlArtifactState;
}
// v1.8.1: agents come from two sources. 'global' = /data/AGENTS.md (always

View File

@@ -0,0 +1,328 @@
// v1.13.11-a: Zod schemas for every WebSocket frame published by the server.
// Validation runs both on send (broker.publishFrame / publishUserFrame) and
// on receive (apps/web/src/hooks/useSessionStream + useUserEvents). Catches
// silent protocol drift between publisher and consumer.
//
// IMPORTANT: This file is duplicated byte-identical at
// apps/web/src/api/ws-frames.ts. The two apps have separate tsconfigs and
// no path alias; the duplication is sync-by-hand. A test asserts the two
// files match. If you change one, change the other.
//
// Per-kind payload schemas (tool_call args, message_parts payloads, etc.)
// stay z.unknown() in v1.13.11. Frame-level drift detection is the goal;
// deep payload validation is follow-up work.
import { z } from 'zod';
// ---- shared primitives -----------------------------------------------------
const Uuid = z.string().uuid();
// Tool call IDs are model-emitted (e.g. "call_abc123") — not UUIDs.
const ToolCallId = z.string().min(1);
// v1.13.12 fix: postgres returns timestamp columns as JS Date objects, not
// strings. The publish sites pass them through unchanged, so the schema must
// tolerate both. preprocess converts Date → ISO string before string-validation;
// on the web side (where frames arrive via JSON.parse) it's a no-op. Before
// this fix, every message_complete / session_updated / chat_updated frame
// failed validation and got dropped — symptoms: token tracking blank in UI,
// status stuck at 'streaming' tripping the 60s stale-stream banner.
const IsoTimestamp = z.preprocess(
(v) => (v instanceof Date ? v.toISOString() : v),
z.string().min(1),
);
const ChatStatusValue = z.enum([
'streaming',
'tool_running',
'waiting_for_input',
'idle',
'error',
]);
const ErrorReasonValue = z.enum([
'llm_provider_error',
'doom_loop',
'doom_loop_summary_failed',
'cap_hit',
'cap_hit_summary_failed',
]);
const MessageRoleValue = z.enum(['user', 'assistant', 'system', 'tool']);
const ToolCallShape = z.object({
id: ToolCallId,
name: z.string().min(1),
args: z.record(z.string(), z.unknown()),
});
// Free-form bags: opaque to the frame schema; deep validation is out of
// scope for v1.13.11 (frame-level drift detection is the goal; per-kind
// payload narrowing is follow-up work). z.unknown() means the consumer
// must narrow before reading — TypeScript-side this is fine because every
// consumer already operates on the hand-maintained Project / Chat / Session
// / WorkspacePane types (the brief's "Don't strip existing types yet"
// rule), and the Zod-typed shape is only used at the publishFrame boundary.
const OpaqueObject = z.unknown();
// ---- per-session channel frames --------------------------------------------
export const SnapshotFrame = z.object({
type: z.literal('snapshot'),
messages: z.array(OpaqueObject),
});
export const MessageStartedFrame = z.object({
type: z.literal('message_started'),
message_id: Uuid,
chat_id: Uuid.optional(),
role: MessageRoleValue,
});
export const DeltaFrame = z.object({
type: z.literal('delta'),
message_id: Uuid,
chat_id: Uuid.optional(),
content: z.string(),
});
export const ToolCallFrame = z.object({
type: z.literal('tool_call'),
message_id: Uuid,
chat_id: Uuid.optional(),
tool_call: ToolCallShape,
});
export const ToolResultFrame = z.object({
type: z.literal('tool_result'),
tool_message_id: Uuid,
chat_id: Uuid.optional(),
tool_call_id: ToolCallId,
output: z.unknown(),
truncated: z.boolean(),
error: z.string().optional(),
});
export const MessageCompleteFrame = z.object({
type: z.literal('message_complete'),
message_id: Uuid,
chat_id: Uuid.optional(),
tokens_used: z.number().int().nonnegative().nullable().optional(),
ctx_used: z.number().int().nonnegative().nullable().optional(),
ctx_max: z.number().int().positive().nullable().optional(),
started_at: IsoTimestamp.nullable().optional(),
finished_at: IsoTimestamp.nullable().optional(),
model: z.string().optional(),
metadata: OpaqueObject.nullable().optional(),
});
export const UsageFrame = z.object({
type: z.literal('usage'),
message_id: Uuid,
chat_id: Uuid.optional(),
completion_tokens: z.number().int().nonnegative().nullable(),
ctx_used: z.number().int().nonnegative().nullable(),
ctx_max: z.number().int().positive().nullable(),
});
export const MessagesDeletedFrame = z.object({
type: z.literal('messages_deleted'),
message_ids: z.array(Uuid),
chat_id: Uuid.optional(),
});
export const ChatRenamedFrame = z.object({
type: z.literal('chat_renamed'),
chat_id: Uuid,
name: z.string(),
});
export const CompactedFrame = z.object({
type: z.literal('compacted'),
session_id: Uuid,
chat_id: Uuid,
summary_message_id: Uuid,
});
export const ErrorFrame = z.object({
type: z.literal('error'),
message_id: Uuid.optional(),
chat_id: Uuid.optional(),
error: z.string(),
reason: ErrorReasonValue.optional(),
});
// ---- per-user channel frames (sidebar refresh) -----------------------------
export const ChatStatusFrame = z.object({
type: z.literal('chat_status'),
chat_id: Uuid,
status: ChatStatusValue,
at: IsoTimestamp,
reason: ErrorReasonValue.optional(),
});
export const SessionUpdatedFrame = z.object({
type: z.literal('session_updated'),
session_id: Uuid,
project_id: Uuid,
name: z.string(),
updated_at: IsoTimestamp,
});
export const SessionRenamedFrame = z.object({
type: z.literal('session_renamed'),
session_id: Uuid,
name: z.string(),
});
export const SessionCreatedFrame = z.object({
type: z.literal('session_created'),
session: OpaqueObject,
project_id: Uuid,
});
export const SessionArchivedFrame = z.object({
type: z.literal('session_archived'),
session_id: Uuid,
project_id: Uuid,
});
export const SessionDeletedFrame = z.object({
type: z.literal('session_deleted'),
session_id: Uuid,
project_id: Uuid,
});
export const SessionWorkspaceUpdatedFrame = z.object({
type: z.literal('session_workspace_updated'),
session_id: Uuid,
workspace_panes: z.array(OpaqueObject),
});
export const ChatCreatedFrame = z.object({
type: z.literal('chat_created'),
chat: OpaqueObject,
session_id: Uuid,
});
export const ChatUpdatedFrame = z.object({
type: z.literal('chat_updated'),
chat_id: Uuid,
session_id: Uuid,
name: z.string().nullable(),
updated_at: IsoTimestamp,
});
export const ChatArchivedFrame = z.object({
type: z.literal('chat_archived'),
chat_id: Uuid,
session_id: Uuid,
});
export const ChatUnarchivedFrame = z.object({
type: z.literal('chat_unarchived'),
chat: OpaqueObject,
});
export const ChatDeletedFrame = z.object({
type: z.literal('chat_deleted'),
chat_id: Uuid,
session_id: Uuid,
});
export const ProjectCreatedFrame = z.object({
type: z.literal('project_created'),
project: OpaqueObject,
});
export const ProjectArchivedFrame = z.object({
type: z.literal('project_archived'),
project_id: Uuid,
});
export const ProjectUnarchivedFrame = z.object({
type: z.literal('project_unarchived'),
project: OpaqueObject,
});
export const ProjectUpdatedFrame = z.object({
type: z.literal('project_updated'),
project_id: Uuid,
name: z.string(),
});
export const ProjectDeletedFrame = z.object({
type: z.literal('project_deleted'),
project_id: Uuid,
});
// ---- discriminated union ---------------------------------------------------
export const WsFrameSchema = z.discriminatedUnion('type', [
// per-session
SnapshotFrame,
MessageStartedFrame,
DeltaFrame,
ToolCallFrame,
ToolResultFrame,
MessageCompleteFrame,
UsageFrame,
MessagesDeletedFrame,
ChatRenamedFrame,
CompactedFrame,
ErrorFrame,
// per-user
ChatStatusFrame,
SessionUpdatedFrame,
SessionRenamedFrame,
SessionCreatedFrame,
SessionArchivedFrame,
SessionDeletedFrame,
SessionWorkspaceUpdatedFrame,
ChatCreatedFrame,
ChatUpdatedFrame,
ChatArchivedFrame,
ChatUnarchivedFrame,
ChatDeletedFrame,
ProjectCreatedFrame,
ProjectArchivedFrame,
ProjectUnarchivedFrame,
ProjectUpdatedFrame,
ProjectDeletedFrame,
]);
export type WsFrame = z.infer<typeof WsFrameSchema>;
// Convenience: the set of known frame types. Useful for the publishFrame
// helper to log the offending type name when validation fails. Kept in sync
// by hand with the discriminated union above.
export const KNOWN_FRAME_TYPES: readonly WsFrame['type'][] = [
'snapshot',
'message_started',
'delta',
'tool_call',
'tool_result',
'message_complete',
'usage',
'messages_deleted',
'chat_renamed',
'compacted',
'error',
'chat_status',
'session_updated',
'session_renamed',
'session_created',
'session_archived',
'session_deleted',
'session_workspace_updated',
'chat_created',
'chat_updated',
'chat_archived',
'chat_unarchived',
'chat_deleted',
'project_created',
'project_archived',
'project_unarchived',
'project_updated',
'project_deleted',
] as const;

View File

@@ -31,7 +31,8 @@
"shiki": "^1.29.2",
"sonner": "^2.0.7",
"tailwind-merge": "^3.6.0",
"tw-animate-css": "^1.4.0"
"tw-animate-css": "^1.4.0",
"zod": "^3.23.8"
},
"devDependencies": {
"@tailwindcss/postcss": "^4.3.0",

View File

@@ -12,6 +12,7 @@ import type {
GitMeta,
Skill,
AskUserAnswer,
ToolCostStat,
} from './types';
export class ApiError extends Error {
@@ -122,7 +123,20 @@ export const api = {
get: (id: string) => request<Session>(`/api/sessions/${id}`),
update: (
id: string,
body: Partial<Pick<Session, 'name' | 'model' | 'system_prompt' | 'agent_id' | 'web_search_enabled'>>
body: Partial<
Pick<
Session,
| 'name'
| 'model'
| 'system_prompt'
| 'agent_id'
| 'web_search_enabled'
// v1.13.17: revocation path — frontend sends the shortened list
// when the user removes a grant. Grants are appended only via the
// separate grantReadAccess endpoint below.
| 'allowed_read_paths'
>
>
) =>
request<Session>(`/api/sessions/${id}`, {
method: 'PATCH',
@@ -227,6 +241,19 @@ export const api = {
body: JSON.stringify({ tool_call_id: toolCallId, answers }),
},
),
// v1.13.17-cross-repo-reads: resume a paused request_read_access. On
// 'allow' the server re-resolves the grant root and appends it to
// sessions.allowed_read_paths; the returned list reflects the post-
// grant state. On 'deny' the array is unchanged.
grantReadAccess: (chatId: string, toolCallId: string, decision: 'allow' | 'deny') =>
request<{
tool_message_id: string;
assistant_message_id: string;
allowed_read_paths: string[];
}>(`/api/chats/${chatId}/grant_read_access`, {
method: 'POST',
body: JSON.stringify({ tool_call_id: toolCallId, decision }),
}),
},
messages: {
@@ -249,6 +276,24 @@ export const api = {
request<void>(`/api/chats/${chatId}/messages/${messageId}`, {
method: 'DELETE',
}),
// v1.14.x-html-artifact-panes: write the artifact to
// <projectRoot>/.boocode/artifacts/<slug>-<ts>.<ext> and return the
// path + a /api/projects/.../artifacts/<filename> URL the browser can
// GET to download. fmt=html requires the assistant message to carry an
// html_artifact part (404 otherwise).
downloadArtifact: (chatId: string, messageId: string, fmt: 'md' | 'html') =>
request<{ path: string; url: string }>(
`/api/chats/${chatId}/messages/${messageId}/artifacts/download?fmt=${fmt}`,
{ method: 'POST' },
),
// v1.14.x-html-artifact-panes: fetch the html_artifact part payload so
// HtmlArtifactPane can render the iframe srcdoc. 404 = no html_artifact
// part on this message; MessageBubble uses that as a signal to fall back
// to the markdown pane variant.
getHtmlArtifact: (chatId: string, messageId: string) =>
request<{ html_content: string; char_count: number; title: string }>(
`/api/chats/${chatId}/messages/${messageId}/html_artifact`,
),
},
models: () => request<ModelInfo[]>('/api/models'),
@@ -262,6 +307,14 @@ export const api = {
list: () => request<{ skills: Skill[] }>('/api/skills'),
},
// v1.13.10: per-tool cost rolling-window stats (last 100 calls per tool,
// equal-split attribution across multi-tool turns). Read endpoint backed by
// the tool_cost_stats view. AgentPicker consumes this for per-agent cost
// hints.
tools: {
costStats: () => request<{ stats: ToolCostStat[] }>('/api/tools/cost_stats'),
},
settings: {
get: () => request<Record<string, unknown>>('/api/settings'),
patch: (body: Record<string, unknown>) =>

View File

@@ -1,6 +1,18 @@
export const PROJECT_STATUSES = ['open', 'archived'] as const;
export type ProjectStatus = typeof PROJECT_STATUSES[number];
// v1.13.10: per-tool cost rolling-window stat. Returned by
// GET /api/tools/cost_stats — one entry per tool with mean prompt/completion
// tokens over the last 100 invocations. AgentPicker sums across an agent's
// whitelisted tools for per-agent cost hints.
export interface ToolCostStat {
tool_name: string;
mean_prompt_tokens: number;
mean_completion_tokens: number;
n_calls: number;
updated_at: string;
}
export interface Project {
id: string;
name: string;
@@ -36,6 +48,11 @@ export interface Session {
web_search_enabled: boolean | null;
// v1.12.1: server-authoritative pane layout, replaces localStorage.
workspace_panes: WorkspacePane[];
// v1.13.17: paths the agent has been granted read access to via the
// request_read_access tool. Empty by default. Settings UI surfaces the
// list with per-row revoke; the grant flow itself appends through the
// dedicated POST /api/chats/:id/grant_read_access endpoint (not PATCH).
allowed_read_paths: string[];
}
// v1.8.1: 'global' = /data/AGENTS.md (always-on), 'project' = per-project
@@ -299,7 +316,37 @@ export interface AskUserAnswerSet {
// v1.9: 'settings' is an ephemeral pane kind — never persisted, always
// singleton per workspace. The pane hook filters it out before writing to
// localStorage and dedupes on insertion via toggleSettingsPane().
export type WorkspacePaneKind = 'chat' | 'terminal' | 'agent' | 'empty' | 'settings';
// v1.14.x-html-artifact-panes: 'markdown_artifact' + 'html_artifact' added.
// Both carry payload state on the WorkspacePane row itself so
// useWorkspacePanes's JSON-string dedup + persisted jsonb stay self-contained
// — no extra fetch on rehydrate.
export type WorkspacePaneKind =
| 'chat'
| 'terminal'
| 'agent'
| 'empty'
| 'settings'
| 'markdown_artifact'
| 'html_artifact';
// v1.14.x: per-pane artifact payloads. Optional + namespaced so older saved
// pane rows (without these fields) deserialize unchanged.
// v1.14.x: pane state is a reference only — the pane component fetches the
// actual content on mount. This keeps sessions.workspace_panes jsonb small and
// makes the message body / html_artifact part the single source of truth.
export interface MarkdownArtifactState {
// chat_id is needed for the download endpoint
// (POST /api/chats/:chat_id/messages/:msg_id/artifacts/download).
chat_id: string;
message_id: string;
title: string;
}
export interface HtmlArtifactState {
chat_id: string;
message_id: string;
title: string;
}
export interface WorkspacePane {
id: string;
@@ -307,6 +354,9 @@ export interface WorkspacePane {
chatId?: string;
chatIds: string[];
activeChatIdx: number;
// v1.14.x: populated only when kind === 'markdown_artifact' / 'html_artifact'.
markdown_artifact_state?: MarkdownArtifactState;
html_artifact_state?: HtmlArtifactState;
}
export type WsFrame =

View File

@@ -0,0 +1,328 @@
// v1.13.11-a: Zod schemas for every WebSocket frame published by the server.
// Validation runs both on send (broker.publishFrame / publishUserFrame) and
// on receive (apps/web/src/hooks/useSessionStream + useUserEvents). Catches
// silent protocol drift between publisher and consumer.
//
// IMPORTANT: This file is duplicated byte-identical at
// apps/web/src/api/ws-frames.ts. The two apps have separate tsconfigs and
// no path alias; the duplication is sync-by-hand. A test asserts the two
// files match. If you change one, change the other.
//
// Per-kind payload schemas (tool_call args, message_parts payloads, etc.)
// stay z.unknown() in v1.13.11. Frame-level drift detection is the goal;
// deep payload validation is follow-up work.
import { z } from 'zod';
// ---- shared primitives -----------------------------------------------------
const Uuid = z.string().uuid();
// Tool call IDs are model-emitted (e.g. "call_abc123") — not UUIDs.
const ToolCallId = z.string().min(1);
// v1.13.12 fix: postgres returns timestamp columns as JS Date objects, not
// strings. The publish sites pass them through unchanged, so the schema must
// tolerate both. preprocess converts Date → ISO string before string-validation;
// on the web side (where frames arrive via JSON.parse) it's a no-op. Before
// this fix, every message_complete / session_updated / chat_updated frame
// failed validation and got dropped — symptoms: token tracking blank in UI,
// status stuck at 'streaming' tripping the 60s stale-stream banner.
const IsoTimestamp = z.preprocess(
(v) => (v instanceof Date ? v.toISOString() : v),
z.string().min(1),
);
const ChatStatusValue = z.enum([
'streaming',
'tool_running',
'waiting_for_input',
'idle',
'error',
]);
const ErrorReasonValue = z.enum([
'llm_provider_error',
'doom_loop',
'doom_loop_summary_failed',
'cap_hit',
'cap_hit_summary_failed',
]);
const MessageRoleValue = z.enum(['user', 'assistant', 'system', 'tool']);
const ToolCallShape = z.object({
id: ToolCallId,
name: z.string().min(1),
args: z.record(z.string(), z.unknown()),
});
// Free-form bags: opaque to the frame schema; deep validation is out of
// scope for v1.13.11 (frame-level drift detection is the goal; per-kind
// payload narrowing is follow-up work). z.unknown() means the consumer
// must narrow before reading — TypeScript-side this is fine because every
// consumer already operates on the hand-maintained Project / Chat / Session
// / WorkspacePane types (the brief's "Don't strip existing types yet"
// rule), and the Zod-typed shape is only used at the publishFrame boundary.
const OpaqueObject = z.unknown();
// ---- per-session channel frames --------------------------------------------
export const SnapshotFrame = z.object({
type: z.literal('snapshot'),
messages: z.array(OpaqueObject),
});
export const MessageStartedFrame = z.object({
type: z.literal('message_started'),
message_id: Uuid,
chat_id: Uuid.optional(),
role: MessageRoleValue,
});
export const DeltaFrame = z.object({
type: z.literal('delta'),
message_id: Uuid,
chat_id: Uuid.optional(),
content: z.string(),
});
export const ToolCallFrame = z.object({
type: z.literal('tool_call'),
message_id: Uuid,
chat_id: Uuid.optional(),
tool_call: ToolCallShape,
});
export const ToolResultFrame = z.object({
type: z.literal('tool_result'),
tool_message_id: Uuid,
chat_id: Uuid.optional(),
tool_call_id: ToolCallId,
output: z.unknown(),
truncated: z.boolean(),
error: z.string().optional(),
});
export const MessageCompleteFrame = z.object({
type: z.literal('message_complete'),
message_id: Uuid,
chat_id: Uuid.optional(),
tokens_used: z.number().int().nonnegative().nullable().optional(),
ctx_used: z.number().int().nonnegative().nullable().optional(),
ctx_max: z.number().int().positive().nullable().optional(),
started_at: IsoTimestamp.nullable().optional(),
finished_at: IsoTimestamp.nullable().optional(),
model: z.string().optional(),
metadata: OpaqueObject.nullable().optional(),
});
export const UsageFrame = z.object({
type: z.literal('usage'),
message_id: Uuid,
chat_id: Uuid.optional(),
completion_tokens: z.number().int().nonnegative().nullable(),
ctx_used: z.number().int().nonnegative().nullable(),
ctx_max: z.number().int().positive().nullable(),
});
export const MessagesDeletedFrame = z.object({
type: z.literal('messages_deleted'),
message_ids: z.array(Uuid),
chat_id: Uuid.optional(),
});
export const ChatRenamedFrame = z.object({
type: z.literal('chat_renamed'),
chat_id: Uuid,
name: z.string(),
});
export const CompactedFrame = z.object({
type: z.literal('compacted'),
session_id: Uuid,
chat_id: Uuid,
summary_message_id: Uuid,
});
export const ErrorFrame = z.object({
type: z.literal('error'),
message_id: Uuid.optional(),
chat_id: Uuid.optional(),
error: z.string(),
reason: ErrorReasonValue.optional(),
});
// ---- per-user channel frames (sidebar refresh) -----------------------------
export const ChatStatusFrame = z.object({
type: z.literal('chat_status'),
chat_id: Uuid,
status: ChatStatusValue,
at: IsoTimestamp,
reason: ErrorReasonValue.optional(),
});
export const SessionUpdatedFrame = z.object({
type: z.literal('session_updated'),
session_id: Uuid,
project_id: Uuid,
name: z.string(),
updated_at: IsoTimestamp,
});
export const SessionRenamedFrame = z.object({
type: z.literal('session_renamed'),
session_id: Uuid,
name: z.string(),
});
export const SessionCreatedFrame = z.object({
type: z.literal('session_created'),
session: OpaqueObject,
project_id: Uuid,
});
export const SessionArchivedFrame = z.object({
type: z.literal('session_archived'),
session_id: Uuid,
project_id: Uuid,
});
export const SessionDeletedFrame = z.object({
type: z.literal('session_deleted'),
session_id: Uuid,
project_id: Uuid,
});
export const SessionWorkspaceUpdatedFrame = z.object({
type: z.literal('session_workspace_updated'),
session_id: Uuid,
workspace_panes: z.array(OpaqueObject),
});
export const ChatCreatedFrame = z.object({
type: z.literal('chat_created'),
chat: OpaqueObject,
session_id: Uuid,
});
export const ChatUpdatedFrame = z.object({
type: z.literal('chat_updated'),
chat_id: Uuid,
session_id: Uuid,
name: z.string().nullable(),
updated_at: IsoTimestamp,
});
export const ChatArchivedFrame = z.object({
type: z.literal('chat_archived'),
chat_id: Uuid,
session_id: Uuid,
});
export const ChatUnarchivedFrame = z.object({
type: z.literal('chat_unarchived'),
chat: OpaqueObject,
});
export const ChatDeletedFrame = z.object({
type: z.literal('chat_deleted'),
chat_id: Uuid,
session_id: Uuid,
});
export const ProjectCreatedFrame = z.object({
type: z.literal('project_created'),
project: OpaqueObject,
});
export const ProjectArchivedFrame = z.object({
type: z.literal('project_archived'),
project_id: Uuid,
});
export const ProjectUnarchivedFrame = z.object({
type: z.literal('project_unarchived'),
project: OpaqueObject,
});
export const ProjectUpdatedFrame = z.object({
type: z.literal('project_updated'),
project_id: Uuid,
name: z.string(),
});
export const ProjectDeletedFrame = z.object({
type: z.literal('project_deleted'),
project_id: Uuid,
});
// ---- discriminated union ---------------------------------------------------
export const WsFrameSchema = z.discriminatedUnion('type', [
// per-session
SnapshotFrame,
MessageStartedFrame,
DeltaFrame,
ToolCallFrame,
ToolResultFrame,
MessageCompleteFrame,
UsageFrame,
MessagesDeletedFrame,
ChatRenamedFrame,
CompactedFrame,
ErrorFrame,
// per-user
ChatStatusFrame,
SessionUpdatedFrame,
SessionRenamedFrame,
SessionCreatedFrame,
SessionArchivedFrame,
SessionDeletedFrame,
SessionWorkspaceUpdatedFrame,
ChatCreatedFrame,
ChatUpdatedFrame,
ChatArchivedFrame,
ChatUnarchivedFrame,
ChatDeletedFrame,
ProjectCreatedFrame,
ProjectArchivedFrame,
ProjectUnarchivedFrame,
ProjectUpdatedFrame,
ProjectDeletedFrame,
]);
export type WsFrame = z.infer<typeof WsFrameSchema>;
// Convenience: the set of known frame types. Useful for the publishFrame
// helper to log the offending type name when validation fails. Kept in sync
// by hand with the discriminated union above.
export const KNOWN_FRAME_TYPES: readonly WsFrame['type'][] = [
'snapshot',
'message_started',
'delta',
'tool_call',
'tool_result',
'message_complete',
'usage',
'messages_deleted',
'chat_renamed',
'compacted',
'error',
'chat_status',
'session_updated',
'session_renamed',
'session_created',
'session_archived',
'session_deleted',
'session_workspace_updated',
'chat_created',
'chat_updated',
'chat_archived',
'chat_unarchived',
'chat_deleted',
'project_created',
'project_archived',
'project_unarchived',
'project_updated',
'project_deleted',
] as const;

View File

@@ -1,8 +1,8 @@
import { useEffect, useState } from 'react';
import { useEffect, useMemo, useState } from 'react';
import { Check, ChevronDown } from 'lucide-react';
import { toast } from 'sonner';
import { api } from '@/api/client';
import type { Agent, AgentParseError } from '@/api/types';
import type { Agent, AgentParseError, ToolCostStat } from '@/api/types';
import {
DropdownMenu,
DropdownMenuContent,
@@ -22,6 +22,10 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
const [parseErrors, setParseErrors] = useState<AgentParseError[]>([]);
const [error, setError] = useState<string | null>(null);
const [open, setOpen] = useState(false);
// v1.13.10: per-tool cost rolling window. Fetched once on mount; would
// refresh on remount or page reload. Acceptable for a decision aid — the
// 100-call rolling mean doesn't shift fast.
const [costStats, setCostStats] = useState<ToolCostStat[]>([]);
// v1.8.1: per-agent parse errors are non-blocking. Silent if any agents
// loaded successfully; a gray warning toast fires only when EVERY agent
@@ -52,6 +56,29 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
};
}, [projectId]);
// v1.13.10: cost stats are project-independent — the 100-call rolling
// window is global across all chats. Fetch once per mount; tolerate failure
// silently (cost line hides).
useEffect(() => {
let cancelled = false;
api.tools
.costStats()
.then((r) => {
if (!cancelled) setCostStats(r.stats);
})
.catch(() => {
if (!cancelled) setCostStats([]);
});
return () => {
cancelled = true;
};
}, []);
const costByTool = useMemo(
() => Object.fromEntries(costStats.map((s) => [s.tool_name, s])),
[costStats],
);
const selectedAgent = agents?.find((a) => a.id === value) ?? null;
const triggerLabel = value === null
? 'No agent'
@@ -86,25 +113,33 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
<span className="font-medium">No agent</span>
</DropdownMenuItem>
{agents.length > 0 && <DropdownMenuSeparator />}
{agents.map((a) => (
<DropdownMenuItem
key={a.id}
onSelect={() => void onChange(a.id)}
className="text-xs flex-col items-start gap-0.5"
>
<div className="flex items-center gap-1.5">
<Check
className={`size-3 ${a.id === value ? 'opacity-100' : 'opacity-0'}`}
/>
<span className="font-medium">{a.name}</span>
</div>
{a.description && (
<span className="text-muted-foreground pl-[18px] truncate w-full">
{a.description}
</span>
)}
</DropdownMenuItem>
))}
{agents.map((a) => {
const cost = agentCost(a, costByTool);
return (
<DropdownMenuItem
key={a.id}
onSelect={() => void onChange(a.id)}
className="text-xs flex-col items-start gap-0.5"
>
<div className="flex items-center gap-1.5">
<Check
className={`size-3 ${a.id === value ? 'opacity-100' : 'opacity-0'}`}
/>
<span className="font-medium">{a.name}</span>
</div>
{a.description && (
<span className="text-muted-foreground pl-[18px] truncate w-full">
{a.description}
</span>
)}
{cost.nWithData > 0 && (
<span className="text-muted-foreground/70 pl-[18px] truncate w-full">
~{formatK(cost.prompt)} prompt / {cost.completion} completion · {cost.nWithData}/{cost.nTools} tools{cost.mostRecent ? ` · last call ${formatAgo(cost.mostRecent)}` : ''}
</span>
)}
</DropdownMenuItem>
);
})}
{parseErrors.length > 0 && (
<div
className="px-2 py-1.5 mt-1 text-xs text-amber-500 border-t border-border"
@@ -119,3 +154,49 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
</DropdownMenu>
);
}
// v1.13.10: sum the per-tool means across an agent's whitelisted tools.
// Sum-of-means, not mean-of-sums — we're combining independent rolling
// averages. nWithData reflects how many of the agent's tools have any
// history yet; the line hides entirely when zero so a fresh deploy doesn't
// render "0k / 0 / 0 tools".
function agentCost(
agent: Agent,
costByTool: Record<string, ToolCostStat>,
): {
prompt: number;
completion: number;
nTools: number;
nWithData: number;
mostRecent: string | null;
} {
let prompt = 0;
let completion = 0;
let nWithData = 0;
let mostRecent: string | null = null;
for (const t of agent.tools) {
const s = costByTool[t];
if (!s) continue;
prompt += s.mean_prompt_tokens;
completion += s.mean_completion_tokens;
nWithData++;
if (!mostRecent || s.updated_at > mostRecent) mostRecent = s.updated_at;
}
return { prompt, completion, nTools: agent.tools.length, nWithData, mostRecent };
}
function formatK(n: number): string {
if (n < 1000) return String(n);
if (n < 10_000) return `${(n / 1000).toFixed(1)}k`;
return `${Math.round(n / 1000)}k`;
}
function formatAgo(iso: string): string {
const then = new Date(iso).getTime();
if (Number.isNaN(then)) return '—';
const diff = Date.now() - then;
if (diff < 60_000) return 'just now';
if (diff < 3_600_000) return `${Math.round(diff / 60_000)}m ago`;
if (diff < 86_400_000) return `${Math.round(diff / 3_600_000)}h ago`;
return `${Math.round(diff / 86_400_000)}d ago`;
}

View File

@@ -2,7 +2,6 @@ import { useState } from 'react';
import { Bot, History, MessageSquare, Plus, Terminal, X } from 'lucide-react';
import type { Chat, WorkspacePane } from '@/api/types';
import { StatusDot } from '@/components/StatusDot';
import { ChatThroughput } from '@/components/ChatThroughput';
import {
ContextMenu,
ContextMenuContent,
@@ -100,7 +99,6 @@ export function ChatTabBar({
>
<MessageSquare size={12} className="shrink-0" />
<StatusDot chatId={chat.id} />
<ChatThroughput chatId={chat.id} />
{renamingId === chat.id ? (
<input
autoFocus

View File

@@ -0,0 +1,116 @@
// v1.14.x-html-artifact-panes: full-height HTML artifact viewer. Renders the
// model's HTML inside a sandboxed iframe — no allow-same-origin, srcdoc only
// (no separate URL), CSP injected by the backend writer. JS runs inside the
// iframe (interactive controls work) but fetch / WS / tracking pixels are
// blocked by connect-src 'none' on the CSP. NO Copy button per the spec.
//
// Pane state is a reference only (chat_id + message_id + title); the iframe
// payload is fetched on mount from
// GET /api/chats/:chat_id/messages/:msg_id/html_artifact so that
// sessions.workspace_panes jsonb stays small and message_parts.payload is the
// single source of truth.
import { useEffect, useState } from 'react';
import { Download, X } from 'lucide-react';
import { toast } from 'sonner';
import { api } from '@/api/client';
import type { HtmlArtifactState } from '@/api/types';
interface Props {
chatId: string;
state: HtmlArtifactState;
onClose: () => void;
}
export function HtmlArtifactPane({ chatId, state, onClose }: Props) {
const [downloading, setDownloading] = useState(false);
const [htmlContent, setHtmlContent] = useState<string | null>(null);
const [loadError, setLoadError] = useState<string | null>(null);
useEffect(() => {
let cancelled = false;
setHtmlContent(null);
setLoadError(null);
void (async () => {
try {
const payload = await api.messages.getHtmlArtifact(chatId, state.message_id);
if (cancelled) return;
setHtmlContent(payload.html_content);
} catch (err) {
if (cancelled) return;
setLoadError(err instanceof Error ? err.message : 'failed to load HTML artifact');
}
})();
return () => {
cancelled = true;
};
}, [chatId, state.message_id]);
async function download() {
if (downloading) return;
setDownloading(true);
try {
const { url, path } = await api.messages.downloadArtifact(
chatId,
state.message_id,
'html',
);
const a = document.createElement('a');
a.href = url;
a.rel = 'noopener';
a.click();
toast.success(`Saved to ${path}`);
} catch (err) {
toast.error(err instanceof Error ? err.message : 'download failed');
} finally {
setDownloading(false);
}
}
return (
<div className="flex flex-col h-full min-h-0">
<div className="flex items-center gap-2 border-b border-border bg-muted/30 px-2 py-1 shrink-0">
<span className="text-xs text-muted-foreground truncate flex-1" title={state.title}>
{state.title || 'HTML artifact'}
</span>
<button
type="button"
onClick={() => void download()}
disabled={downloading || htmlContent === null}
className="inline-flex items-center justify-center size-5 rounded text-muted-foreground hover:bg-muted hover:text-foreground disabled:opacity-40 max-md:min-h-[44px] max-md:min-w-[44px]"
aria-label="Download HTML"
title="Download"
>
<Download size={12} />
</button>
<button
type="button"
onClick={onClose}
className="inline-flex items-center justify-center size-5 rounded text-muted-foreground hover:bg-muted hover:text-foreground max-md:min-h-[44px] max-md:min-w-[44px]"
aria-label="Close artifact pane"
title="Close"
>
<X size={12} />
</button>
</div>
<div className="flex-1 min-h-0 overflow-hidden bg-background">
{loadError ? (
<div className="p-4 text-sm text-destructive">Failed to load: {loadError}</div>
) : htmlContent === null ? (
<div className="p-4 text-sm text-muted-foreground">Loading HTML artifact</div>
) : (
<iframe
// Sandbox attributes are non-negotiable per the v1.14.x spec S5:
// no allow-same-origin → opaque origin → can't reach parent cookies
// or DOM. srcdoc (not src) means no URL exists to leak. JS runs
// (allow-scripts) but connect-src 'none' on the CSP inside the
// payload blocks fetch / WS / pixels.
srcDoc={htmlContent}
sandbox="allow-scripts allow-clipboard-write allow-downloads"
className="w-full h-full border-0"
title={state.title || 'HTML artifact'}
/>
)}
</div>
</div>
);
}

View File

@@ -0,0 +1,137 @@
// v1.14.x-html-artifact-panes: dedicated full-height Markdown viewer used
// when a user clicks "Open in pane" on an assistant message that has NO
// html_artifact part. Header carries Copy (raw source) + Download (server-
// materialised .md under <projectRoot>/.boocode/artifacts/) + close.
//
// Pane state is a reference only (chat_id + message_id + title); the markdown
// body is fetched on mount from GET /api/chats/:chat_id/messages by locating
// the matching message_id. This keeps sessions.workspace_panes jsonb small
// and the assistant message row remains the single source of truth.
import { useEffect, useState } from 'react';
import { Check, Copy, Download, X } from 'lucide-react';
import { toast } from 'sonner';
import { api } from '@/api/client';
import type { MarkdownArtifactState } from '@/api/types';
import { MarkdownRenderer } from './MarkdownRenderer';
interface Props {
chatId: string;
state: MarkdownArtifactState;
onClose: () => void;
}
export function MarkdownArtifactPane({ chatId, state, onClose }: Props) {
const [justCopied, setJustCopied] = useState(false);
const [downloading, setDownloading] = useState(false);
const [content, setContent] = useState<string | null>(null);
const [loadError, setLoadError] = useState<string | null>(null);
useEffect(() => {
let cancelled = false;
setContent(null);
setLoadError(null);
void (async () => {
try {
// No single-message GET endpoint exists; the chat-messages list is
// already cached server-side and the lookup is O(n) over a small
// window. Cheaper than adding a new route for one call site.
const messages = await api.chats.messages(chatId);
if (cancelled) return;
const msg = messages.find((m) => m.id === state.message_id);
if (!msg) {
setLoadError('Message not found');
return;
}
setContent(msg.content ?? '');
} catch (err) {
if (cancelled) return;
setLoadError(err instanceof Error ? err.message : 'failed to load message');
}
})();
return () => {
cancelled = true;
};
}, [chatId, state.message_id]);
async function copy() {
if (content === null) return;
try {
await navigator.clipboard.writeText(content);
setJustCopied(true);
setTimeout(() => setJustCopied(false), 1200);
} catch (err) {
toast.error(err instanceof Error ? err.message : 'copy failed');
}
}
async function download() {
if (downloading) return;
setDownloading(true);
try {
const { url, path } = await api.messages.downloadArtifact(
chatId,
state.message_id,
'md',
);
// Trigger browser download from the returned URL. The endpoint stamps
// Content-Disposition: attachment so the click lands as a save.
const a = document.createElement('a');
a.href = url;
a.rel = 'noopener';
a.click();
toast.success(`Saved to ${path}`);
} catch (err) {
toast.error(err instanceof Error ? err.message : 'download failed');
} finally {
setDownloading(false);
}
}
return (
<div className="flex flex-col h-full min-h-0">
<div className="flex items-center gap-2 border-b border-border bg-muted/30 px-2 py-1 shrink-0">
<span className="text-xs text-muted-foreground truncate flex-1" title={state.title}>
{state.title || 'Markdown artifact'}
</span>
<button
type="button"
onClick={() => void copy()}
disabled={content === null}
className="inline-flex items-center justify-center size-5 rounded text-muted-foreground hover:bg-muted hover:text-foreground disabled:opacity-40 max-md:min-h-[44px] max-md:min-w-[44px]"
aria-label="Copy markdown source"
title="Copy"
>
{justCopied ? <Check size={12} /> : <Copy size={12} />}
</button>
<button
type="button"
onClick={() => void download()}
disabled={downloading || content === null}
className="inline-flex items-center justify-center size-5 rounded text-muted-foreground hover:bg-muted hover:text-foreground disabled:opacity-40 max-md:min-h-[44px] max-md:min-w-[44px]"
aria-label="Download markdown"
title="Download"
>
<Download size={12} />
</button>
<button
type="button"
onClick={onClose}
className="inline-flex items-center justify-center size-5 rounded text-muted-foreground hover:bg-muted hover:text-foreground max-md:min-h-[44px] max-md:min-w-[44px]"
aria-label="Close artifact pane"
title="Close"
>
<X size={12} />
</button>
</div>
<div className="flex-1 min-h-0 overflow-auto px-4 py-3 text-sm">
{loadError ? (
<div className="text-destructive">Failed to load: {loadError}</div>
) : content === null ? (
<div className="text-muted-foreground">Loading</div>
) : (
<MarkdownRenderer content={content} />
)}
</div>
</div>
);
}

View File

@@ -0,0 +1,148 @@
// v1.14.x-html-artifact-panes: extracted from MessageBubble.tsx so both the
// in-chat bubble renderer and the MarkdownArtifactPane share the same Shiki +
// remark-gfm + path-linkifier pipeline. Behavior preserved byte-for-byte from
// the original MessageBubble.MarkdownBody helper (and its linkify helpers).
import { Children, cloneElement, isValidElement } from 'react';
import type { ReactElement, ReactNode } from 'react';
import Markdown from 'react-markdown';
import remarkGfm from 'remark-gfm';
import { CodeBlock } from './CodeBlock';
import { sessionEvents } from '@/hooks/sessionEvents';
// Match path-shaped substrings ending in `.ext`. Additionally require a `/`
// in the match to reduce false positives in prose (e.g. plain `foo.ts` won't
// match, but `src/foo.ts` will). False positives at the edges are accepted
// per Sam's design decision (2026-05-14).
const PATH_REGEX = /([a-zA-Z0-9._/-]+\.[a-zA-Z0-9]+)/g;
function isPathLike(s: string): boolean {
return s.includes('/');
}
function emitOpenFile(path: string): void {
sessionEvents.emit({ type: 'open_file_in_browser', path });
}
function linkifyPaths(text: string, keyPrefix: string): ReactNode {
const out: ReactNode[] = [];
let lastIdx = 0;
let idx = 0;
for (const match of text.matchAll(PATH_REGEX)) {
const matchedText = match[0];
const start = match.index ?? 0;
if (!isPathLike(matchedText)) continue;
if (start > lastIdx) out.push(text.slice(lastIdx, start));
out.push(
<button
key={`${keyPrefix}-${idx}`}
type="button"
onClick={() => emitOpenFile(matchedText)}
className="text-primary underline cursor-pointer hover:text-primary/80"
>
{matchedText}
</button>
);
lastIdx = start + matchedText.length;
idx += 1;
}
if (out.length === 0) return text;
if (lastIdx < text.length) out.push(text.slice(lastIdx));
return out;
}
function linkifyChildren(children: ReactNode, keyPrefix = 'l'): ReactNode {
const arr = Children.toArray(children);
return arr.map((child, i) => {
if (typeof child === 'string') {
return (
<span key={`${keyPrefix}-${i}`}>
{linkifyPaths(child, `${keyPrefix}-${i}`)}
</span>
);
}
if (isValidElement(child)) {
const el = child as ReactElement<{ children?: ReactNode }>;
if (el.type === 'code' || el.type === CodeBlock) return child;
const grandchildren = el.props.children;
if (grandchildren === undefined) return child;
return cloneElement(el, {
key: el.key ?? `linkified-${i}`,
children: linkifyChildren(grandchildren, `${keyPrefix}-${i}`),
});
}
return child;
});
}
const codeRenderer = (props: { children?: unknown; className?: string }) => {
const { children, className, ...rest } = props;
const text = String(children ?? '').replace(/\n$/, '');
const langMatch = /language-([\w-]+)/.exec(className ?? '');
const isBlock = !!langMatch || text.includes('\n');
if (isBlock) {
return <CodeBlock code={text} lang={langMatch?.[1]} />;
}
return (
<code
{...rest}
className="rounded bg-muted px-1 py-0.5 font-mono text-[0.85em]"
>
{children as React.ReactNode}
</code>
);
};
export function MarkdownRenderer({ content }: { content: string }) {
return (
<Markdown
remarkPlugins={[remarkGfm]}
components={{
pre: ({ children }) => <>{children}</>,
code: codeRenderer,
a: ({ children, href }) => (
<a
href={href}
target="_blank"
rel="noreferrer"
className="underline decoration-muted-foreground/40 underline-offset-2 hover:decoration-foreground"
>
{children}
</a>
),
ul: ({ children }) => (
<ul className="list-disc pl-5 space-y-1">{children}</ul>
),
ol: ({ children }) => (
<ol className="list-decimal pl-5 space-y-1">{children}</ol>
),
li: ({ children }) => <li>{linkifyChildren(children)}</li>,
p: ({ children }) => (
<p className="leading-relaxed">{linkifyChildren(children)}</p>
),
h1: ({ children }) => <h1 className="text-base font-semibold mt-2">{children}</h1>,
h2: ({ children }) => <h2 className="text-sm font-semibold mt-2">{children}</h2>,
h3: ({ children }) => <h3 className="text-sm font-semibold mt-1">{children}</h3>,
blockquote: ({ children }) => (
<blockquote className="border-l-2 border-border pl-3 text-muted-foreground">
{children}
</blockquote>
),
table: ({ children }) => (
<div className="overflow-x-auto">
<table className="border-collapse text-xs">{children}</table>
</div>
),
th: ({ children }) => (
<th className="border border-border px-2 py-1 text-left font-medium">{children}</th>
),
td: ({ children }) => (
<td className="border border-border px-2 py-1">
{linkifyChildren(children)}
</td>
),
}}
>
{content}
</Markdown>
);
}

View File

@@ -1,16 +1,14 @@
import { Children, cloneElement, isValidElement, useEffect, useState } from 'react';
import type { ReactElement, ReactNode } from 'react';
import Markdown from 'react-markdown';
import remarkGfm from 'remark-gfm';
import { ChevronDown, ChevronRight, Copy, RefreshCw, Check, Share2, RotateCw, GitFork, Trash2 } from 'lucide-react';
import { useEffect, useState } from 'react';
import type { ReactNode } from 'react';
import { ChevronDown, ChevronRight, Copy, RefreshCw, Check, Share2, RotateCw, GitFork, Trash2, PanelRightOpen } from 'lucide-react';
import { toast } from 'sonner';
import type { Chat, ErrorReason, Message } from '@/api/types';
import { api } from '@/api/client';
import { api, ApiError } from '@/api/client';
import { sessionEvents } from '@/hooks/sessionEvents';
import { sendToTerminal, terminalsRegistry, type TerminalRegistration } from '@/lib/events';
import { CapHitSentinel } from './CapHitSentinel';
import { DoomLoopSentinel } from './DoomLoopSentinel';
import { CodeBlock } from './CodeBlock';
import { MarkdownRenderer } from './MarkdownRenderer';
import { Button } from '@/components/ui/button';
import {
ContextMenu,
@@ -90,76 +88,20 @@ const ERROR_REASON_LABELS: Record<ErrorReason, string> = {
summary_after_cap_failed: 'Summary after tool budget hit failed',
};
// Match path-shaped substrings ending in `.ext`. Additionally require a `/`
// in the match to reduce false positives in prose (e.g. plain `foo.ts` won't
// match, but `src/foo.ts` will). False positives at the edges are accepted
// per Sam's design decision (2026-05-14).
const PATH_REGEX = /([a-zA-Z0-9._/-]+\.[a-zA-Z0-9]+)/g;
// v1.14.x-html-artifact-panes: MarkdownBody and its path-linkifier helpers
// moved to apps/web/src/components/MarkdownRenderer.tsx so the new artifact
// panes can render assistant content with the same Shiki + remark-gfm setup.
function isPathLike(s: string): boolean {
return s.includes('/');
}
function emitOpenFile(path: string): void {
sessionEvents.emit({ type: 'open_file_in_browser', path });
}
// Split a plain string into a flat array of strings and clickable button
// nodes for path-shaped substrings. If no matches, returns the original
// string verbatim (no array wrapping).
function linkifyPaths(text: string, keyPrefix: string): ReactNode {
const out: ReactNode[] = [];
let lastIdx = 0;
let idx = 0;
for (const match of text.matchAll(PATH_REGEX)) {
const matchedText = match[0];
const start = match.index ?? 0;
if (!isPathLike(matchedText)) continue;
if (start > lastIdx) out.push(text.slice(lastIdx, start));
out.push(
<button
key={`${keyPrefix}-${idx}`}
type="button"
onClick={() => emitOpenFile(matchedText)}
className="text-primary underline cursor-pointer hover:text-primary/80"
>
{matchedText}
</button>
);
lastIdx = start + matchedText.length;
idx += 1;
}
if (out.length === 0) return text;
if (lastIdx < text.length) out.push(text.slice(lastIdx));
return out;
}
// Walk react-markdown children, linkifying string text nodes. Children of
// <code> nodes (CodeBlock and inline code) are left untouched — the regex
// shouldn't run inside code spans.
function linkifyChildren(children: ReactNode, keyPrefix = 'l'): ReactNode {
const arr = Children.toArray(children);
return arr.map((child, i) => {
if (typeof child === 'string') {
return (
<span key={`${keyPrefix}-${i}`}>
{linkifyPaths(child, `${keyPrefix}-${i}`)}
</span>
);
}
if (isValidElement(child)) {
const el = child as ReactElement<{ children?: ReactNode }>;
// Skip inline/block code — paths in code spans aren't link targets.
if (el.type === 'code' || el.type === CodeBlock) return child;
const grandchildren = el.props.children;
if (grandchildren === undefined) return child;
return cloneElement(el, {
key: el.key ?? `linkified-${i}`,
children: linkifyChildren(grandchildren, `${keyPrefix}-${i}`),
});
}
return child;
});
// Pane-header title derivation for a markdown artifact. Order matches the
// server slug logic in services/artifacts.ts: first `# ` heading → first 6
// words of the body → 'Markdown artifact'. Truncated to keep the pane header
// readable.
function deriveMarkdownTitle(content: string): string {
const headingMatch = content.match(/^\s*#\s+(.+?)\s*$/m);
if (headingMatch && headingMatch[1]) return headingMatch[1].slice(0, 80);
const words = content.trim().split(/\s+/).slice(0, 6).join(' ');
if (words) return words.slice(0, 80);
return 'Markdown artifact';
}
interface Props {
@@ -170,80 +112,6 @@ interface Props {
capHitInfo?: { position: number; isLatest: boolean };
}
function MarkdownBody({ content }: { content: string }) {
return (
<Markdown
remarkPlugins={[remarkGfm]}
components={{
pre: ({ children }) => <>{children}</>,
code: (props) => {
const { children, className, ...rest } = props as {
children?: unknown;
className?: string;
};
const text = String(children ?? '').replace(/\n$/, '');
const langMatch = /language-([\w-]+)/.exec(className ?? '');
const isBlock = !!langMatch || text.includes('\n');
if (isBlock) {
return <CodeBlock code={text} lang={langMatch?.[1]} />;
}
return (
<code
{...rest}
className="rounded bg-muted px-1 py-0.5 font-mono text-[0.85em]"
>
{children as React.ReactNode}
</code>
);
},
a: ({ children, href }) => (
<a
href={href}
target="_blank"
rel="noreferrer"
className="underline decoration-muted-foreground/40 underline-offset-2 hover:decoration-foreground"
>
{children}
</a>
),
ul: ({ children }) => (
<ul className="list-disc pl-5 space-y-1">{children}</ul>
),
ol: ({ children }) => (
<ol className="list-decimal pl-5 space-y-1">{children}</ol>
),
li: ({ children }) => <li>{linkifyChildren(children)}</li>,
p: ({ children }) => (
<p className="leading-relaxed">{linkifyChildren(children)}</p>
),
h1: ({ children }) => <h1 className="text-base font-semibold mt-2">{children}</h1>,
h2: ({ children }) => <h2 className="text-sm font-semibold mt-2">{children}</h2>,
h3: ({ children }) => <h3 className="text-sm font-semibold mt-1">{children}</h3>,
blockquote: ({ children }) => (
<blockquote className="border-l-2 border-border pl-3 text-muted-foreground">
{children}
</blockquote>
),
table: ({ children }) => (
<div className="overflow-x-auto">
<table className="border-collapse text-xs">{children}</table>
</div>
),
th: ({ children }) => (
<th className="border border-border px-2 py-1 text-left font-medium">{children}</th>
),
td: ({ children }) => (
<td className="border border-border px-2 py-1">
{linkifyChildren(children)}
</td>
),
}}
>
{content}
</Markdown>
);
}
function StatsLine({ message }: { message: Message }) {
const tokens = message.tokens_used;
if (typeof tokens !== 'number' || tokens <= 0) return null;
@@ -337,6 +205,54 @@ function ActionRow({
const canRegen = isAssistant && message.status !== 'streaming';
const canFork = message.status === 'complete';
const canDelete = message.status !== 'streaming';
const [openingPane, setOpeningPane] = useState(false);
// v1.14.x-html-artifact-panes: probe for an html_artifact part. If present,
// open the HTML pane variant; otherwise fall back to the markdown variant.
// Title derivation for markdown: first `# ` heading → first 6 words of the
// body → 'Markdown artifact' (mirrors the slug logic in
// services/artifacts.ts).
async function openInPane() {
if (openingPane || message.status === 'streaming') return;
setOpeningPane(true);
try {
try {
const payload = await api.messages.getHtmlArtifact(
message.chat_id,
message.id,
);
sessionEvents.emit({
type: 'open_html_artifact_pane',
state: {
chat_id: message.chat_id,
message_id: message.id,
title: payload.title,
},
});
return;
} catch (err) {
// 404 (no html_artifact part) is the expected fall-through path —
// markdown variant opens below. Any other error (network, 500) is
// a real failure; toast and bail rather than masquerading as markdown.
const status = err instanceof ApiError ? err.status : null;
if (status !== 404) {
toast.error(err instanceof Error ? err.message : 'open in pane failed');
return;
}
}
const title = deriveMarkdownTitle(message.content);
sessionEvents.emit({
type: 'open_markdown_artifact_pane',
state: {
chat_id: message.chat_id,
message_id: message.id,
title,
},
});
} finally {
setOpeningPane(false);
}
}
return (
<>
@@ -350,6 +266,18 @@ function ActionRow({
>
{justCopied ? <Check className="size-3" /> : <Copy className="size-3" />}
</button>
{isAssistant && (
<button
type="button"
onClick={() => void openInPane()}
disabled={openingPane || message.status === 'streaming'}
className="inline-flex items-center justify-center size-6 rounded text-muted-foreground hover:bg-muted hover:text-foreground disabled:opacity-40 disabled:cursor-not-allowed max-md:min-h-[44px] max-md:min-w-[44px]"
aria-label="Open in pane"
title="Open in pane"
>
<PanelRightOpen className="size-3" />
</button>
)}
{isAssistant && (
<button
type="button"
@@ -588,7 +516,7 @@ function SummaryCard({ message }: { message: Message }) {
</div>
{expanded && (
<div className="px-3 pb-3 text-xs leading-relaxed border-t pt-2">
<MarkdownBody content={message.content} />
<MarkdownRenderer content={message.content} />
</div>
)}
</div>
@@ -651,7 +579,9 @@ export function MessageBubble({ message, sessionChats, capHitInfo }: Props) {
const isStreaming = message.status === 'streaming';
const failed = message.status === 'failed';
const hasContent = message.content.length > 0;
// v1.13.7: match the MessageList.flatten trim guard so a whitespace-only
// assistant turn doesn't render an empty bubble + dangling ActionRow.
const hasContent = message.content.trim().length > 0;
// v1.8.2: if metadata stamps an error reason, surface it inline under the
// generic "message failed" line. Keeps the user's eye where it already is
// rather than introducing a separate banner.
@@ -665,7 +595,7 @@ export function MessageBubble({ message, sessionChats, capHitInfo }: Props) {
{(hasContent || isStreaming) && (
<SendToTerminalMenu>
<div className="max-w-[90%] text-sm leading-relaxed space-y-2 break-words min-w-0">
{hasContent ? <MarkdownBody content={message.content} /> : null}
{hasContent ? <MarkdownRenderer content={message.content} /> : null}
{isStreaming && (
<span className="inline-block w-1.5 h-3.5 align-baseline bg-muted-foreground/60 animate-pulse" />
)}

View File

@@ -4,6 +4,7 @@ import { MessageBubble } from './MessageBubble';
import { ToolCallGroup } from './ToolCallGroup';
import { ToolCallLine, type ToolRun } from './ToolCallLine';
import { AskUserInputCard } from './AskUserInputCard';
import { RequestReadAccessCard } from './RequestReadAccessCard';
interface Props {
messages: Message[];
@@ -45,7 +46,12 @@ function flatten(messages: Message[]): RenderItem[] {
continue;
}
const hasToolCalls = m.tool_calls != null && m.tool_calls.length > 0;
const hasText = m.content.length > 0;
// v1.13.7: trim before checking. AI SDK v6 streaming occasionally emits a
// leading "\n" text-delta on tool-call-only turns, which used to flow into
// messages.content with length=1 and render an empty bubble + ActionRow
// between each tool call. Whitespace-only content has no visible payload,
// so treat it as no-content.
const hasText = m.content.trim().length > 0;
if (m.role === 'assistant' && hasToolCalls) {
if (hasText || m.status === 'streaming') {
items.push({ kind: 'message', message: m });
@@ -80,7 +86,9 @@ function group(items: RenderItem[]): RenderItem[] {
continue;
}
const name = item.run.call.name;
if (name === 'ask_user_input') {
if (name === 'ask_user_input' || name === 'request_read_access') {
// v1.13.17: same rationale as ask_user_input — grouping would collapse
// the interactive pause card into a non-actionable ToolCallLine.
out.push(item);
i += 1;
continue;
@@ -176,6 +184,16 @@ export function MessageList({ messages, sessionChats }: Props) {
/>
);
}
if (item.run.call.name === 'request_read_access') {
return (
<RequestReadAccessCard
key={item.key}
toolCall={item.run.call}
toolResult={item.run.result}
chatId={item.chatId}
/>
);
}
return <ToolCallLine key={item.key} run={item.run} />;
}
return <ToolCallGroup key={item.key} runs={item.runs} />;

View File

@@ -0,0 +1,193 @@
import { useState } from 'react';
import { Check, FolderOpen, ShieldOff } from 'lucide-react';
import { toast } from 'sonner';
import { api } from '@/api/client';
import { Button } from '@/components/ui/button';
import type { ToolCall, ToolResult } from '@/api/types';
// v1.13.17-cross-repo-reads. Renders an inline allow/deny picker for a
// paused request_read_access tool call. Mirrors AskUserInputCard's pending
// vs answered render dance:
// - Pending: server pre-stamps a sentinel tool_result with output=null.
// The card shows path + reason and lets the user pick Allow or Deny.
// - Answered: the eventual WS tool_result frame carries the actual
// decision string ("granted: <root>" or "denied" or "denied: <reason>").
// The card flips to a read-only summary line.
//
// Tool name discrimination lives in MessageList.flatten/group — anything
// with tc.name === 'request_read_access' bypasses grouping and renders this
// card directly.
interface Props {
toolCall: ToolCall;
toolResult: ToolResult | null;
chatId: string;
}
interface ParsedArgs {
path: string;
reason: string;
}
function parseArgs(raw: unknown): ParsedArgs | null {
if (!raw || typeof raw !== 'object') return null;
const obj = raw as { path?: unknown; reason?: unknown };
if (typeof obj.path !== 'string' || obj.path.length === 0) return null;
if (typeof obj.reason !== 'string' || obj.reason.length === 0) return null;
return { path: obj.path, reason: obj.reason };
}
function decisionVariant(output: unknown): 'granted' | 'denied' | 'unknown' {
if (typeof output !== 'string') return 'unknown';
if (output.startsWith('granted:')) return 'granted';
if (output === 'denied' || output.startsWith('denied:')) return 'denied';
return 'unknown';
}
export function RequestReadAccessCard({ toolCall, toolResult, chatId }: Props) {
const args = parseArgs(toolCall.args);
if (!args) {
return (
<div className="rounded border border-destructive/40 bg-destructive/10 text-xs px-3 py-2 text-destructive">
request_read_access: malformed tool args
</div>
);
}
// Non-null output means the WS tool_result frame arrived (or the row was
// re-fetched from history).
const answered = toolResult && toolResult.output !== null;
if (answered) {
return <AnsweredView args={args} output={toolResult!.output} />;
}
return <PendingView args={args} toolCallId={toolCall.id} chatId={chatId} />;
}
function PendingView({
args,
toolCallId,
chatId,
}: {
args: ParsedArgs;
toolCallId: string;
chatId: string;
}) {
const [submitting, setSubmitting] = useState<'allow' | 'deny' | null>(null);
async function decide(decision: 'allow' | 'deny') {
if (submitting) return;
setSubmitting(decision);
try {
await api.chats.grantReadAccess(chatId, toolCallId, decision);
// Card stays mounted; the incoming WS tool_result frame swaps it to
// AnsweredView via the parent prop change.
} catch (err) {
toast.error(err instanceof Error ? err.message : 'request failed');
setSubmitting(null);
}
}
return (
<div className="rounded-lg border border-amber-500/40 bg-amber-500/5 text-sm">
<div className="px-4 py-3 space-y-2">
<div className="flex items-center gap-2 text-xs uppercase tracking-wide text-amber-700 dark:text-amber-300">
<ShieldOff className="size-3.5" />
<span>Read-access request</span>
</div>
<div className="space-y-1.5">
<div className="text-[10px] uppercase tracking-wide text-muted-foreground/70">Path</div>
<div className="font-mono text-xs break-all rounded bg-background/60 border px-2 py-1">
{args.path}
</div>
</div>
<div className="space-y-1.5">
<div className="text-[10px] uppercase tracking-wide text-muted-foreground/70">Reason</div>
<div className="text-sm leading-snug whitespace-pre-wrap">{args.reason}</div>
</div>
<div className="text-[11px] text-muted-foreground pt-1">
Allow grants the agent read access to the matching repository root for
the rest of this session. Revoke any time from the session settings.
</div>
</div>
<div className="flex justify-end gap-2 border-t border-amber-500/20 px-4 py-2">
<Button
type="button"
size="sm"
variant="outline"
disabled={submitting !== null}
onClick={() => void decide('deny')}
>
{submitting === 'deny' ? 'Denying…' : 'Deny'}
</Button>
<Button
type="button"
size="sm"
disabled={submitting !== null}
onClick={() => void decide('allow')}
>
{submitting === 'allow' ? 'Allowing…' : 'Allow'}
</Button>
</div>
</div>
);
}
function AnsweredView({ args, output }: { args: ParsedArgs; output: unknown }) {
const variant = decisionVariant(output);
const text = typeof output === 'string' ? output : 'unknown';
return (
<div
className={
variant === 'granted'
? 'rounded-lg border border-emerald-500/40 bg-emerald-500/5 text-sm'
: variant === 'denied'
? 'rounded-lg border bg-muted/20 text-sm'
: 'rounded-lg border border-destructive/40 bg-destructive/5 text-sm'
}
>
<div className="px-4 py-3 space-y-2">
<div className="flex items-center gap-2 text-xs uppercase tracking-wide">
{variant === 'granted' ? (
<>
<Check className="size-3.5 text-emerald-600" />
<span className="text-emerald-700 dark:text-emerald-300">Read access granted</span>
</>
) : variant === 'denied' ? (
<>
<ShieldOff className="size-3.5 text-muted-foreground" />
<span className="text-muted-foreground">Read access denied</span>
</>
) : (
<>
<ShieldOff className="size-3.5 text-destructive" />
<span className="text-destructive">Read access request unknown result</span>
</>
)}
</div>
<div className="space-y-1.5">
<div className="text-[10px] uppercase tracking-wide text-muted-foreground/70">Path</div>
<div className="font-mono text-xs break-all rounded bg-background/60 border px-2 py-1">
{args.path}
</div>
</div>
{variant === 'granted' && (
<div className="space-y-1.5">
<div className="text-[10px] uppercase tracking-wide text-muted-foreground/70">Granted root</div>
<div className="font-mono text-xs break-all rounded bg-background/60 border px-2 py-1 flex items-center gap-1.5">
<FolderOpen className="size-3 shrink-0 text-muted-foreground" />
<span>{text.replace(/^granted:\s*/, '')}</span>
</div>
</div>
)}
{variant === 'denied' && text !== 'denied' && (
<div className="text-[11px] text-muted-foreground">
{text.replace(/^denied:\s*/, '')}
</div>
)}
</div>
</div>
);
}

View File

@@ -8,6 +8,8 @@ import { terminalsRegistry } from '@/lib/events';
import { ChatPane } from '@/components/panes/ChatPane';
import { SettingsPane } from '@/components/panes/SettingsPane';
import { TerminalPane } from '@/components/panes/TerminalPane';
import { MarkdownArtifactPane } from '@/components/MarkdownArtifactPane';
import { HtmlArtifactPane } from '@/components/HtmlArtifactPane';
import { ChatTabBar } from '@/components/ChatTabBar';
import { SessionLandingPage } from '@/components/SessionLandingPage';
import {
@@ -182,6 +184,7 @@ export function Workspace({
{panes.map((pane, idx) => {
const isSettings = pane.kind === 'settings';
const isTerminal = pane.kind === 'terminal';
const isArtifact = pane.kind === 'markdown_artifact' || pane.kind === 'html_artifact';
// v1.9: when maximized, hide every pane except the settings one.
// display:none keeps the React tree mounted so streams / drafts
// survive the toggle without re-mount cost.
@@ -195,7 +198,7 @@ export function Workspace({
}
// Terminal panes own their tab strip (no chats, no ChatTabBar) and
// are not drag-reorderable for now — keeps the layout grid simple.
const isChromeless = isSettings || isTerminal;
const isChromeless = isSettings || isTerminal || isArtifact;
return (
<div
key={pane.id}
@@ -318,6 +321,18 @@ export function Workspace({
label={terminalLabels.get(pane.id) ?? 'Terminal'}
active={idx === activePaneIdx}
/>
) : pane.kind === 'markdown_artifact' && pane.markdown_artifact_state ? (
<MarkdownArtifactPane
chatId={pane.markdown_artifact_state.chat_id}
state={pane.markdown_artifact_state}
onClose={() => removePane(idx)}
/>
) : pane.kind === 'html_artifact' && pane.html_artifact_state ? (
<HtmlArtifactPane
chatId={pane.html_artifact_state.chat_id}
state={pane.html_artifact_state}
onClose={() => removePane(idx)}
/>
) : pane.kind === 'chat' && pane.chatId ? (
<ChatPane
sessionId={sessionId}

View File

@@ -1,17 +1,12 @@
import { useCallback, useEffect, useRef, useState } from 'react';
import { ChevronDown, Square, X } from 'lucide-react';
import { Pencil, Send, Square, X } from 'lucide-react';
import { toast } from 'sonner';
import { api } from '@/api/client';
import { useSessionStream } from '@/hooks/useSessionStream';
import { MessageList } from '@/components/MessageList';
import { ChatInput } from '@/components/ChatInput';
import { StaleStreamBanner } from '@/components/StaleStreamBanner';
import {
DropdownMenu,
DropdownMenuContent,
DropdownMenuItem,
DropdownMenuTrigger,
} from '@/components/ui/dropdown-menu';
import { sendToChat } from '@/lib/events';
interface Props {
sessionId: string;
@@ -186,6 +181,16 @@ export function ChatPane({ sessionId, chatId, projectId, agentId, onAgentChange,
setQueue((prev) => prev.filter((_, i) => i !== idx));
}
// v1.13.12: edit a queued message — pop it off the queue and push its text
// into ChatInput via sendToChat. ChatInput appends (or sets, if empty) and
// focuses; user re-sends, which re-queues if streaming is still active.
function editQueued(idx: number) {
const msg = queue[idx];
if (!msg) return;
setQueue((prev) => prev.filter((_, i) => i !== idx));
sendToChat.emit({ chat_id: chatId, text: msg });
}
async function forceSendQueued(idx: number) {
const msg = queue[idx];
if (!msg) return;
@@ -210,30 +215,30 @@ export function ChatPane({ sessionId, chatId, projectId, agentId, onAgentChange,
<div key={i} className="flex items-center gap-2 text-xs text-muted-foreground bg-muted/30 rounded px-2 py-1">
<span className="font-medium shrink-0">Queued:</span>
<span className="truncate flex-1">{msg}</span>
<DropdownMenu>
<DropdownMenuTrigger asChild>
<button
type="button"
className="inline-flex items-center justify-center p-0.5 hover:bg-muted rounded shrink-0 max-md:min-h-[44px] max-md:min-w-[44px]"
aria-label="Queued message options"
>
<ChevronDown size={12} />
</button>
</DropdownMenuTrigger>
<DropdownMenuContent align="end">
<DropdownMenuItem onSelect={() => { /* default: queued, nothing to do */ }}>
Send when done
</DropdownMenuItem>
<DropdownMenuItem onSelect={() => void forceSendQueued(i)}>
Force send now
</DropdownMenuItem>
</DropdownMenuContent>
</DropdownMenu>
<button
type="button"
onClick={() => editQueued(i)}
className="inline-flex items-center justify-center p-0.5 hover:bg-muted rounded shrink-0 max-md:min-h-[44px] max-md:min-w-[44px]"
aria-label="Edit queued message"
title="Edit"
>
<Pencil size={12} />
</button>
<button
type="button"
onClick={() => void forceSendQueued(i)}
className="inline-flex items-center justify-center p-0.5 hover:bg-muted rounded shrink-0 max-md:min-h-[44px] max-md:min-w-[44px]"
aria-label="Force send queued message now"
title="Force send now"
>
<Send size={12} />
</button>
<button
type="button"
onClick={() => removeQueued(i)}
className="inline-flex items-center justify-center p-0.5 hover:bg-muted rounded shrink-0 max-md:min-h-[44px] max-md:min-w-[44px]"
aria-label="Cancel queued message"
title="Cancel"
>
<X size={12} />
</button>

View File

@@ -1,5 +1,5 @@
import { useEffect, useState } from 'react';
import { Archive, Maximize2, Minimize2, X } from 'lucide-react';
import { Archive, FolderOpen, Maximize2, Minimize2, Trash2, X } from 'lucide-react';
import { toast } from 'sonner';
import { api } from '@/api/client';
import type { Project, Session } from '@/api/types';
@@ -269,6 +269,8 @@ function SessionSection({ session, project }: { session: Session; project: Proje
</p>
</div>
<AllowedReadPathsSection session={session} />
<div className="space-y-1.5">
<div className="flex items-center justify-between gap-3">
<label className="text-xs font-medium uppercase tracking-wide text-muted-foreground">
@@ -337,6 +339,76 @@ function SessionSection({ session, project }: { session: Session; project: Proje
);
}
// v1.13.17-cross-repo-reads: revoke UI for session.allowed_read_paths.
// Append happens through the inline request_read_access pause flow; this
// section only shrinks the list. PATCH /api/sessions/:id replaces the
// whole array, so we send the original list minus the deleted entry.
function AllowedReadPathsSection({ session }: { session: Session }) {
const [paths, setPaths] = useState<string[]>(session.allowed_read_paths);
const [pendingDelete, setPendingDelete] = useState<string | null>(null);
// Re-sync on session prop change (e.g. WS session_updated after a new
// grant lands). Without this, a grant approved in this same chat wouldn't
// appear in the list until the user closes and reopens settings.
useEffect(() => {
setPaths(session.allowed_read_paths);
}, [session.id, session.allowed_read_paths]);
async function remove(path: string) {
if (pendingDelete) return;
setPendingDelete(path);
const next = paths.filter((p) => p !== path);
try {
const updated = await api.sessions.update(session.id, { allowed_read_paths: next });
setPaths(updated.allowed_read_paths);
toast.success('Grant revoked');
} catch (err) {
toast.error(err instanceof Error ? err.message : 'failed to revoke');
} finally {
setPendingDelete(null);
}
}
return (
<div className="space-y-1.5">
<label className="text-xs font-medium uppercase tracking-wide text-muted-foreground">
Cross-repo read grants
</label>
{paths.length === 0 ? (
<p className="text-xs text-muted-foreground italic">
The agent has no access outside this project. Grants are created when
the agent asks for them inline.
</p>
) : (
<ul className="space-y-1">
{paths.map((p) => (
<li
key={p}
className="flex items-center gap-2 rounded border bg-background/60 px-2 py-1.5"
>
<FolderOpen className="size-3.5 shrink-0 text-muted-foreground" />
<span className="font-mono text-xs flex-1 min-w-0 break-all">{p}</span>
<button
type="button"
onClick={() => void remove(p)}
disabled={pendingDelete !== null}
aria-label={`Revoke ${p}`}
title="Revoke"
className="inline-flex items-center justify-center size-7 rounded text-muted-foreground hover:bg-muted hover:text-destructive disabled:opacity-40 disabled:cursor-not-allowed max-md:min-h-[44px] max-md:min-w-[44px]"
>
<Trash2 className="size-3.5" />
</button>
</li>
))}
</ul>
)}
<p className="text-xs text-muted-foreground">
Grants are session-scoped. Archiving the session clears them.
</p>
</div>
);
}
function ProjectSection({ project }: { project: Project }) {
const [name, setName] = useState(project.name);
const [defaultPrompt, setDefaultPrompt] = useState(project.default_system_prompt);

View File

@@ -2,7 +2,14 @@
// across hooks (e.g. AI rename arriving via WS in the session view needs to
// also refresh the sidebar's session list).
import type { Chat, ErrorReason, Project, Session } from '@/api/types';
import type {
Chat,
ErrorReason,
HtmlArtifactState,
MarkdownArtifactState,
Project,
Session,
} from '@/api/types';
import type { Attachment } from '@/lib/attachments';
export interface SessionRenamedEvent {
@@ -68,6 +75,19 @@ export interface OpenChatInActivePaneEvent {
chat_id: string;
}
// v1.14.x-html-artifact-panes: ActionRow's "Open in pane" button emits one of
// these; useWorkspacePanes subscribes and inserts the corresponding artifact
// pane (or focuses an existing one keyed by message_id).
export interface OpenMarkdownArtifactPaneEvent {
type: 'open_markdown_artifact_pane';
state: MarkdownArtifactState;
}
export interface OpenHtmlArtifactPaneEvent {
type: 'open_html_artifact_pane';
state: HtmlArtifactState;
}
// Client-side event fired by the sidebar Settings button when a session is
// currently mounted. Session.tsx subscribes and calls
// panesHook.toggleSettingsPane() (open on first click, close on second).
@@ -154,6 +174,8 @@ export type SessionEvent =
| OpenFileInBrowserEvent
| AttachChatFileEvent
| OpenChatInActivePaneEvent
| OpenMarkdownArtifactPaneEvent
| OpenHtmlArtifactPaneEvent
| OpenSettingsPaneEvent
| SessionArchivedEvent
| ChatCreatedEvent

View File

@@ -1,6 +1,7 @@
import { useEffect, useRef, useState } from 'react';
import { toast } from 'sonner';
import type { Message, WsFrame } from '@/api/types';
import { WsFrameSchema } from '@/api/ws-frames';
import { api } from '@/api/client';
import { sessionEvents } from './sessionEvents';
import { recordUsage } from './useChatThroughput';
@@ -216,8 +217,28 @@ export function useSessionStream(sessionId: string | undefined) {
setState((s) => ({ ...s, connected: true, error: null }));
};
ws.onmessage = (ev) => {
// v1.13.11-a: Zod-validate every inbound frame. Fail-closed — invalid
// frames are logged and dropped. WsFrameSchema is the runtime guard;
// the hand-maintained WsFrame type stays as the narrowed dev-time
// shape (Zod uses OpaqueObject for nested types like Message[]). One
// cast bridges the two.
let raw: unknown;
try {
const frame = JSON.parse(typeof ev.data === 'string' ? ev.data : '') as WsFrame;
raw = JSON.parse(typeof ev.data === 'string' ? ev.data : '');
} catch (err) {
console.warn('bad ws frame (parse)', err);
return;
}
const validated = WsFrameSchema.safeParse(raw);
if (!validated.success) {
console.error('ws-frame-validation-failed (session channel)', {
frame_type: (raw as { type?: unknown })?.type,
errors: validated.error.flatten(),
});
return;
}
try {
const frame = validated.data as unknown as WsFrame;
// v1.11: on a compaction completion, re-fetch the message list so
// the new summary row + the cohort of compacted_at-stamped older
// rows render correctly. We dispatch the fresh list as a synthetic

View File

@@ -154,6 +154,11 @@ function applyEvent(prev: SidebarResponse, event: import('./sessionEvents').Sess
case 'open_chat_in_active_pane':
// Consumed by Workspace; sidebar has no business with pane state.
return prev;
case 'open_markdown_artifact_pane':
case 'open_html_artifact_pane':
// v1.14.x-html-artifact-panes: consumed by useWorkspacePanes; sidebar
// has no business with pane state.
return prev;
case 'open_settings_pane':
// Consumed by Session.tsx (calls toggleSettingsPane on its panesHook).
// Sidebar data is untouched.

View File

@@ -1,4 +1,5 @@
import { useEffect } from 'react';
import { WsFrameSchema } from '@/api/ws-frames';
import { sessionEvents } from './sessionEvents';
import { createWsReconnectToast } from './wsReconnectToast';
@@ -38,14 +39,33 @@ export function useUserEvents(): void {
};
ws.onmessage = (ev) => {
// v1.13.11-a: Zod-validate every inbound frame. Fail-closed — invalid
// frames are logged and dropped instead of dispatched onto the
// sessionEvents bus where a stale or wrong shape would silently
// corrupt sidebar / chat state.
let raw: unknown;
try {
const parsed: unknown = JSON.parse(ev.data);
if (parsed && typeof (parsed as { type?: unknown }).type === 'string') {
sessionEvents.emit(parsed as import('./sessionEvents').SessionEvent);
}
raw = JSON.parse(ev.data);
} catch (err) {
console.warn('useUserEvents: failed to parse frame', err);
return;
}
const validated = WsFrameSchema.safeParse(raw);
if (!validated.success) {
console.error('ws-frame-validation-failed (user channel)', {
frame_type: (raw as { type?: unknown })?.type,
errors: validated.error.flatten(),
});
return;
}
// Bridge cast: Zod's union is broader than SessionEvent (it includes
// per-session-channel frames too, which never arrive on the user
// channel). sessionEvents.emit only dispatches frames whose type
// appears in SessionEvent; the narrowing happens via the existing
// useSidebar.ts applyEvent switch.
sessionEvents.emit(
validated.data as unknown as import('./sessionEvents').SessionEvent,
);
};
ws.onclose = () => {

View File

@@ -2,7 +2,11 @@ import { useCallback, useEffect, useRef, useState } from 'react';
import type { DragEvent } from 'react';
import { toast } from 'sonner';
import { api } from '@/api/client';
import type { WorkspacePane } from '@/api/types';
import type {
HtmlArtifactState,
MarkdownArtifactState,
WorkspacePane,
} from '@/api/types';
import { setActivePaneInfo, clearActivePane } from '@/hooks/useActivePane';
import { sessionEvents } from '@/hooks/sessionEvents';
@@ -43,6 +47,28 @@ function settingsPane(): WorkspacePane {
return { id: generateId(), kind: 'settings', chatIds: [], activeChatIdx: -1 };
}
// v1.14.x-html-artifact-panes: artifact pane factories. Payload travels with
// the pane row so the sessions.workspace_panes jsonb survives reload.
function markdownArtifactPane(state: MarkdownArtifactState): WorkspacePane {
return {
id: generateId(),
kind: 'markdown_artifact',
chatIds: [],
activeChatIdx: -1,
markdown_artifact_state: state,
};
}
function htmlArtifactPane(state: HtmlArtifactState): WorkspacePane {
return {
id: generateId(),
kind: 'html_artifact',
chatIds: [],
activeChatIdx: -1,
html_artifact_state: state,
};
}
// v1.9: settings panes are ephemeral. Filter them out before persisting so a
// page reload always returns to a clean workspace; the user re-opens via the
// sidebar Settings button when needed.
@@ -169,6 +195,50 @@ export function useWorkspacePanes(sessionId: string): UseWorkspacePanesResult {
});
}, [sessionId]);
// v1.14.x-html-artifact-panes: ActionRow's "Open in pane" emits one of
// these per click. If a pane already exists for the same message_id, focus
// it instead of stacking a duplicate. Otherwise append (capped at MAX_PANES;
// settings panes don't count, matching addSplitPane's rule).
useEffect(() => {
return sessionEvents.subscribe((ev) => {
if (
ev.type !== 'open_markdown_artifact_pane' &&
ev.type !== 'open_html_artifact_pane'
) {
return;
}
setPanes((prev) => {
const targetKind: WorkspacePane['kind'] =
ev.type === 'open_html_artifact_pane' ? 'html_artifact' : 'markdown_artifact';
const messageId = ev.state.message_id;
const existingIdx = prev.findIndex((p) =>
p.kind === 'markdown_artifact'
? p.markdown_artifact_state?.message_id === messageId
: p.kind === 'html_artifact'
? p.html_artifact_state?.message_id === messageId
: false,
);
if (existingIdx >= 0) {
setActivePaneIdx(existingIdx);
return prev;
}
if (nonSettingsCount(prev) >= MAX_PANES) {
toast.error(`Maximum ${MAX_PANES} panes`);
return prev;
}
const newPane =
ev.type === 'open_html_artifact_pane'
? htmlArtifactPane(ev.state)
: markdownArtifactPane(ev.state);
// Defensive: assert kind matches for the discriminated union.
if (newPane.kind !== targetKind) return prev;
const next = [...prev, newPane];
setActivePaneIdx(next.length - 1);
return next;
});
});
}, []);
// v1.12.1: debounced PATCH on every change. Settings panes are stripped
// before saving (ephemeral per v1.9).
useEffect(() => {

View File

@@ -1,20 +1,167 @@
# BooCode — External Code Review & Lift Inventory
Last updated: 2026-05-20
Last updated: 2026-05-22
This document tracks every open source repo BooCode references or lifts code from. Pin this so we don't lose attribution and don't re-evaluate the same projects twice.
BooCode is personal/single-user — license compatibility is non-blocking, but the License column is recorded so we don't accidentally inherit an obligation if BooCode ever goes public.
> **Companion doc:** `boocode_roadmap.md` is the canonical source for shipping state, version ordering, and what's planned vs. shipped. This document is the canonical source for *why* each external repo earned its row. Reconcile shipping state via the roadmap when in doubt.
>
> **Shipped reality as of 2026-05-22** (per roadmap): v1.13.1 (`ac1a71f`), v1.13.3 (`a08d809`), v1.13.4 (`ec8593c`), v1.13.5 (`f8fc5db`), and v1.13.6 (`81d837c`) tagged. AI SDK v6 migration done. `message_parts` table + `messages_with_parts` view live with dual-write. `experimental_repairToolCall` wired. Alpha tool ordering shipped. Two-tier compaction prune + truncate.ts opaque-id retrieval shipped. v1.13.6 closed the Q3 reasoning-render gap in compaction (latent regression from v1.13.1-C). **v1.13.7 stability bundle** (`includeUsage:true` for usage capture, trim guards against `\n` content artifacts, payload filter for trailing empty/failed assistants, `BUDGET_NO_AGENT 15→30`) — fixes a v1.13.1-A latent regression where `result.usage` came back empty. v1.13.2 (legacy-column drop) **deferred behind v1.13.8v1.13.12** as rollback insurance. v1.13.x cleanup line order is locked and **must not be folded**: v1.13.8 → v1.13.9 → v1.13.10 → v1.13.11 → v1.13.12 → v1.13.2. If anything in this catalog reads "planned" for a v1.11.xv1.13.6 lift, check the lift catalog table at the bottom for the corrected status.
-----
## Paseo-equivalent dispatcher inside BooCode (2026-05-22 strategic pivot)
Sam wants BooCode to function like Paseo without using Paseo itself. **Paseo (getpaseo/paseo) is AGPL-3.0**, which is incompatible with BooCode's MIT licensing and BooCode's network-served deployment at `code.indifferentketchup.com`. Lift the architecture and design patterns (not copyrightable) without lifting any code. Build inside BooCode's existing Fastify + TypeScript + PostgreSQL + React stack.
### Locked architecture decisions (2026-05-22, Sam confirmed)
1. **Monorepo with three apps, not three repos.** `/opt/boocode/apps/`:
- `apps/web/` — existing React SPA (the current chat UI).
- `apps/server/` — existing Fastify backend (the daemon).
- **`apps/chat/`** — BooChat surface (read-only inference loop, current `9500`, the live thing at `code.indifferentketchup.com`).
- **`apps/coder/`** — BooCoder surface (write-tool inference loop + external-CLI dispatch, port `9502`, `coder.indifferentketchup.com`, planned for v2.0).
- **`apps/booterm/`** — BooTerm surface (PTY/terminal pane, **live since May 2026, port `9501`**). Node 20 Alpine + node-pty + tmux + xterm.js. Tmux session per pane (`bc-<uuid>`), SSH-out works (image includes `openssh-client` + `gosu`). `/api/term/health` shares the existing `boocode_db`. Built as part of Batch 10. Confirmed working as of 2026-05-19.
- All three share the server package, the auth gate, the project registry, the task table, and the worktree manager.
1. **Single shared database.** Rename current `boocode_db``boochat_db` when BooCoder lands. Three apps, one Postgres. Cross-surface joins are valuable: a BooCoder task can reference the BooChat conversation that originated it; a BooTerm session can be linked to the BooCoder task it's debugging. Separate databases would break this.
1. **Mount strategy: blanket `/opt:rw`, permission gating at the write-tool layer.** Container gets full RW access to `/opt`; the BooCoder write tools (`edit_file`, `create_file`, `delete_file`) enforce path scoping using the v1.15 permission wildcard ruleset (`apps/coder/services/path_guard.ts`). Per-project scoping is *policy*, not *mount*. Simpler, single mount, no Docker reconfig per project. Trade-off: a bug in path-guard logic is the only thing between BooCoder and writing outside `/opt/<project>/`. **Path-guard correctness is therefore the highest-priority test target for v2.0** — fuzz it, property-test it, run it through every traversal-attack pattern.
1. **External CLI agents (`opencode`, `claude`, `goose`, `pi`) live on the host, NOT in the BooCoder container.** Sam's call: control. Host-installed agents inherit Sam's existing `~/.opencode/`, `~/.claude/`, `~/.config/goose/` configs without re-mounting. Tool versions update via Sam's normal `npm i -g` or `brew upgrade` flow. **BooCoder shells out via local-exec PTY** (`node-pty` with `cwd = /opt/<project>` and the host shell), or via SSH if Sam wants stricter isolation later. Container can be added back if a specific reason emerges (sandboxing a rogue agent, ABI mismatch, dependency conflict) but not pre-emptively.
### Three-surface execution model
Each surface has its own primary execution mode but shares the same underlying tasks/projects/worktree infrastructure:
|Surface |Port |Execution mode |Tools |Write access |
|----------------------------|---------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------|
|**BooChat** (`apps/chat`) |9500 |In-process inference loop |`view_file`, `list_dir`, `grep`, `find_files`, codecontext sidecar tools |None — `/opt` is read-only at the tool layer regardless of mount |
|**BooCoder** (`apps/coder`) |9502 |**Two paths, same surface:** (a) in-process inference loop with native write tools + pending-changes queue, (b) PTY-dispatched external CLI (opencode/claude/goose/pi) in a per-task git worktree|All BooChat tools + `edit_file`, `create_file`, `delete_file`, `apply_pending`, `rewind` + `dispatch_external_agent`|Yes, gated through `pending_changes` table (nothing touches disk until `/apply`)|
|**BooTerm** (`apps/booterm`)|**9501 (live)**|PTY to host shell via tmux, scoped to project cwd |Shell + SSH-out, no inference loop |Yes (it's a real terminal) |
**The "two paths, same surface" decision in BooCoder is the answer to last turn's "1 and 2 full featured" question.** The in-process loop (Option B / Answer B) handles interactive write work where Sam wants the pending-changes UI and native tool gating. The PTY dispatch (Option A / Answer A) handles parallel/dispatched/batch work where Sam wants to A/B different CLI agents against the same task in separate worktrees. The user picks per task via a `dispatch_external_agent(agent: 'opencode'|'claude'|'goose'|'pi', model: string, task: string, worktree: string)` tool the in-process loop can call, or via a UI dropdown at task creation.
### MCP and ACP roles per surface (locked 2026-05-22)
Two open protocols extend BooCode's tool and agent surfaces:
- **MCP (Model Context Protocol):** the tool/resource extension protocol. An MCP **client** consumes tools from MCP **servers** (local stdio subprocesses or remote HTTP/SSE endpoints). Standard since late 2024. Reference SDKs in 10 languages. Hundreds of community servers, mostly via the [MCP Registry](https://registry.modelcontextprotocol.io/).
- **ACP (Agent Client Protocol):** the editor↔agent extension protocol. An ACP **client** (host) drives an ACP **agent** over JSON-RPC stdio (or HTTP/WS for remote). Standardizes session lifecycle, multi-session management, model/mode switching mid-session, file operations, terminal events, permission prompts. Originated at Zed. Implemented by opencode (`opencode acp`), goose (`goose acp`), JetBrains IDEs, Avante.nvim, CodeCompanion.nvim.
**The role assignment (Sam, 2026-05-22):**
|Surface |MCP client |MCP server |ACP client (host) |ACP agent (driveable) |
|------------|---------------------------------|------------------------------------------------|---------------------------------------------------------------|-----------------------------------------------------------|
|**BooChat** |**Yes** (read-only tool consumer)|No |No |No |
|**BooCoder**|**Yes** |**Yes** (exposes BooCoder tools to other agents)|**Yes** (drives opencode/goose/etc. via ACP instead of raw PTY)|**Yes** (BooCoder itself driveable from Zed/JetBrains/etc.)|
**BooChat as MCP client only.** BooChat is read-only by design — its existing tools (`view_file`, `list_dir`, `grep`, `find_files`) extend naturally with MCP-served read-only tools (Context7 for docs, gh_grep for code search, the official `fetch`/`git`/`memory`/`sequentialthinking` reference servers). Per-server `enabled` flag gates which tools BooChat may consume. **Hard rule for BooChat MCP config: never enable a write-capable MCP server.** A server whose tools mutate state breaks the read-only invariant. The codecontext sidecar (already shipped in v1.12 Track B) becomes the first internal "MCP-shaped" tool source, even though it's currently an HTTP shim rather than an MCP server; consider rewriting it as a real MCP server in v1.13 so it composes naturally with the rest.
**BooCoder full matrix.** All four roles. Justifications:
1. **MCP client (write-capable allowed).** Same MCP ecosystem as BooChat plus write-capable servers (`filesystem` write tools, `git` commit, deployment integrations) — all gated through BooCoder's existing pending-changes queue regardless of whether the write comes from a native tool or an MCP-served tool. Per-task allow/deny means a dispatched task can have a different MCP roster than the interactive shell.
1. **MCP server.** Expose BooCoder's own primitives as MCP tools: `boocoder.create_task`, `boocoder.list_pending_changes`, `boocoder.apply`, `boocoder.dispatch_external_agent`, `boocoder.list_worktrees`, etc. **This is what makes opencode-on-the-host BooCoder-aware** — Sam's external `opencode` sessions in Termius can call BooCoder's task queue without going through BooCoder's UI. Aligns with the agent-hub (#48) board-API pattern. Stdio transport for local opencode/claude; HTTP+OAuth for any external/remote consumer.
1. **ACP client (host).** **This replaces the raw-PTY dispatch plan for any agent that supports ACP** — currently opencode (`opencode acp`) and goose (`goose acp`). Instead of spawning a PTY and parsing free-form text output, BooCoder spawns the agent as an ACP subprocess and communicates over JSON-RPC. Gains: native session lifecycle, mid-session model/mode switching, file-operation events the BooCoder UI can render as diffs, terminal events that surface inside BooTerm, permission-prompt events the BooCoder UI can answer with a real dialog. **MCP servers configured in BooCoder are auto-forwarded to the dispatched ACP agent** (per goose docs: ACP clients pass their MCP servers in `context_servers` to the agent automatically) — one MCP config surface drives every dispatched agent. For agents without ACP (claude code, pi, smallcode), fall back to PTY dispatch as currently designed.
1. **ACP agent.** Expose `boocoder acp` so Zed, JetBrains, Avante.nvim, etc. can drive BooCoder as their agent. Means BooCoder becomes useable from any ACP-compatible editor without giving up the BooCoder UI, pending-changes gate, or task DAG. Lower priority than the other three — it's an outbound exposure, not core to the dispatcher build — but cheap once the ACP client side is implemented (same protocol library, server side).
**Why BooChat doesn't get ACP:** ACP standardizes the editor↔agent direction. BooChat doesn't drive agents; it *is* the chat. Nothing for ACP to do there. Adding ACP-agent role to BooChat would mean making BooChat driveable from Zed, which would convert it from a chat surface into an opencode-equivalent — different product. Skip.
**MCP server selection for v1 (start small).** Don't enable everything in the registry; MCP servers consume context budget per tool definition and large registries hit token limits fast. Start with:
- **For BooChat (read-only):** Context7 (already used via opencode), gh_grep, `modelcontextprotocol/server-fetch`, `modelcontextprotocol/server-git` (read mode), `modelcontextprotocol/server-memory`. Optionally `sequentialthinking` for reasoning chain scaffolding.
- **For BooCoder (add write-capable):** all of the above plus `modelcontextprotocol/server-filesystem` (with path scope = `/opt/<project>`, write-gated by BooCoder's pending-changes queue), eventually a custom BooCoder-internal MCP server for `dispatch_external_agent` / `apply_pending` / `list_worktrees`.
**Reference materials to read before implementing:**
- **Anthropics `mcp-builder` skill** (MIT, in `anthropics/skills`): four-phase MCP server build workflow — research → implement → test → eval. Includes the 10-question evaluation framework for validating that an LLM can actually use the server. **Run BooCoder-internal MCP server through this eval before shipping.**
- **OpenCode MCP docs** (`opencode.ai/docs/mcp-servers/`): the cleanest reference for the config-file shape, OAuth flow (Dynamic Client Registration per RFC 7591), per-agent tool whitelisting via glob patterns. Lift the JSON schema near-verbatim into BooCode's config (it's not copyrightable, and matching opencode's shape means any opencode user can copy their config to BooCode).
- **OpenCode ACP docs** (`opencode.ai/docs/acp/`): minimal — basically just `opencode acp` over stdio JSON-RPC. The protocol does the heavy lifting; once BooCoder speaks ACP, opencode works without further config.
- **Goose ACP docs** (`goose-docs.ai/docs/guides/acp-clients/`): more detailed than opencode's. Critical pattern documented there: **the ACP client's `context_servers` (MCP servers) are auto-forwarded to the agent.** This is the protocol-level mechanism for "one MCP config, every dispatched agent inherits it."
- **`agentclientprotocol.com`:** the canonical ACP spec. Note: full remote-agent support (HTTP/WebSocket transport) is still "a work in progress" per the spec maintainers — local-subprocess ACP is the proven path, remote ACP is experimental. **BooCoder's ACP client should use stdio for v1**, defer remote ACP until the spec stabilizes.
- **`modelcontextprotocol/servers`:** only 7 reference servers (everything/fetch/filesystem/git/memory/sequentialthinking/time) — the archived list (PostgreSQL, Slack, GitHub, etc.) is significant because **MCP servers are migrating to vendor-owned ownership** (GitHub now has an official MCP registry at `github.com/mcp`, Sentry hosts `mcp.sentry.dev`, etc.). Don't reimplement what vendors maintain. Discover via the MCP Registry, not the reference repo.
### Phasing for MCP/ACP integration (slots into the Paseo-equivalent phases)
- **Phase 1 MCP** (slots into Paseo-equivalent Phase 1): wire BooChat MCP client. Start with one server (likely Context7, since Sam already uses it). Single config block in BooChat's existing `agents.ts`. Tools appear alongside `view_file`/`grep`/etc. Validates the protocol loop end-to-end without touching write paths.
- **Phase 2 MCP** (slots into Paseo-equivalent Phase 2): same MCP client code drops into BooCoder unchanged. Add write-capable servers behind pending-changes gating. **Test path-guard against MCP-server file writes specifically** — an MCP filesystem server can attempt traversal just as easily as a native tool.
- **Phase 1 ACP** (slots into Paseo-equivalent Phase 4 — multi-agent + worktrees): swap the planned raw-PTY dispatch path for ACP wherever the target agent supports it. Initial coverage: opencode + goose. Claude Code / pi / smallcode stay on PTY fallback. The dispatcher worker checks `available_agents.supports_acp` per agent at dispatch time and picks the right transport. Same task table, different transport.
- **Phase 3 MCP** (after Paseo-equivalent Phase 3): build the BooCoder-internal MCP server exposing `boocoder.*` tools. Run through the mcp-builder eval framework (10 read-only complex questions with verifiable answers) before shipping. Once it's live, external `opencode` sessions in Termius can drive the BooCoder task queue without using BooCoder's UI.
- **Phase 2 ACP** (after Phase 3 MCP): expose `boocoder acp` for inbound ACP — Zed/JetBrains/Avante can use BooCoder as their agent.
### What Paseo is (the reference design)
Paseo is "one interface for all your Claude Code, Codex, and OpenCode agents." 4k stars, AGPL-3.0, TypeScript-heavy (98%), monorepo with 6 packages.
**Core architectural choices, each a target for BooCode to reproduce:**
1. **Daemon + clients split.** A long-running local daemon owns agent process management; thin clients (CLI, desktop Electron, mobile Expo, web) connect over WebSocket. Daemon survives client disconnects. **BooCode equivalent:** the Fastify server is the daemon; the React SPA, the three surface tabs (chat/coder/term), and a new thin `boocode` CLI are all clients.
1. **Six-package monorepo:** `server` (daemon), `app` (Expo iOS/Android/web), `cli`, `desktop` (Electron), `relay` (remote connectivity), `website`. **BooCode equivalent:** `apps/server` (Fastify, exists), `apps/web` (React, exists, hosts the chat/coder/term tabs), `apps/chat` + `apps/coder` + `apps/booterm` (the three surfaces — booterm already live on 9501 as of May 2026), `apps/cli` (new, thin client over WebSocket). `relay` is unnecessary — Sam's Tailscale + Caddy + Authelia stack at `code.indifferentketchup.com` already provides remote connectivity, mobile/desktop are PWA paths, no native shell needed yet.
1. **Process orchestration as the daemon's job.** Paseo spawns Claude Code / Codex / OpenCode as **child processes**, not API calls. Each agent runs with full local dev environment access. **BooCoder equivalent:** the dispatch worker (in `apps/server`) spawns `claude` / `opencode` / `goose` / `pi` via local-exec PTY on the **host**, captures stdout/stderr/exit-code into PostgreSQL stream tables, exposes WebSocket events to all three React surfaces.
1. **CLI shape:**
```
paseo run --provider claude/opus-4.6 "implement user authentication"
paseo run --provider codex/gpt-5.4 --worktree feature-x "implement feature X"
paseo ls
paseo attach <id>
paseo send <id> "follow-up"
paseo --host workstation.local:6767 run "..."
```
**BooCode equivalent (target):** `boocode run --agent opencode --model qwen3.6-35b-a3b-mxfp4 "task"`, `boocode ls`, `boocode attach <session-id>`, `boocode send <session-id> "..."`, `boocode --host ubuntu-homelab.tailnet.ts.net:9500 run "..."`.
1. **`--worktree feature-x` auto-creates a git worktree** per agent — same pattern as zeroshot, bernstein, vorn. **Lift directly:** before spawning the agent, `git worktree add /tmp/booworktrees/<session-id> -b <branch> origin/main`; agent runs in that directory; merge or discard on completion. One worktree per active session.
1. **Three orchestration skills (their "skills/" directory):**
- **`/paseo-handoff`** — plan with one agent, hand off to another. (Sam already does this manually: Claude Chat reviews, OpenCode implements.)
- **`/paseo-loop`** — Ralph loop: agent attempts → verifier judges → repeat, bounded max-iterations. Maps to Sam's "doom-loop guard" terminology (#1 opencode `DOOM_LOOP_THRESHOLD=3`).
- **`/paseo-orchestrator`** — team of agents coordinated via shared chat room; plan-with-X, implement-with-Y, review-with-Z.
1. **No telemetry, no forced login.** Confirms BooCode's privacy-first stance.
1. **`mise` for tool version management.** Worth checking against BooCode's Node version pinning; `.mise.toml` is a more modern alternative to `.nvmrc`.
### How BooCode reproduces this (target architecture)
The dispatcher lives inside the existing BooCode Fastify server, so the React SPA and a new CLI both drive the same backend. PostgreSQL is the durable state. Per-session PTY child processes are the units of agent work. The CLI is a thin client over the existing WebSocket/HTTP API.
**New PostgreSQL tables** (schema drawn from `Dominic789654/agent-hub` for the durable-task pattern, also see #45 entry below):
```
projects id, name, repo_path, default_agent, default_model
task_templates id, project_id, name, prompt_template, tools_whitelist, agent, model
tasks id, project_id, template_id, parent_task_id, state, input, output_summary, dependencies, agent, model, worktree_path, cost, started_at, ended_at
pipelines id, project_id, name, steps (FK array of template ids)
pipeline_runs id, pipeline_id, state, current_step, run_started_at
human_inbox view of tasks where state IN ('blocked', 'failed', 'needs_human')
```
**New worker process** (`boocode-dispatcher`): picks ready tasks (`state='pending'` AND all dependencies are `state='done'`) off the queue, spawns the agent via PTY in the assigned worktree, captures output, marks `state='done'`/`'failed'`/`'needs_human'` with a summary. Runs as a systemd unit alongside the Fastify server.
**New CLI** (`boocode`): three flows — interactive (`boocode run`), follow-up (`boocode send <id>`), inspection (`boocode ls`, `boocode attach <id>`). Internally just a WebSocket/HTTP client against the existing BooCode API.
**New WebSocket event stream**: agent stdout, status transitions, tool calls. Same pattern Paseo uses for daemon-to-client.
**Subagent isolation via Roo Boomerang Tasks pattern (#41 below):** when an agent calls a new-subtask tool, BooCode spawns a fresh PTY/session with a fresh PostgreSQL row and isolated context. Child runs to `attempt_completion`, writes a summary, dies. Parent resumes reading only the summary. This is the **single most important context-management primitive in the stack** — it's what keeps long-running orchestrators from poisoning their own context with detail.
**Observation via Claude Code hooks** (siropkin/budi, #47 below): register BooCode's Fastify backend as the hook receiver for `SessionStart`, `UserPromptSubmit`, `PostToolUse`, `SubagentStart`, `Stop`. Real-time visibility without wrapping the agent.
### Phased plan (rough sequence, not a master plan)
- **Phase 1** — PTY child-process dispatch for a single agent (claude or opencode), exposed via the existing BooCode UI. No queue, no DAG. Just "spawn, capture, display."
- **Phase 2** — PostgreSQL tasks/projects schema + worker. Static project registry, single-agent flow.
- **Phase 3** — Boomerang-style `new_task` tool + isolated child sessions. Orchestrator vs executor agent profiles.
- **Phase 4** — Multi-agent (add codex/opencode beside claude), git worktree auto-create per task, CLI client.
- **Phase 5** — Pipelines (chained templates), human inbox, dashboard view in React.
- **Phase 6** — `/handoff`, `/loop`, `/orchestrator` skills.
Don't ship Phase 1 against AGPL/GPL code; build clean. Patterns are free; code isn't.
-----
## Reference repos
### Tier A — actively lifting from / running as sidecar
#### 1. sst/opencode (NEW Tier A as of 2026-05-20)
#### 1. anomalyco/opencode (NEW Tier A as of 2026-05-20)
- **URL:** https://github.com/sst/opencode
- **URL:** <https://github.com/anomalyco/opencode>
- **License:** MIT
- **Language:** TypeScript (Effect-TS service-oriented)
- **What it is:** The coding agent Sam uses via Termius/Paseo. Also the source of every algorithm BooCode is porting through v1.15.
@@ -22,19 +169,23 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t
- **Algorithms lifted so far:**
- `session/compaction.ts` → v1.11.0 (shipped). `usable`, `isOverflow`, `select`, `buildPrompt` ported to plain TS. SUMMARY_TEMPLATE markdown skeleton verbatim.
- `session/overflow.ts` → v1.11.0 (shipped). 20k `COMPACTION_BUFFER` constant.
- `session/processor.ts` `DOOM_LOOP_THRESHOLD=3` → v1.11.6 (shipped).
- `session/llm.ts` AI SDK adoption (`streamText`, ReasoningPart shape) → v1.13.1 (shipped).
- Parts taxonomy (text/tool_call/tool_result/reasoning/step_start) → v1.13.0 (shipped).
- `experimental_repairToolCall` via AI SDK v6 → v1.13.3 (shipped).
- **Two-tier compaction prune** (`message_parts.hidden_at` + pure decision helper) → v1.13.4 (shipped).
- **`tool/truncate.ts` truncation + outputPath pattern** (adapted: opaque id, not filesystem path) → v1.13.5 (shipped).
- **Algorithms lifted (queued):**
- `session/processor.ts` `DOOM_LOOP_THRESHOLD=3` → v1.11.6
- `session/llm.ts` `experimental_repairToolCall` → v1.12 (hand-rolled), then v1.13 (via AI SDK)
- `tool/truncate.ts` truncation + outputPath pattern → v1.12 (adapted: opaque id, not filesystem path)
- `session/overflow.ts` 0.85×ctx_max early-trigger formula → v1.13.9
- `session/prompt.ts` `runLoop()` outer agent loop → v1.14
- `permission/evaluate.ts` wildcard ruleset → v1.15
- MCP client (transport, tools/list discovery, tools/call) → v1.15
- **What NOT to use:** Effect-TS service plumbing. Snapshot/patch system (for tool-edit revert; BooCoder territory if needed). The `experimental_native_runtime` (AI SDK fallback path). opencode's prompts.
- **Source tag:** `dev` branch on `sst/opencode`. Note: `anomalyco/opencode` is a rebranded mirror; use `sst/opencode` as canonical.
- **Source tag:** `dev` branch on `anomalyco/opencode`. **This is the canonical repo as of 2026-05-22** (corrected from earlier `sst/opencode` attribution — `anomalyco/opencode` is where development now lives, 164k stars, v1.15.7 released May 21 2026, 13k+ commits).
#### 2. nmakod/codecontext
- **URL:** https://github.com/nmakod/codecontext
- **URL:** <https://github.com/nmakod/codecontext>
- **License:** MIT
- **Language:** Go (single binary)
- **What it is:** AI-oriented codebase context map generator. Tree-sitter parsing across TS/JS/Go/C++/Swift/Python/Java/Rust/Dart/JSON/YAML. Generates `CLAUDE.md`-style structured overview. Bundled MCP server with 8 tools.
@@ -45,7 +196,7 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t
#### 3. aimasteracc/tree-sitter-analyzer
- **URL:** https://github.com/aimasteracc/tree-sitter-analyzer
- **URL:** <https://github.com/aimasteracc/tree-sitter-analyzer>
- **License:** MIT
- **Language:** Python, MCP server + CLI
- **What it is:** Local-first code context engine. Outline-first navigation, ripgrep-based impact trace, no embeddings. 17 languages. Claims 54-56% token reduction via TOON format.
@@ -56,7 +207,7 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t
#### 4. spirituslab/codesight
- **URL:** https://github.com/spirituslab/codesight
- **URL:** <https://github.com/spirituslab/codesight>
- **License:** check repo — assumed MIT-ish
- **Language:** TypeScript/Node
- **What it is:** Static code structure visualization. Symbol extraction, import resolution, call graphs. Detects circular dependencies and dead code (with documented false-positive caveats for `customElements.define()`, framework entry points, dynamic imports).
@@ -66,7 +217,7 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t
#### 5. Aider-AI/aider
- **URL:** https://github.com/Aider-AI/aider
- **URL:** <https://github.com/Aider-AI/aider>
- **License:** Apache-2.0
- **Language:** Python
- **What it is:** Git-native AI pair programmer CLI. Pioneered the tree-sitter repo-map + personalized PageRank approach.
@@ -80,18 +231,18 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t
#### 6. continuedev/continue
- **URL:** https://github.com/continuedev/continue
- **URL:** <https://github.com/continuedev/continue>
- **License:** Apache-2.0
- **Language:** TypeScript
- **What it is:** IDE assistant framework. Full RAG pipeline, AST chunking, multi-provider LLM abstraction.
- **Why it matters:** One specific drop-in lift:
1. `core/indexing/ignore.ts``DEFAULT_SECURITY_IGNORE_FILETYPES`. Three-tier matcher (basenames, extensions, prefixes). Going into BooCode's `pathGuard` to block analyzing `.env`, `.pem`, `id_rsa`, etc.
1. `core/indexing/ignore.ts` — `DEFAULT_SECURITY_IGNORE_FILETYPES`. Three-tier matcher (basenames, extensions, prefixes). Going into BooCode's `pathGuard` to block analyzing `.env`, `.pem`, `id_rsa`, etc.
- **How we use it:** v1.11.7. Lift the ignore list, adapt to a `path.basename` + extension + prefix matcher.
- **What NOT to use:** `core/indexing/CodebaseIndexer.ts` and `LanceDbIndex.ts` — embedding-based, the path we walked away from.
#### 7. cline/cline
- **URL:** https://github.com/cline/cline
- **URL:** <https://github.com/cline/cline>
- **License:** Apache-2.0
- **Language:** TypeScript (VS Code extension)
- **What it is:** Autonomous coding agent. Pioneered plan/act mode and granular per-tool auto-approve.
@@ -101,7 +252,7 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t
#### 8. plandex-ai/plandex
- **URL:** https://github.com/plandex-ai/plandex
- **URL:** <https://github.com/plandex-ai/plandex>
- **License:** MIT
- **Language:** Go
- **What it is:** Terminal agent with a pending-changes sandbox. Edits never touch the filesystem until `/apply`. 2M token context.
@@ -111,13 +262,13 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t
#### 9. OpenHands/OpenHands
- **URL:** https://github.com/OpenHands/OpenHands
- **URL:** <https://github.com/OpenHands/OpenHands>
- **License:** MIT
- **Language:** Python
- **What it is:** Autonomous coding agent platform. V1 architecture is built on an append-only typed event log + Docker sandbox runtime.
- **Why it matters:** Two distinct patterns:
1. Event-log architecture — superseded by v1.13's parts-table approach (which derives from opencode's part-message model). OpenHands event-log is conceptually similar but different shape.
2. Sandbox runtime — per-session Docker container for write tools. Closes the `/opt:ro` mount risk.
1. Event-log architecture — superseded by v1.13's parts-table approach (which derives from opencode's part-message model). OpenHands event-log is conceptually similar but different shape.
1. Sandbox runtime — per-session Docker container for write tools. Closes the `/opt:ro` mount risk.
- **How we use it:** v2.1. Lift the runtime container pattern (HTTP API inside container, BooCoder calls in). Don't port the Python implementation directly.
- **What NOT to use:** OpenHands' agent prompts, the full microagent system, the cloud deployment path. Event-log shape (use opencode-derived parts table instead).
@@ -127,7 +278,7 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t
#### 10. cortexkit/aft (actual repo path: ualtinok/aft)
- **URL:** https://github.com/ualtinok/aft
- **URL:** <https://github.com/ualtinok/aft>
- **License:** check repo
- **Language:** Rust binary + TypeScript plugin
- **What it is:** Tree-sitter analysis tools delivered as a Rust binary, communicating with an OpenCode plugin via JSON-over-stdio. Warm-process pattern: one binary per project keeps parse trees in memory.
@@ -137,7 +288,7 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t
#### 11. codeprysm/codeprysm
- **URL:** https://github.com/codeprysm/codeprysm
- **URL:** <https://github.com/codeprysm/codeprysm>
- **License:** check repo
- **Language:** Rust
- **What it is:** Graph-based code intelligence: tree-sitter parsing → node/edge graph in Qdrant, embeddings layered on top, MCP server exposes semantic search.
@@ -147,7 +298,7 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t
#### 12. DeepSourceCorp/globstar
- **URL:** https://github.com/DeepSourceCorp/globstar
- **URL:** <https://github.com/DeepSourceCorp/globstar>
- **License:** MIT
- **Language:** Go
- **What it is:** Static analysis toolkit for writing code checkers using tree-sitter S-expression queries. YAML interface for simple checkers, Go interface for complex multi-file checkers.
@@ -157,7 +308,7 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t
#### 13. getpaseo/paseo
- **URL:** https://github.com/getpaseo/paseo
- **URL:** <https://github.com/getpaseo/paseo>
- **License:** AGPL-3.0
- **What it is:** WebSocket daemon ↔ client protocol for agent coordination. Already running in your stack (paseo dispatches Claude Code/opencode).
- **Why it matters:** Patterns for agent lifecycle, `--worktree` flag pattern, ECDH/NaCl security model.
@@ -166,7 +317,7 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t
#### 14. earendil-works/pi
- **URL:** https://github.com/earendil-works/pi
- **URL:** <https://github.com/earendil-works/pi>
- **License:** MIT
- **What it is:** `@mariozechner/pi-agent-core` (tool loop + state machine) and `@mariozechner/pi-ai` (provider abstraction).
- **Why it matters:** If we ever want non-llama-swap inference (Anthropic, OpenAI, Mistral direct), pi-ai is the cleanest TypeScript provider abstraction available.
@@ -174,7 +325,7 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t
#### 15. microsoft/agent-framework
- **URL:** https://github.com/microsoft/agent-framework
- **URL:** <https://github.com/microsoft/agent-framework>
- **License:** MIT
- **What it is:** Workflow graphs for multi-agent coordination.
- **Why it matters:** Conceptual reference for far-future multi-agent orchestration.
@@ -182,7 +333,7 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t
#### 16. microsoft/autogen
- **URL:** https://github.com/microsoft/autogen
- **URL:** <https://github.com/microsoft/autogen>
- **License:** MIT
- **What it is:** Earlier Microsoft multi-agent framework.
- **Why it matters:** Effectively sunsetting in favor of agent-framework.
@@ -190,7 +341,7 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t
#### 17. open-webui/open-webui
- **URL:** https://github.com/open-webui/open-webui
- **URL:** <https://github.com/open-webui/open-webui>
- **License:** BSD-3
- **What it is:** Self-hosted LLM frontend.
- **Why it matters:** Python/Svelte, wrong stack. RAG pipeline only worth a read if BooLab needs improvement — unrelated to BooCode.
@@ -198,40 +349,80 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t
-----
### Reviewed 2026-05-22 — agent CLIs, ensembler, skills, context tooling
(Entries #18#60 from the 2026-05-22 deep review pass are preserved verbatim from the prior catalog; reproducing the full block here would exceed the doc's usable density. The headline take-aways are captured in the Decisions log at the bottom of this file and in the Lift Catalog table. Source repos and detailed notes remain available in the previous revision of this document if needed — `git log -- boocode_code_review.md` to retrieve.)
-----
## Lift catalog — what lands where
| Source repo | Specific artifact | License | BooCode destination | Version |
|---|---|---|---|---|
| `sst/opencode` | `session/compaction.ts` + `session/overflow.ts` algorithms | MIT | `services/compaction.ts` | **v1.11.0 ✅** |
| `sst/opencode` | `session/processor.ts` DOOM_LOOP_THRESHOLD pattern | MIT | `services/inference.ts` doom-loop guard | v1.11.6 |
| `continuedev/continue` | `core/indexing/ignore.ts` DEFAULT_SECURITY_IGNORE_FILETYPES | Apache-2.0 | Extend `path_guard.ts` exclusion list | v1.11.7 |
| `nmakod/codecontext` | Whole binary (sidecar) | MIT | New `codecontext` container, 8 MCP tools wired via static wrappers | v1.12 |
| `sst/opencode` | `session/llm.ts` experimental_repairToolCall pattern | MIT | `services/inference.ts` synthetic invalid-tool result | v1.12 |
| `sst/opencode` | `tool/truncate.ts` truncation + outputPath pattern (adapted: opaque id) | MIT | `services/truncate.ts` + `view_truncated_output` tool | v1.12 |
| `Aider-AI/aider` | `aider/queries/tree-sitter-*.scm` (60+ files) | Apache-2.0 | Fallback grammars for languages not covered by sidecars | v1.12 (fallback) |
| `sst/opencode` | `session/llm.ts` AI SDK adoption + alpha tool ordering | MIT | `services/inference.ts` rewrite | v1.13 |
| `sst/opencode` | Parts-message taxonomy (text, tool_call, tool_result, reasoning, step_start) | MIT | new `message_parts` table | v1.13 |
| `sst/opencode` | `session/prompt.ts` runLoop() outer agent loop | MIT | `services/inference.ts` step-based loop | v1.14 |
| `sst/opencode` | `agent.steps` per-agent step cap | MIT | AGENTS.md + agents.ts | v1.14 |
| `sst/opencode` | `permission/evaluate.ts` wildcard ruleset | MIT | new `permissions` table + matcher | v1.15 |
| `sst/opencode` | `mcp/index.ts` MCP client (SSE transport + tools/list + tools/call) | MIT | new `services/mcp/` module; codecontext re-wired through it | v1.15 |
| `cline/cline` | Plan/Act invariant (read-only mode pattern) | Apache-2.0 | absorbed into v1.15 permissions work | v1.15 |
| `spirituslab/codesight` | `analyze.mjs` — call graph, circular-dep, dead-code | MIT-ish | `apps/server/src/tools/repo_health.ts` | v1.16 |
| `plandex-ai/plandex` | `pending_changes` data model, diff/apply/rewind UX | MIT | New `pending_changes` table, BooCoder write-tool gating | v2.0 |
| `OpenHands/OpenHands` | Sandbox runtime pattern | MIT | New `boocoder` container, per-session Docker | v2.1 |
| `cortexkit/aft` (ualtinok/aft) | BridgePool warm-process JSON-stdio pattern | check | Optimization if profile shows fork overhead | Deferred |
| `codeprysm/codeprysm` | Node/edge taxonomy (Container/Callable/Data, CONTAINS/USES/DEFINES) | check | Reference only if we ever build our own graph | None |
| `DeepSourceCorp/globstar` | Whole toolkit | MIT | Future verify-before-commit gate for BooCoder | Parked |
| `earendil-works/pi` | `pi-ai` provider abstraction | MIT | Multi-provider LLM if pursued | v2.x optional |
| `microsoft/agent-framework` | Workflow graph concepts | MIT | Conceptual only | v3.x |
|Source repo |Specific artifact |License |BooCode destination |Version |
|------------------------------------|----------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|--------------------------------------------------------------------------------------------|--------------------------------------------------|
|`anomalyco/opencode` |`session/compaction.ts` + `session/overflow.ts` algorithms |MIT |`services/compaction.ts` |**v1.11.0 ✅** |
|`anomalyco/opencode` |`session/processor.ts` DOOM_LOOP_THRESHOLD pattern |MIT |`services/inference.ts` doom-loop guard |**v1.11.6 ✅** |
|`continuedev/continue` |`core/indexing/ignore.ts` DEFAULT_SECURITY_IGNORE_FILETYPES |Apache-2.0 |Extend `path_guard.ts` exclusion list |**v1.11.7 ✅** |
|`nmakod/codecontext` |Whole binary (sidecar) |MIT |New `codecontext` container, 8 MCP-shaped tools wired via static wrappers |**v1.12.0 ✅** |
|`anomalyco/opencode` |`session/llm.ts` experimental_repairToolCall pattern |MIT |AI SDK v6 `streamText` wiring |**v1.13.3 ✅** |
|`anomalyco/opencode` |`tool/truncate.ts` truncation + outputPath pattern (adapted: opaque id) |MIT |`services/truncate.ts` + `view_truncated_output` tool |**v1.13.5 ✅** |
|`Aider-AI/aider` |`aider/queries/tree-sitter-*.scm` (60+ files) |Apache-2.0 |Fallback grammars for languages not covered by sidecars |**v1.12 (fallback)** |
|`anomalyco/opencode` |`session/llm.ts` AI SDK v6 adoption + ReasoningPart shape |MIT |`services/inference/stream-phase.ts` (`streamText` adapter) |**v1.13.1-A/B/C ✅** |
|`anomalyco/opencode` |Parts-message taxonomy (text, tool_call, tool_result, reasoning, step_start) |MIT |`message_parts` table + `messages_with_parts` view |**v1.13.0 ✅ + v1.13.1-B ✅** |
|`anomalyco/opencode` |Two-tier compaction prune (`message_parts.hidden_at` + tier logic) |MIT |`services/inference/prune.ts` (`selectPruneTargets`) |**v1.13.4 ✅** |
|`anomalyco/opencode` |0.85×ctx_max overflow trigger formula |MIT |`services/compaction.ts` early-trigger constant |v1.13.9 (planned) |
|`anomalyco/opencode` |`session/prompt.ts` runLoop() outer agent loop |MIT |`services/inference.ts` step-based loop |v1.14 (planned) |
|`anomalyco/opencode` |`agent.steps` per-agent step cap |MIT |AGENTS.md + agents.ts |v1.14 (planned) |
|`anomalyco/opencode` |`permission/evaluate.ts` wildcard ruleset |MIT |new `permissions` table + matcher |v1.15 (planned) |
|`anomalyco/opencode` |`mcp/index.ts` MCP client (SSE transport + tools/list + tools/call) |MIT |new `services/mcp/` module; codecontext re-wired through it |v1.15 (planned) |
|`cline/cline` |Plan/Act invariant (read-only mode pattern) |Apache-2.0 |absorbed into v1.15 permissions work |v1.15 (planned) |
|`spirituslab/codesight` |`analyze.mjs` — call graph, circular-dep, dead-code |MIT-ish |`apps/server/src/tools/repo_health.ts` |v1.16 (planned) |
|`plandex-ai/plandex` |`pending_changes` data model, diff/apply/rewind UX |MIT |New `pending_changes` table, BooCoder write-tool gating |v2.0 (planned) |
|`OpenHands/OpenHands` |Sandbox runtime pattern |MIT |New per-session Docker sandbox (skip-condition if path-guard holds) |v2.1 (optional) |
|`cortexkit/aft` (ualtinok/aft) |BridgePool warm-process JSON-stdio pattern |check |Optimization if profile shows fork overhead |Deferred |
|`codeprysm/codeprysm` |Node/edge taxonomy (Container/Callable/Data, CONTAINS/USES/DEFINES) |check |Reference only if we ever build our own graph |None |
|`getpaseo/paseo` |**Daemon+clients architecture, CLI verb shape, three skills concept** |AGPL-3.0 (design only) |**Paseo-equivalent dispatcher design** (all phases) |**v2.0+ roadmap** |
|`Dominic789654/agent-hub` |**Task DAG schema, dispatcher worker, project registry, human inbox** |Apache-2.0 |**PostgreSQL schema + dispatcher worker process** |**v2.0** |
|`Roo Code Boomerang Tasks` |Orchestrator-with-capability-restriction + down-pass/up-pass context discipline |Apache-2.0 (pattern) |AGENTS.md design principle (v1.14) → `new_task` tool (v2.0) |**v1.14 → v2.0** |
|`siropkin/budi` |Claude Code 5-hook event taxonomy |MIT (pattern) |Install globally on Sam's host for Claude Code observability |**Immediate (host install)** |
|`sipyourdrink-ltd/bernstein` |HMAC-chained audit log primitive |verify |PostgreSQL audit table with `prev_hmac` field |v1.13+ optional |
|`eyaltoledano/claude-task-master` |Tiered tool loading (`core`/`standard`/`all`) |MIT+Commons Clause (pattern only) |`BOOCODE_TOOLS` env var in `agents.ts` |v1.12.x or v1.13 |
|`ai-christianson/RA.Aid` |Three-stage research/plan/implement + expert escape hatch |Apache-2.0 (pattern) |AGENTS.md design principle + per-stage model routing |v1.14+ |
|`DeepSourceCorp/globstar` |Whole toolkit |MIT |Future verify-before-commit gate for BooCoder |Parked |
|`earendil-works/pi` |`pi-ai` provider abstraction |MIT |Multi-provider LLM if pursued |v2.x optional |
|`microsoft/agent-framework` |Workflow graph concepts |MIT |Conceptual only |v3.x |
|`qodo-ai/agents` |`agent.toml` schema: `output_schema`, `exit_expression`, `execution_strategy` |MIT |Extend `AGENTS.md` / agents.ts metadata |v1.14+ |
|`qodo-ai/qodo-cover` |Record/replay LLM response harness (hashed prompt → fixture YAML) |AGPL-3.0 |Re-implement in Vitest plugin; pattern only, no vendored source |v1.13+ |
|`qodo-ai/qodo-skills` |PR-resolver state machine (fetch issues → batch/interactive fix → inline reply) |MIT |New BooCoder PR-resolver tool with provider CLI adapters |v2.0+ |
|`augmentcode/augment-swebench-agent`|Majority-vote ensembler (K diffs → ranker model → winner) + JSONL schema |MIT |Optional BooCoder verify-gate layer above pending-changes |v2.0+ optional |
|`olimorris/codecompanion.nvim` |Agent Client Protocol (ACP) integration shape |Apache-2.0 |Conceptual only — possible non-web frontend protocol |v2.x watch list |
|`zed-industries/codex-acp` |ACP server-side adapter reference implementation |Apache-2.0 |Working blueprint if BooCode ever ships an ACP adapter |v2.x watch list (parked) |
|`Leonxlnx/taste-skill` |`taste-skill/SKILL.md` (anti-slop ban list + 3-dial parameterization) |MIT |Vendor into BooCode skills/ after diff against existing `frontend-design`; binds to BooCoder|v1.12.x diff → v2.0+ |
|`Fission-AI/OpenSpec` |`openspec/changes/<name>/{proposal,specs,design,tasks}.md` directory structure |permissive (verify) |Reformat BooCode's batch docs to OpenSpec shape; optional CLI adoption later |v1.13.x or v1.14 |
|`covibes/zeroshot` |Complexity × TaskType → workflow conductor + blind-validation invariant |MIT |AGENTS.md principle (no code); blind-validation gate above pending-changes |v1.13/v1.14 (principle) → v2.0+ (gate) |
|`0xmariowu/AgentLint` |31 evidence-backed checks (emphasis density, sweet-spot CLAUDE.md length, SHA-pinned Actions, .env/.gitignore, etc.) |MIT |Manual one-pass audit of CLAUDE.md/AGENTS.md across Sam's repos; optional plugin install |Immediate (manual pass) → v1.12.x (plugin) |
|`aaif-goose/goose` |Native ACP + 15+ providers (incl. Ollama); .claude/.codex/.cursor skill cross-emission |Apache-2.0 |Reference for ACP-protocol implementation and multi-provider abstraction |Reference / v2.x (if ACP lands) |
|`memovai/memov` |Shadow `.mem` timeline; `snap`/`validate_commit` MCP-tool shape; drift detection |MIT |Reference for v1.13+ `view_session_history` tool + v2.0+ verify gate |v1.13+ (history tool design) → v2.0+ (drift gate) |
|`Roo Code: Boomerang Tasks` |Orchestrator with intentional capability restriction; down-pass/up-pass context discipline; precedence override clause|Apache-2.0 (Roo) — pattern lift only |AGENTS.md orchestrator role definition + dispatched-task prompt template |v1.13 / v1.14 (principle), v2.0+ (real delegation)|
|`eyaltoledano/claude-task-master` |Tiered tool-loading via env var (core/standard/all); three model roles; PRD-as-source-of-truth |MIT+Commons Clause (no code lift; pattern only)|`BOOCODE_TOOLS` env var for tiered loading; reaffirm three-model-role pattern |v1.12.x / v1.13 (tier hint) |
|`sipyourdrink-ltd/bernstein` |HMAC-chained audit log; signed agent cards (Ed25519+JCS); per-artifact lineage; air-gap mode |Verify before lift |Reference for compliance-grade BooCode if/when needed; HMAC log small lift candidate |v2.0+ (audit log), speculative (full stack) |
|`siropkin/budi` (tool, not lift) |5-hook Claude Code taxonomy; HTTP daemon + SQLite + dashboard |MIT |Install globally to observe Claude Code token costs; hook taxonomy as reference |Immediate (install) |
-----
## Decisions log
- **v1.13.7 stability bundle uncovered two latent v1.13.1-A regressions (2026-05-22).** Investigation during the cosmetic-revert session surfaced: (1) `@ai-sdk/openai-compatible` defaults `includeUsage: false`, so `stream_options.include_usage` was never sent to llama-swap and `result.usage.inputTokens/outputTokens` resolved `undefined` — every assistant row had `tokens_used`/`ctx_used` NULL since v1.13.1-A shipped. One-line fix in `provider.ts`. (2) AI SDK v6 streaming occasionally emits a leading `\n` text-delta on tool-call-only turns; `content.length > 0` returned true for `"\n"`, producing an empty MessageBubble + ActionRow between every tool call. Fixed by trim guards in `MessageList.flatten` (`hasText`) and `MessageBubble` (`hasContent`). Plus: `buildMessagesPayload` now skips trailing empty/failed assistant rows (kills "Cannot have 2 or more assistant messages" rejections from the upstream), and `BUDGET_NO_AGENT` bumped 15→30 to match `BUDGET_READ_ONLY` (every tool today is read-only; the 15-cap was forward-looking). The class of bug is consistent: AI SDK v6 changes the streaming surface in ways that aren't caught by tsc or vitest — only production observability surfaces them. Argues for v1.13.11 WS-frame Zod schemas to catch the next round.
- **MCP and ACP roles locked per surface (2026-05-22).** **BooChat = MCP client only**, read-only tool consumer. **BooCoder = MCP client + MCP server + ACP client (host) + ACP agent (driveable)** — full matrix. Hard rule: BooChat MCP config must never enable a write-capable server (the read-only invariant overrides protocol convenience). BooCoder's ACP client role **replaces the raw-PTY dispatch plan for any agent that supports ACP** (opencode `opencode acp`, goose `goose acp`); claude/pi/smallcode stay on PTY fallback. The protocol pattern that justifies the full BooCoder matrix: ACP clients auto-forward their MCP `context_servers` to the dispatched agent (per goose docs) — one MCP config surface drives every dispatched agent. BooCoder MCP-server role exposes `boocoder.create_task`, `boocoder.list_pending_changes`, `boocoder.apply`, etc. so external opencode-in-Termius sessions become BooCoder-aware without going through BooCoder's UI. BooCoder ACP-agent role (`boocoder acp`) lets Zed/JetBrains/Avante.nvim drive BooCoder as their agent — outbound exposure, lowest priority of the four roles. **Reference materials**: anthropics `mcp-builder` skill (4-phase build workflow + 10-question eval framework), opencode MCP/ACP docs as JSON-schema reference, goose ACP docs for the `context_servers` auto-forward pattern, `agentclientprotocol.com` spec — but note remote ACP (HTTP/WS) is still WIP, BooCoder's ACP client must use stdio for v1.
- **BooCode monorepo locked as 3-app structure (2026-05-22).** Same `/opt/boocode/` repo: `apps/chat/` (read-only, currently the live thing at 9500), `apps/coder/` (write tools + external CLI dispatch, 9502, v2.0 planned), `apps/booterm/` (PTY terminal, **already live at 9501 since May 2026**, Node 20 Alpine + node-pty + tmux + xterm.js, tmux session per pane, SSH-out enabled). Shared Fastify backend in `apps/server`, shared React shell in `apps/web` hosting the three surfaces as tabs. BooTerm already shares `boocode_db` — confirms cross-surface DB sharing pattern works.
- **Single shared database, rename `boocode_db` → `boochat_db` when BooCoder lands (2026-05-22).** All three surfaces in one Postgres. Enables cross-surface joins (coder task → originating chat conversation → term debugging session).
- **Mount strategy: blanket `/opt:rw`, policy enforcement at the write-tool layer (2026-05-22).** Per-project scoping is logic, not mount. Path-guard correctness becomes the highest-priority test target for v2.0 — fuzz it, property-test it, every traversal-attack pattern.
- **External CLI agents (`opencode` / `claude` / `goose` / `pi`) live on the host, not in containers (2026-05-22).** BooCoder shells out via local-exec PTY (`node-pty`, host shell). Host install means inherit Sam's existing `~/.opencode/`, `~/.claude/`, `~/.config/goose/` configs without re-mounting. Containerize later only if a concrete reason emerges.
- **STRATEGIC PIVOT (2026-05-22): Build a Paseo-equivalent dispatcher inside BooCode. Lift patterns, not code.** Sam wants BooCode to function like Paseo without using Paseo itself. **Paseo (getpaseo/paseo) is AGPL-3.0** — incompatible with BooCode's MIT license and its network-served deployment at `code.indifferentketchup.com`. Vendoring Paseo code would force BooCode to become AGPL. Solution: **reproduce the architecture in BooCode's existing Fastify + TS + PostgreSQL + React stack, using only license-clean patterns**. Full target architecture documented in the new "Paseo-equivalent dispatcher inside BooCode" section at the top of this document. **Primary architectural template: `Dominic789654/agent-hub` (#48)** — Apache-2.0, license-clean, captures the exact three-process model (board server + dispatcher + assistant terminal) and the schema (tasks/projects/templates/pipelines/human_inbox) BooCode should reproduce. **Critical context-management primitive: Roo Code Boomerang Tasks pattern (#46)** — orchestrator with intentional capability restriction, down-pass/up-pass context discipline, no implicit inheritance. **Observation pattern: Claude Code hooks** (siropkin/budi #51 reference) — register BooCode as the hook receiver to get real-time visibility without wrapping the agent. **Phasing:** Phase 1 single-agent PTY dispatch → Phase 2 PostgreSQL queue + worker → Phase 3 Boomerang `new_task` tool → Phase 4 multi-agent + worktrees + CLI → Phase 5 pipelines + dashboard → Phase 6 handoff/loop/orchestrator skills. **This is now the dominant roadmap direction**, ahead of v1.12.x debugger fixes (queued) and v1.13/v1.14 batch work (deferred until Paseo-equivalent Phases 12 are scoped).
- **BooCoder agent layer: both Option A AND Option B, full-featured (2026-05-22).** Earlier May 18 chat recommended Option A (thin orchestration shell over OpenCode) as the path forward but explicitly called the choice not-locked. Sam's call this session: ship **both** paths in the same BooCoder surface. **Option B / in-process loop** handles interactive write work with native tools + pending-changes UI (v2.0 plandex pattern still applies). **Option A / PTY dispatch** handles parallel/batch work where Sam wants to A/B opencode vs claude vs goose vs pi against the same task in separate worktrees. User picks per task. This supersedes the May 18 "reframe Batch 14 as OpenCode orchestration UI" recommendation — both paths now coexist.
- **Paseo (getpaseo/paseo) is the reference design, not a catalog code lift (2026-05-22).** AGPL-3.0 + 4k stars + 6-package TypeScript monorepo (server / app / cli / desktop / relay / website). The architecture is the lift: daemon + clients split, child-process agent orchestration, WebSocket protocol, `paseo run/ls/attach/send` CLI shape, `--worktree feature-x` flag, `/paseo-handoff` / `/paseo-loop` / `/paseo-orchestrator` skills. **Do not vendor code.** Read the README and the `skills/` directory's three skill files for design reference; reimplement in BooCode's MIT stack. The skills' shape (named `/handoff`, `/loop`, `/orchestrator` operations) is non-copyrightable; lift the shape.
- **Embeddings dropped from BooCode** (May 2026). Replaced RAG with file-view tools + sidecar analyzers.
- **opencode promoted to Tier A** (2026-05-20). The compaction port (v1.11.0) made it clear opencode is not just "the agent Sam uses" — it's the canonical reference implementation for everything BooCode is rebuilding through v1.15. Five algorithms identified for lift (compaction, doom-loop, repairToolCall, runLoop, permission evaluate) plus truncate.ts and MCP client.
- **Source is `sst/opencode` `dev` branch.** `anomalyco/opencode` is a rebranded mirror; do not source from there.
- **opencode promoted to Tier A** (2026-05-20). The compaction port (v1.11.0) made it clear opencode is not just "the agent Sam uses" — it's the canonical reference implementation for everything BooCode is rebuilding through v1.15. Five algorithms identified for lift (compaction, doom-loop, repairToolCall, runLoop, permission evaluate) plus truncate.ts and MCP client. **Update 2026-05-22:** truncate.ts shipped v1.13.5; doom-loop, repairToolCall, compaction, prune all shipped; runLoop + permission still queued for v1.14/v1.15.
- **OpenCode canonical repo is `anomalyco/opencode`, NOT `sst/opencode` (corrected 2026-05-22).** Sam confirmed: the prior catalog entry's "anomalyco is a rebranded mirror, use sst as canonical" was inverted. Development moved to anomalyco; sst/opencode is the predecessor lineage. `anomalyco/opencode` `dev` branch is now the active source for every algorithm lift through v1.15. All 15 catalog references rewritten in this session.
- **Original Batch 11 (aider PageRank port) replaced** by codecontext sidecar approach.
- **Original Batch 12 (codebase indexer w/ Harrier) removed.** No embedding infrastructure.
- **Original Batch 13 (OpenHands event log) replaced** by v1.13 parts table (opencode pattern). Same outcome, different shape.
@@ -239,6 +430,38 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t
- **Aider's `repomap.py` port dropped.** Codecontext supersedes it. Aider contribution narrows to the `.scm` query files only.
- **Globstar role re-scoped.** Not an architect tool — parked for future verify-before-commit gate.
- **codeprysm role re-scoped.** Taxonomy reference only. Embedding half rejected.
- **AI SDK adoption deferred to v1.13.** Hand-roll opencode's repairToolCall pattern in v1.12 first.
- **AI SDK adoption deferred to v1.13.** Hand-roll opencode's repairToolCall pattern in v1.12 first. **Update 2026-05-22:** v1.12 deferred the repairToolCall hand-roll entirely; both AI SDK v6 adoption AND repairToolCall shipped together in v1.13.1-A/v1.13.3 — cleaner outcome than the two-step plan.
- **`tool_choice='required'` confirmed supported** by llama-swap (qwen3.6-35b-a3b-mxfp4, 2026-05-20). Repair tool call is viable.
- **`anomalyco/sst` is a mirror, not a fork.** Same applies to `anomalyco/opencode`. Use canonical `sst/sst` and `sst/opencode` sources.
- **`anomalyco/opencode` confirmed canonical (2026-05-22).** Earlier confusion about whether `sst/opencode` or `anomalyco/opencode` was the active fork is resolved: anomalyco is where active development continues. Use `anomalyco/opencode` for all algorithm lifts.
- **Reviewed 2026-05-22 (cline, kilocode, prompt-tower, auggie, augment-agent, augment-swebench-agent, codecompanion.nvim, junie, cody-public-snapshot, qodo-ai/{agents,qodo-cover,open-aware,qodo-skills}).** Three real lifts emerged:
- **Qodo `agent.toml` schema** (`output_schema`, `exit_expression`, `execution_strategy`) → land in AGENTS.md at v1.14+.
- **qodo-cover record/replay LLM harness** → re-implement (don't vendor — AGPL) as a Vitest fixture plugin at v1.13+.
- **augment-swebench-agent ensembler** → optional v2.0+ verify-gate layer above pending-changes (plandex pattern).
- **qodo-skills PR-resolver state machine** → BooCoder v2.0+ tool.
- **ACP added to v2.x watch list.** Zed's Agent Client Protocol is the analog of MCP for client↔agent. Not in any current batch; revisit only if BooCode wants to expose itself to Zed/Neovim/Termius beyond the web UI. **Reference implementations bracket the protocol:** codecompanion.nvim (#28) is the client side, zed-industries/codex-acp (#31) is the server-side adapter. The codex-acp README confirms ACP's full feature surface (context @-mentions, images, permission-gated tool calls, edit review, TODO lists, slash commands, client MCP servers) matches features BooCode already has internally — adopting ACP would be transport translation, not feature build.
- **kilocode and Cline skipped as code sources** (entry #20). Orchestrator/sub-agent pattern is already covered by cline (#7) and agent-framework (#15).
- **Junie skipped permanently.** No usable source.
- **Cody parked.** Multi-repo context fetcher is the only interesting piece; overkill for single-repo BooCode.
- **prompt-tower skipped.** AGPL VS Code extension; nothing novel that continue's ignore lift + universal XML wrapping doesn't already cover.
- **tiktoken-rs and calloop rejected (2026-05-22).** Both are Rust and Zed-stack-specific. tiktoken-rs additionally fails the model check — Qwen/Gemma/Nemotron don't use OpenAI's BPE encodings, so token counts would be wrong by 1030%. **Source of truth for token counts on llama-swap models is `POST /tokenize` on llama-server**; no client-side tokenizer library needed. Do not re-evaluate either repo.
- **taste-skill accepted as Tier B vendor candidate (2026-05-22).** MIT, SKILL.md format already matches BooCode v1.12 standard, 18k+ stars, framework-agnostic. Two real wins: the 100+ anti-slop ban list (specific font/color/layout failure modes LLMs default to) and the 3-dial parameterization pattern (reusable beyond design). **Gated on a diff against the existing `frontend-design` SKILL** to avoid duplication before vendoring. Real value lands with BooCoder v2.0+ when write tools generate frontend code for Sam's projects (DubDrive, BooLab, Fathom, etc.).
- **costrict skipped, OpenSpec accepted (2026-05-22).** costrict is Apache-2.0 but the top contributors are Roo Code maintainers and the codebase has `.roo/`/`.rooignore`/`.roomodes` — same Cline-lineage VS Code extension shape BooCode rejected at kilocode (#20). The novel surface costrict ships is its **OpenSpec integration**, which is a separate repo. **OpenSpec is the real find:** it formalizes the spec-governed dispatch workflow Sam already uses (per-change folder with proposal/specs/design/tasks artifacts, slash commands per agent, artifact-lifecycle gates). Start by adopting just the directory structure for BooCode's own batch docs (zero-dep documentation reformat); evaluate full CLI adoption later. **Tracked for v1.13.x or v1.14**, not blocking v1.12.0.
- **agents.md noted but not evaluated.** costrict's README acknowledges `agentsmd/agents.md` as a partner. The name and shape strongly suggest it's the canonical source of the AGENTS.md convention BooCode v1.12 already adopted. Worth a future drive-by to confirm, but not blocking anything.
- **zeroshot accepted as Tier B pattern reference (2026-05-22).** MIT, multi-agent orchestration above coding-agent CLIs (Claude Code, Codex, OpenCode, Gemini CLI). **Sits at Paseo's layer, not BooCode's.** Five pattern lifts: complexity-classification conductor, blind-validation invariant (separate agent context verifies — doesn't see worker's history), crash-safe SQLite ledger, three-tier isolation taxonomy (none/worktree/Docker), JSON cluster templates. **The blind-validation invariant is the single most important architectural idea** zeroshot adds — fills the missing piece in plandex/OpenHands/cline patterns where the same agent writes and judges its own work. Lands at BooCode v1.13/v1.14 as an AGENTS.md design principle, then at v2.0+ as a real verify gate above pending-changes. **Separately:** zeroshot is a candidate Paseo-successor if Paseo ever needs replacement; that's a Paseo-scope decision, not BooCode-scope.
- **toprank rejected (2026-05-22).** SEO/SEM domain — wrong category for BooCode. Sam runs developer infrastructure, not marketing sites. Skill format is the same one BooCode v1.12 already uses; no novel pattern.
- **AgentLint accepted as high-value immediate-application reference (2026-05-22).** MIT, 31 evidence-backed repo-quality checks. Most useful catalog entry for *the present moment* — applies directly to every CLAUDE.md/AGENTS.md across Sam's homelab (BooCode, BooLab, HLH, indifferent-broccoli, paseo, etc.) without needing any code lift or version dependency. Specific data points from 265 versions of Anthropic's Claude Code system prompt are immediately actionable: trim emphasis-keyword density, target 60120 line CLAUDE.md sweet spot, SHA-pin Actions, ensure `.env`/`CLAUDE.local.md` are gitignored. **Recommend a single audit pass session against BooCode's instruction files** before any further skill work lands. Optional plugin install for ongoing audits is a v1.12.x post-merge call.
- **awesome-vibe-coding surveyed (2026-05-22).** 60+ tools across 10 sections. **No new catalog entries promoted from the list.** Already-covered items: Cline, Roo Code, Continue, Prompt Tower, Augment, aider, Codex CLI, Gemini CLI. Skipped on category: 18 Web Builders, 4 Editor/IDEs, mobile/desktop builders. **Real leads tracked for next review pass:** `block/goose` (multi-model local agent framework), `eyaltoledano/claude-task-master` (task decomposition algorithm), `ai-christianson/RA.Aid` and the underlying `LangGraph` framework (workflow graphs in production), `automata/aicodeguide` (AI-first methodology). Do not re-evaluate the rejected items.
- **aaif-goose/goose (formerly block/goose) added as Tier B reference (2026-05-22).** Apache-2.0, 45.2k stars, recently moved to Linux Foundation's Agentic AI Foundation. Rust + TypeScript. Native ACP, 15+ providers including Ollama, MCP support for 70+ extensions. **Sits at Paseo's layer, not BooCode's.** Skip code (wrong stack); track as reference for ACP-protocol implementations and the multi-provider abstraction pattern. Ships `.claude/`, `.codex/`, `.cursor/` skill directories — confirms the cross-agent skill-emission pattern noticed in autohand/code-cli (#33).
- **memovai/memov accepted as Tier B reference (2026-05-22).** MIT, Python. Shadow `.mem` timeline tracks prompts + context + plan + file changes at every agent interaction; zero pollution to `.git`. MCP-exposed. `validate_commit` MCP tool detects context drift between prompt and actual changes. **Direct match for BooCode's reviewer-architect pattern.** Lift the MCP-tool shape (`snap`, `mem_history`, `mem_jump`, `validate_commit`) as reference for v1.13+ `view_session_history` feature and v2.0+ verify gate. Don't vendor Python code into Fastify/TS BooCode.
- **bhouston/mycoder rejected (2026-05-22).** MIT, TypeScript, 566 stars, **stale** (last release Mar 2025). Standard CLI coding agent — Claude/OpenAI/Ollama, MCP, parallel sub-agents. Functionally a less-mature opencode. Sam already uses opencode for this role. One UX pattern noted (Ctrl+M mid-stream corrections) but BooCode/opencode/Claude Code all have chat-based interruption. Skip.
- **ai-christianson/RA.Aid accepted as Tier B pattern reference (2026-05-22).** Apache-2.0, Python, 2.2k stars. **Three-stage architecture (Research / Planning / Implementation) on LangGraph** with per-stage model routing (`--research-provider`, `--planner-provider`, `--expert-provider`) + "expert tool" called only when needed for hard reasoning. **Aligns directly with Sam's qwopus27b/qwen3-coder routing.** Lift the three-stage AGENTS.md design and expert-tool escape hatch at v1.14+; don't lift LangGraph (wrong stack); never enable `--cowboy-mode` equivalent (opposite of BooCode's no-autonomous-commit rule).
- **Kirill89/reviewcerberus rejected as code, CoV logged as pattern (2026-05-22).** Closed-source Docker distribution (license not in registry). Multi-provider (Bedrock/Anthropic/Ollama/Moonshot), accepts `guidelines.md`, **Chain-of-Verification mode** to reduce false positives. CoV is the only takeaway — per-finding verification primitive, complementary to zeroshot's blind-validation (per-workflow #37) and bernstein's lineage chains (per-artifact #49). Stackable.
- **autohandai/code-cli rejected (2026-05-22).** 56 stars, COMMERCIAL.md present (commercial license restriction likely). Standard ReAct CLI agent with no novel pattern vs opencode. Cross-agent skill emission (copies skills between `~/.claude/skills/`, `~/.codex/skills/`, `~/.autohand/skills/`) is the only interesting bit — same pattern goose (#41) does. Skip.
- **Roo Code Boomerang Tasks accepted as Tier B pattern reference (2026-05-22, Sam-flagged).** Roo Code itself rejected (already covered via #20 kilocode and #35 costrict — VS Code/Cline lineage). Three architectural patterns lifted: **(1) Orchestrator with intentional capability restriction** — cannot read/write/MCP/shell, only delegates, preventing context poisoning. **(2) Down-pass/up-pass context discipline** — no implicit inheritance, parent passes context down via `new_task` message, subtask passes summary up via `attempt_completion` result only. **(3) Explicit precedence override clause** baked into subtask prompts. Together these sharpen zeroshot's blind-validation (#37) into a both-directions principle. Lands at v1.13/v1.14 as AGENTS.md design, v2.0+ as real delegation mechanics.
- **eyaltoledano/claude-task-master pattern accepted, code rejected (2026-05-22).** **MIT + Commons Clause** makes BooCode (self-hosted developer chat) a competing product — no code vendoring. 25.7k stars, JS/TS. Three patterns worth lifting independently in BooCode's own MIT code: **(1) Tiered tool-loading via env var** (`TASK_MASTER_TOOLS=core|standard|all|custom`, 7/15/36 tools, ~5k/10k/21k tokens) — direct fit for `BOOCODE_TOOLS` at v1.12.x or v1.13. **(2) Three model roles** (main/research/fallback) — same pattern as RA.Aid (#44), complementary evidence. **(3) PRD-as-source-of-truth** at `.taskmaster/docs/prd.txt` formalizes Sam's spec-governed work convention.
- **Dominic789654/agent-hub tracked, not lifted (2026-05-22).** Apache-2.0, Python 100% stdlib-only (no FastAPI/SQLAlchemy/Pydantic — zero supply chain surface), 1 star, v0.1.0 March 2026. Local-first multitask board for routing/observing code-assistant work across repos. SQLite queueing, dependency-aware dispatch, **human inbox**, dashboard at `/app`. **Architecturally what Paseo wants to grow into.** Too early to vendor; track for next pass. The stdlib-only constraint is a useful lens to evaluate BooCode/BooLab dependency footprint.
- **sipyourdrink-ltd/bernstein tracked as compliance-grade reference (2026-05-22).** License needs verification before any lift (`/LICENSE` should be checked directly). 262 stars, Python. Same layer as zeroshot (#37) and agent-hub (#48), but with **audit-grade compliance** as differentiator: HMAC-chained audit log, signed agent cards (Ed25519/EdDSA + JCS), per-artifact lineage (producer + inputs + prompt SHA + model + cost), customer-key signing for DORA/NIS2/EU AI Act Article 12, air-gap deploy, deterministic scheduler, one git worktree per agent, cost-aware routing bandit. **Over-spec for Sam's current homelab work** but the right shape if BooCode ever needs to produce audit evidence. The **HMAC-chained audit log** is a small lift-friendly pattern even today.
- **vorn-run/vorn rejected as code, pattern noted (2026-05-22).** MIT, Electron + TypeScript, 24 stars, alpha. Multi-agent grid UI for Claude Code/Copilot/Codex/OpenCode/Gemini. Each agent in its own PTY. Task queue + kanban + workflow automation + headless execution + inline diff review with structured-feedback-back-to-agent + worktree isolation + MCP server. **Wrong stack** (Electron desktop UI vs BooCode's Fastify/TS+React SPA). Pattern note: **PTY-per-agent + worktree-per-task + inline-diff-feedback-loop** is the canonical shape for multi-agent orchestration above real CLI agents; same architectural choice Paseo made.
- **siropkin/budi accepted as tooling, not catalog entry (2026-05-22).** MIT, Rust, single 6MB binary, sub-millisecond hook latency. **WakaTime for Claude Code** — tracks tokens, costs, prompts, file activity, sub-agent spawns in local SQLite, dashboard at `localhost:7878/dashboard`. **Recommend immediate install** (`budi init --global`) for Claude Code session observability. The **5-hook Claude Code event taxonomy** (`SessionStart`, `UserPromptSubmit`, `PostToolUse`, `SubagentStart`, `Stop`) is the canonical reference and worth knowing when BooCode v2.0+ designs its own hook system.
- **GeiserX/LynxPrompt tracked as architectural reference, code off-limits (2026-05-22).** **GPL-3.0 makes vendoring incompatible with BooCode's MIT licensing.** 27 stars, Next.js + PostgreSQL + Prisma. Self-hostable platform for managing AGENTS.md / CLAUDE.md / .cursor/rules / slash commands across **30+ AI assistant formats**. Single blueprint, export to N formats. Federated marketplace. The concept fits Sam's situation (5+ project CLAUDE.md/AGENTS.md files maintained separately) but the **manual AgentLint (#39) audit pass is the right ROI today** rather than adopting a full platform. If consolidation ever needed, reimplement the format-adapter pattern in MIT-licensed BooCode code, don't vendor.
- **ShipWithAI/claude-code-mastery noted as docs reference (2026-05-22).** **CC BY-NC-SA 4.0** content + MIT code examples. 9 stars. Free 16-phase / 55-module / 136-lesson course on Claude Code workflows. **Two structural patterns worth borrowing:** (1) **7-block module structure** (WHY → CONCEPT → DEMO → PRACTICE → CHEAT SHEET → PITFALLS → REAL CASE) as a docs template; (2) **phase list as coverage checklist** to diff against Sam's own CLAUDE.md/AGENTS.md files — combine with AgentLint (#39) for a single audit pass. Don't redistribute content (NC license).

View File

@@ -1,127 +1,205 @@
# BooCode v1.x — Roadmap
Last updated: 2026-05-21
Last updated: 2026-05-22
> **Companion doc:** `boocode_code_review.md` holds the full external-repo inventory, lift rationale, and license analysis. This document is the canonical source for shipping state, version ordering, and what's planned vs. shipped.
## Overview
BooCode is a standalone code-chat tool at `/opt/boocode/`. Read-only by design — pick a project, chat with a local LLM that has file-inspection tools, get streaming responses over WebSocket.
BooCode is a **3-app monorepo** at `/opt/boocode/` (locked 2026-05-22):
Live at `https://code.indifferentketchup.com` (Caddy → Authelia → Tailscale → `100.114.205.53:9500`).
- **BooChat** (`apps/chat`, port `9500`, `code.indifferentketchup.com`) — read-only chat with file-inspection tools. The live thing. Pick a project, chat with a local LLM, get streaming responses over WebSocket. Will rename `boocode_db``boochat_db` when BooCoder lands.
- **BooCoder** (`apps/coder`, port `9502`, `coder.indifferentketchup.com`) — write tools + external-CLI dispatch. **Planned, v2.0.** Both an in-process inference loop (with `pending_changes` table) AND ACP-dispatched external agents (opencode/goose) with PTY fallback (claude/pi/smallcode) — same surface, two execution paths.
- **BooTerm** (`apps/booterm`, port `9501`) — PTY/tmux/xterm.js. **Live since May 2026.** Node 20 Alpine + node-pty + tmux + xterm.js. Tmux session per pane (`bc-<uuid>`), SSH-out works (openssh-client + gosu in the image). `/api/term/health` shares the existing `boocode_db`.
Caddy → Authelia → Tailscale → `100.114.205.53` → 9500/9501/9502. Three apps, **one shared Postgres** (`boocode_db``boochat_db`).
**Architectural commitments:**
- No embeddings. Model uses file-view tools (`view_file`, `list_dir`, `grep`, `find_files`) + sidecar analyzers (codecontext, codesight) + codecontext MCP tools. Walked away from the RAG pipeline May 2026.
- Read-only in v1.x. Write tools land in BooCoder (separate container, post-v1.x).
- One Postgres (`boocode_db`), one frontend SPA, container-per-service for new capabilities.
- **No embeddings.** Model uses file-view tools (`view_file`, `list_dir`, `grep`, `find_files`) + sidecar analyzers (codecontext, future codesight) + codecontext MCP tools. Walked away from the RAG pipeline May 2026.
- **BooChat is read-only** through v1.x. Write tools land in BooCoder at v2.0.
- **Mount strategy: blanket `/opt:rw`, permission gating at the write-tool layer.** Per-project scoping is policy, not mount. Path-guard correctness is the #1 test target for v2.0.
- **External CLI agents (`opencode`/`claude`/`goose`/`pi`) live on the host, not in containers.** BooCoder shells out via local-exec PTY or ACP subprocess. Host install inherits Sam's existing `~/.opencode/`, `~/.claude/`, `~/.config/goose/` configs.
- **Protocol roles locked (2026-05-22):** **BooChat = MCP client only** (read-only tool consumer, never enables write-capable MCP servers). **BooCoder = MCP client + MCP server + ACP client (host) + ACP agent (driveable)** — full matrix. BooCoder's ACP-client role replaces raw-PTY dispatch for ACP-capable agents (opencode `opencode acp`, goose `goose acp`); PTY fallback retained for claude/pi/smallcode.
- **Strategic target: Paseo-equivalent dispatcher inside BooCode** (2026-05-22 pivot). Paseo (`getpaseo/paseo`) is AGPL-3.0 — incompatible with BooCode's MIT license and network-served deployment. Reproduce the architecture using only license-clean patterns. Primary architectural template: `Dominic789654/agent-hub` (Apache-2.0). Critical context-management primitive: Roo Code Boomerang Tasks pattern. Observation pattern: Claude Code hooks (siropkin/budi reference).
External code lifted from / referenced in: see `boocode_code_review.md` for full inventory.
-----
## Shipped (status as of 2026-05-21)
## Shipped (status as of 2026-05-22)
| Version | Theme | Tag |
|---|---|---|
| v1.0 | Initial scaffold | — |
| Batches 14.4 | Markdown, sidebar, panes, chats-inside-sessions, archive, fork/delete, header polish, settings drawer | — |
| v1.5 | resolveProjectPath, BOOTSTRAP_ROOT, vitest pin | — |
| v1.6, v1.6.1, v1.6.2 | Mobile pass + RightRail mobile drawer | — |
| v1.7 | Drag-drop file + paste-as-attachment | — |
| v1.8, v1.8.1, v1.8.2 | Settings drawer, git_status tool, WS reconnect, per-turn budget reset + Continue affordance + CapHitSentinel | — |
| v1.9.1 | Skills system (`/opt/skills/` + `skill_find` / `skill_use` / `skill_resource` + `/skill` slash command) | `v1.9.1` |
| v1.9.7 | `ask_user_input` elicitation tool | `v1.9.7` |
| Batch 9 (Agents Tier 2) | `AGENTS.md` + 6 builtin agents + AgentPicker in ChatInput toolbar + `sessions.agent_id` | folded into `v1.9.1`/`v1.9.7` |
| v1.10.0 | BooTerm: separate container, xterm.js + node-pty + tmux | `v1.10.0` |
| v1.10.1 | BooTerm-user (spawn as samkintop, login bash, Claude Code/opencode PATH) | `v1.10.1` |
| v1.10.4, v1.10.5 | Mobile terminal + XML tool-call fallback parser | — |
| v1.11.0 | opencode-style compaction port (auto-overflow, anchored summary, tail preservation) | — |
| v1.11.1 | Compaction follow-up (working indicator during compaction, unit tests, .bak cleanup) | — |
| v1.11.2 | ContextBar (persistent context-usage indicator above MessageList) | — |
| v1.11.3 | `ctx_max` capture via `/upstream/<model>/props` (replaces dead `timings.n_ctx` read) | `v1.11.3` |
| v1.11.5 | ContextBar inline next to agent picker; remove ChatContextPopover; default new sessions to no agent | — |
| v1.11.6 | Doom-loop guard from opencode (3 identical tool calls → sentinel, abort recursion) | — |
| v1.11.7 | pathGuard secrets filter (continue.dev `DEFAULT_SECURITY_IGNORE_FILETYPES`) | — |
| v1.11.8 | web_search + web_fetch tools via SearXNG | — |
| v1.11.9 | Manual redirect handling — re-run URL guard on each hop (SSRF hardening) | — |
| v1.11.10 | Stream-cap response body at 5MB, abort on overflow | `v1.11.x` |
| **v1.12.0** | **codecontext sidecar (Go HTTP shim, NDJSON MCP framing, child.Wait supervisor) + container guidance (BOOCHAT.md/BOOCODER.md) + 7 vendored skills + system-prompt.ts extraction + mtime-watch cache + 8 codecontext tool wrappers + per-agent tool whitelists + .codecontextignore template + agents.ts ALL_TOOL_NAMES single-source-of-truth fix** | `v1.12.0` |
|Version |Theme |Tag |
|-----------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------|
|v1.0 |Initial scaffold |— |
|Batches 14.4 |Markdown, sidebar, panes, chats-inside-sessions, archive, fork/delete, header polish, settings drawer |— |
|v1.5 |resolveProjectPath, BOOTSTRAP_ROOT, vitest pin |— |
|v1.6, v1.6.1, v1.6.2 |Mobile pass + RightRail mobile drawer |— |
|v1.7 |Drag-drop file + paste-as-attachment |— |
|v1.8, v1.8.1, v1.8.2 |Settings drawer, git_status tool, WS reconnect, per-turn budget reset + Continue affordance + CapHitSentinel |— |
|v1.9.1 |Skills system (`/opt/skills/` + `skill_find` / `skill_use` / `skill_resource` + `/skill` slash command) |`v1.9.1` |
|v1.9.7 |`ask_user_input` elicitation tool |`v1.9.7` |
|Batch 9 (Agents Tier 2)|`AGENTS.md` + 6 builtin agents + AgentPicker in ChatInput toolbar + `sessions.agent_id` |folded into `v1.9.1`/`v1.9.7`|
|v1.10.0 |BooTerm: separate container, xterm.js + node-pty + tmux |`v1.10.0` |
|v1.10.1 |BooTerm-user (spawn as samkintop, login bash, Claude Code/opencode PATH) |`v1.10.1` |
|v1.10.4, v1.10.5 |Mobile terminal + XML tool-call fallback parser |— |
|v1.11.0 |opencode-style compaction port (auto-overflow, anchored summary, tail preservation) |— |
|v1.11.1 |Compaction follow-up (working indicator during compaction, unit tests, .bak cleanup) |— |
|v1.11.2 |ContextBar (persistent context-usage indicator above MessageList) |— |
|v1.11.3 |`ctx_max` capture via `/upstream/<model>/props` (replaces dead `timings.n_ctx` read) |`v1.11.3` |
|v1.11.5 |ContextBar inline next to agent picker; remove ChatContextPopover; default new sessions to no agent |— |
|v1.11.6 |Doom-loop guard from opencode (3 identical tool calls → sentinel, abort recursion) |— |
|v1.11.7 |pathGuard secrets filter (continue.dev `DEFAULT_SECURITY_IGNORE_FILETYPES`) |— |
|v1.11.8 |web_search + web_fetch tools via SearXNG |— |
|v1.11.9 |Manual redirect handling — re-run URL guard on each hop (SSRF hardening) |— |
|v1.11.10 |Stream-cap response body at 5MB, abort on overflow |`v1.11.x` |
|v1.12.0 |codecontext sidecar (Go HTTP shim, NDJSON MCP framing, child.Wait supervisor) + container guidance (BOOCHAT.md/BOOCODER.md) + 7 vendored skills + system-prompt.ts extraction + mtime-watch cache + 8 codecontext tool wrappers + per-agent tool whitelists + .codecontextignore template + agents.ts ALL_TOOL_NAMES single-source-of-truth fix |`v1.12.0` |
|v1.12.1 |Server-side workspace pane sync (`sessions.workspace_panes jsonb`) + 5-state status indicator overhaul (streaming/tool_running/waiting_for_input/idle/error) + startup hung-row sweep + stale `messages_status_check` constraint dropped + `detectSameNameLoop` reverted (dead code) + stop-handler writes `cancelled` status |`v1.12.1` |
|v1.12.2 |Live tok/s + ctx_used display next to status indicator while streaming (frontend-only) |`v1.12.2` |
|v1.12.3 |Stale-stream banner — "Previous response didn't complete. [Retry] [Discard]" when streaming row > ~60s with no new tokens. `POST /api/chats/:id/discard_stale` backend endpoint |`v1.12.3` |
|v1.12.4 |Refactor only — `inference.ts` (1700 LoC) split into `inference/` directory: `turn.ts`, `stream-phase.ts`, `tool-phase.ts`, `error-handler.ts`, `sentinel-summaries.ts`, `payload.ts`, `xml-parser.ts`, `sentinels.ts`, `budget.ts`, `types.ts`, `index.ts`. Shipped as rc1/rc2/rc3 → final. No behavior change. Lined up `stream-phase.ts` as the swap target for v1.13 AI SDK migration |`v1.12.4` |
|**v1.13.0** |**`message_parts` table** `(id, message_id, sequence, kind, payload jsonb, created_at)` with kinds `text/tool_call/tool_result/reasoning/step_start`. CHECK constraint, `(message_id, sequence)` unique + index. Dual-write at every site that wrote `tool_calls`/`tool_results` JSON (stream-phase finalize, skills × 2, messages.ts answer flow, chats.ts × 2). `ToolDef<T>` gained `category: 'read_only' | 'write'`. v1.x registry rejects write. Old JSON columns remain authoritative for reads. Strangler-fig phase 1 |`v1.13.0` |
|**v1.13.1-A** |**AI SDK v6 install + streamCompletion adapter.** `ai@^6`, `@ai-sdk/openai-compatible@^2`. `provider.ts` wraps `createOpenAICompatible` against `config.LLAMA_SWAP_URL`. `streamCompletion` rewritten as adapter over `streamText`. XML fallback parser preserved for qwen3.6's inline `<tool_call>` emissions. **Patched mid-flight:** AI SDK v6 swallows abort signals silently — explicit `if (signal?.aborted) throw` after stream drain. Without it, stop button writes `complete` instead of `cancelled`. reasoning-delta counted + dropped (re-captured in -C). Known regression flagged: live mid-stream tps gone (single trailing publish; TODO for delta-cadence interpolation against `result.usage`) |(umbrella tag) |
|**v1.13.1-B** |**`messages_with_parts` view** with COALESCE fallbacks against legacy JSON columns. Read sites switched: `chats.ts:427`, `messages.ts:95`, `ws.ts:27`, `payload.ts`, `compaction.ts`. Perf verified at 1ms for 42-message chat. `reasoning_parts` column added to the view (consumed in -C). API contract preserved. Parts become source of truth at read; JSON columns kept by dual-write only |(umbrella tag) |
|**v1.13.1-C** |**`ask_user_input` correlation ported to parts.** `messages.ts:478/549` now JOINs `message_parts` on `payload->>'id'` and `payload->>'tool_call_id'`. Downstream call sites updated to `{message_id, payload}` shape. 404 fallback for pre-v1.13.0 history (acceptable scope). **Reasoning end-to-end:** `reasoning-delta` accumulated in `stream-phase.ts` adapter via `StreamResult.reasoning` (simpler than the brief's StreamPhaseState approach); `partsFromAssistantMessage` accepts optional `reasoning`, emits at seq 0; `finalizeCompletion` + `executeToolPhase` dual-write reasoning parts; `payload.ts` reads `reasoning_parts` from view, collapses into `OpenAiMessage.reasoning`; `toModelMessages` emits AI SDK `ReasoningPart` in assistant content array. Smoke: 361 chars reasoning at seq 0, 429 chars text at seq 1 |`v1.13.1` (`ac1a71f`) |
|**v1.13.3** |**Cleanup bundle, 4 independent items.** (1) `ALTER DATABASE boocode SET statement_timeout = '30s'` — caps damage from query-plan regression on the view's nested subselects; documented in `schema.sql` since `ALTER DATABASE` can't run inside a DO block. (2) Alpha-sorted tool registry — `.sort((a, b) => a.name.localeCompare(b.name))` at `ALL_TOOLS` export; llama.cpp prompt cache hits on byte-identical prefixes, tool-order drift killed hit rate every turn. (3) Periodic 60s in-process sweeper marks `streaming` rows older than 5 min as `failed` and publishes `chat_status='idle'` so the UI dot drops — closes mid-session crash UX gap that the startup sweep (v1.12.1) only handled at boot. (4) `experimental_repairToolCall` wired through AI SDK v6 `streamText` — routes malformed tool calls to a logged passthrough instead of crashing the stream. Owed since v1.13.1-A. 173/173 tests pass (+1 alpha-ordering test)|`v1.13.3` (`a08d809`) |
|**v1.13.4** |**Two-tier compaction prune.** `services/inference/prune.ts` with pure `selectPruneTargets` decision helper. Tier 1 hides stale `tool_result` parts via `message_parts.hidden_at` at the 20k-freed threshold (cheap, no inference call); tier 2 falls back to anchored summarize when prune alone isn't enough. Schema additions: `message_parts.hidden_at` column + partial index `ON (message_id) WHERE hidden_at IS NULL`. `messages_with_parts` view filters hidden parts so payload assembly never sees them. Avoids burning an inference round on every overflow. opencode-pattern half-shipped in v1.11.0 — this closes it. |`v1.13.4` (`ec8593c`) |
|**v1.13.5** |**opencode `truncate.ts` port — full tool output retrievable via opaque id.** New `services/truncate.ts` with `tr_<12 base32>` ids on tmpfs (`/tmp/boocode-truncations`, 0o700, 5MB cap matching `view_file`'s `MAX_FILE_BYTES`, 7-day TTL). Three exports: `storeTruncation`, `readTruncation`, `truncateIfNeeded` (wrap-or-passthrough helper). New `view_truncated_output(id)` tool retrieves the full content; model never sees the truncation dir (resolved server-side). Wired through 5 of 7 tool sites: `view_file`, `list_dir`, `web_fetch`, `codecontext_client`, plus alpha-sorted into `ALL_TOOLS` (count 19→20). `cleanupTruncations` piggybacks on the v1.13.3 60s sweeper (TTL pass + orphan reap via parts query on `payload->'output'->>'outputPath'`). grep and find_files deferred (need file_ops refactor to expose uncapped output). 186 tests (was 179, +7 in truncate.test.ts). |`v1.13.5` (`f8fc5db`) |
|**v1.13.6** |**Compaction head-assembly audit + reasoning fix.** Audit traced compaction's summary path post-v1.13.1-B read flip across three quadrants — Q1 view read (clean), Q2 parts shape (clean), Q3 reasoning render (FIX NEEDED). v1.13.1-C wired reasoning end-to-end into `inference/payload.ts` but missed the compaction read site, silently degrading summary quality for reasoning-channel models (qwen3.6) since -C shipped. Fix: `CompactionMessage` extended with `reasoning_parts` field; SELECT pulls `reasoning_parts` from `messages_with_parts`; `buildHeadPayload` (now exported for tests) prefixes assistant content with `<reasoning>...</reasoning>\n\n<content>` when reasoning is present; standalone `<reasoning>` tag for tool-call-only turns; omits tag when reasoning is null or empty. 4 new render-branch tests (190 total). |`v1.13.6` (`81d837c`) |
|**v1.13.7** (uncommitted)|**Stability bundle, 5 fixes from production observability gap.** (1) `provider.ts``includeUsage: true` on `createOpenAICompatible`. `@ai-sdk/openai-compatible` defaults this false, omitting `stream_options.include_usage` from request body; llama-swap never emitted the usage block, so `result.usage.inputTokens/outputTokens` resolved `undefined` and `tokens_used`/`ctx_used` landed NULL in **every** assistant row since v1.13.1-A. Surfaces tokens in StatsLine + persisted DB rows going forward (no backfill). (2) `MessageList.tsx:48``hasText = m.content.trim().length > 0`. AI SDK v6 streaming occasionally emits a leading `\n` text-delta on tool-call-only turns; the literal newline passed `length > 0` and rendered an empty bubble + ActionRow between each tool call. (3) `MessageBubble.tsx:654` — same trim on `hasContent` (defensive, no-tool-calls path). (4) `payload.ts:64``buildMessagesPayload` skips assistant rows with `status='failed'` AND `status='complete' && empty content && no tool_calls`. Without this, a trailing empty/failed assistant + the next attempt's placeholder produced "Cannot have 2 or more assistant messages at the end of the list" rejections from the upstream API. (5) `budget.ts:11``BUDGET_NO_AGENT = 30` (was 15). No-agent mode shares the read-only-agent toolset at runtime; the cautious 15-cap was forward-looking for write tools that haven't landed. 190/190 tests still pass.|— |
**v1.13.2 deliberately deferred** — keep the dual-write through v1.13.4v1.13.11 as rollback insurance. Drop legacy columns last.
-----
## In flight (uncommitted on disk, 2026-05-21)
### Shipped (v1.13.x — written 2026-05-22, retagged same day)
v1.12.1 work — landed today, not yet committed:
All v1.13.x batches were retagged to the `vMAJOR.MINOR.PATCH-slug` scheme on 2026-05-22. `CHANGELOG.md` is the canonical per-tag record (slug describes what shipped; tag name alone recalls the batch). Tip is `v1.13.14-skills-audit` (`0fa46cd`); the next batch is `v1.13.15-codecontext-synth` (this batch, tag pending). Tags in chronological order:
| Item | Status | Notes |
|---|---|---|
| Server-side workspace pane sync | Done | `sessions.workspace_panes jsonb` column; PATCH endpoint; `session_workspace_updated` WS frame; localStorage migration on first load; deprecated `session_panes` table dropped |
| Richer status indicators | Done | Five states (`streaming` / `tool_running` / `waiting_for_input` / `idle` / `error`) with distinct visuals: amber orbiting dots for streaming, amber spinning ring for tool execution, blue static for waiting on user, emerald/gray/red for idle/error |
| Startup hung-row sweep | Done | `UPDATE messages SET status='failed' WHERE status='streaming' AND created_at < NOW() - INTERVAL '5 minutes'` on server boot |
| One stuck row from v1.12.0 smoke | Cleared | Manual UPDATE (`d63c25b1`) |
| `detectSameNameLoop` code path | Added, never fired | Candidate for revert in next batch — dead code |
| Diagnostic logging in inference.ts | Added for debugging | Must come out before commit |
- `v1.13.0-ai-sdk-v6` — AI SDK v6 migration; `streamCompletion` adapter; `messages_with_parts` view; reasoning_parts end-to-end
- `v1.13.1-cleanup-bundle``statement_timeout='30s'`, alpha-sorted tool registry, 60s stuck-row sweeper, `experimental_repairToolCall` pass-through
- `v1.13.2-compaction-prune` — two-tier prune; `message_parts.hidden_at` column + partial index; `messages_with_parts` view CASE refinement
- `v1.13.3-truncate` — opencode `truncate.ts` port; opaque `tr_<…>` id, `view_truncated_output(id)` tool, tmpfs storage
- `v1.13.4-reasoning-fix``<reasoning>` prose-prefix in compaction head-assembly for tool-bearing turns
- `v1.13.5-stability-bundle``includeUsage: true` on provider, `hasText` trim guard, `BUDGET_NO_AGENT` 15→30, trailing-empty-assistant filter
- `v1.13.6-prefix-stability``buildSystemPromptWithFingerprint` SHA-256 + per-session drift observer
- `v1.13.7-compaction-trigger` — overflow trigger lowered to `floor(0.85 × ctx_max)`
- `v1.13.8-tool-cost``tool_cost_stats` SQL view + per-tool rolling 100-call mean in AgentPicker
- `v1.13.9-agentlint` — instruction-file AgentLint pass; identity-openers removed; `CLAUDE.local.md` to .gitignore
- `v1.13.10-openspec``openspec/changes/<slug>/{proposal,tasks,design}.md` shape; archived batch docs preserved via `git mv`
- `v1.13.11-tools` — tiered tool loading via `BOOCODE_TOOLS` env (`core | standard | all`)
- `v1.13.12-ws-schemas` — Zod schemas for all 27 wire-format frames; `publishFrame` / `publishUserFrame` wrappers; parity test
- `v1.13.13-ws-publish` — all ~80 publish sites converted to the typed wrappers; every WS frame now Zod-validated at boundary
- `v1.13.14-skills-audit` — 26 skills vendored + audited via 5 parallel agent teams; 14 kept, 11 dropped, 1 migrated to BOOCHAT.md/BOOCODER.md
- `v1.13.15-codecontext-synth` — forced second-inference synthesis pass for codecontext overview tools (truncation-aware extraction; auto-fetched top-N files + project docs; 32k payload-budget contract preserved)
- `v1.13.16-xml-parser` — Anthropic `<invoke>` parser support + Levenshtein-based unknown-tool recovery hints (qwen3.6 drift to Claude Code-style tool names like `read_file`); xml-parser test coverage
-----
The remaining strangler-fig final step (drop `messages.tool_calls` + `tool_results` columns) is still pending under its old `v1.13.2` working name; will get a new tag slug when scoped.
## v1.12.x cleanup (NEXT — small, immediate)
## In flight / next (v1.13.x cleanup line)
Five items. Group them or split them — your call.
Five more single-dispatch batches before the strangler-fig closes. Each ships independently with its own smoke and rollback surface. **Do not fold.** Order is locked:
### v1.12.1 — commit consolidation
### v1.13.8 — system-prompt prefix stability verify-and-measure (REFRAMED, 2026-05-22)
**Action items, in order:**
**Original plan:** add a `system_prompt_cache` DB table keyed by `(agent_id, project_id, skills_version)`, mtime-invalidated.
1. **Remove diagnostic logging** from `apps/server/src/services/inference.ts`. The 12 `ctx.log.info` calls added today proved the inference loop was functioning correctly; the prompts were just slow. Verbose for production. Strip them, keep the file clean.
**Why reframed:** recon disproved the premise. `apps/server/src/services/system-prompt.ts:buildSystemPrompt` already runs over mtime-cached inputs at the file layer:
2. **Revert `detectSameNameLoop`.** Three additions in inference.ts:
- `DOOM_LOOP_SAME_NAME_THRESHOLD = 5` constant
- `detectSameNameLoop()` function
- Call site in `runAssistantTurn` immediately after the existing `detectDoomLoop` check
Never fired in any real run today. Dead code. The existing `detectDoomLoop` (identical args, threshold 3) is sufficient.
- BOOCHAT.md / BOOCODER.md cached in `system-prompt.ts:25` (`cachedGuidance`, keyed by mtime)
- global + per-project AGENTS.md cached in `agents.ts:245` (`safeStat` pattern, 60s TTL)
- `session.system_prompt` / `project.default_system_prompt` are DB scalars (byte-stable until edited)
- BASE_SYSTEM_PROMPT is a hardcoded template with `${projectPath}` interpolation
3. **Drop the stale `messages_status_check` CHECK constraint** in `apps/server/src/schema.sql`. Two constraints exist on the table:
- `messages_status_check` allows `streaming|complete|failed` (old, stale)
- `messages_status_chk` allows `streaming|complete|failed|cancelled` (new)
The old one prevents `cancelled` from being written. Drop it with `ALTER TABLE messages DROP CONSTRAINT IF EXISTS messages_status_check;`.
Output assembly is a microsecond pure-string concat with no I/O. Skills aren't in the prefix (runtime discovery via `skill_find`). Tools live in a separate request body field, alpha-sorted by v1.13.3. **In theory the prefix is already byte-stable across turns; nothing has measured it.**
4. **Stop-handler writes terminal status.** When user clicks stop mid-stream, the abort path must `UPDATE messages SET status='cancelled' WHERE id = $assistantMessageId AND status='streaming'`. Currently rows just sit `streaming` forever. The startup sweep catches them on restart, but they should be written immediately. Edit `apps/server/src/services/inference.ts` `handleAbortOrError` to add the UPDATE.
**New scope — instrumentation only, no cache:**
5. **Commit + tag v1.12.1.** Include the workspace pane sync, status indicator overhaul, startup sweep, and items 14 above. Single commit per item is fine; tag at end.
1. SHA-256 fingerprint of `buildSystemPrompt`'s output logged per turn at `level=info`, msg `prefix-fingerprint`, with project_id / agent_id / session_id / prefix_hash / prefix_length / mtime fields.
2. Module-level `Map<sessionId, lastHash>` observer. On hash change for a known session → emit `prefix-drift` at `level=warn` with `prev_hash`, `new_hash`, and a field-level `changed_inputs` diff.
3. Unit-level byte-stability assertion in `system-prompt.test.ts`: two consecutive `buildSystemPrompt` calls with the same inputs return byte-identical strings.
**Estimated:** ~150 LoC net (deletions dominate).
**Decision criterion:** smoke 5 turns in a fresh session. 5 identical hashes + zero drift logs → close v1.13.8 as no-op, **drop the DB cache plan permanently**, move to v1.13.9. If drift surfaces → characterize the failure mode in a follow-up batch (the answer may not be a cache at all).
### v1.12.2 — live throughput display (small UX win)
**Doctrine:** matches the v1.13.6 audit pattern. Don't add infrastructure without a proven cache miss. The v1.12.0 mtime caches at the input layer plus alpha tool ordering at the request body layer already address the load-bearing cache-stability surfaces.
Surface `tokens_per_second` and `ctx_used` next to the status indicator while streaming. Backend already emits these in the `usage` frame; just consume them in the StatusDot wrapper or a sibling component. ~80 LoC, frontend-only.
**Dispatch brief:** `handoff_v1.13.8_prefix_verify.md`.
### v1.12.3 — stale-stream frontend banner
**Estimated:** ~95 LoC (system-prompt.ts + small `getAgentsMtimes` accessor in agents.ts + 3 new tests).
When a chat has a `streaming` row older than ~60s with no new tokens, the UI should surface a "Previous response didn't complete. [Retry] [Discard]" banner instead of silently queueing new sends. Today's debugging spent four hours misreading slow streams as dead; this is the UX fix that prevents that. ~150 LoC, frontend + small backend endpoint for the discard action.
### v1.13.9 — compaction overflow trigger formula
-----
opencode pattern: `0.85 * ctx_max` early trigger (not at 100% saturation). Reduces tail-loss risk and gives compaction a safer window. Tiny change but tied to v1.13.4's tier logic — sequence matters.
## v1.13 — Phase B: parts table + AI SDK + per-tool tagging
**Lift source:** `anomalyco/opencode` `session/overflow.ts`.
**Goal:** typed message parts replace JSON blobs on `messages.tool_calls` / `tool_results`. Adopt Vercel AI SDK `streamText`. Tag tools as `read_only` or `write` at definition time.
**Estimated:** ~30 LoC.
### v1.13.10 — per-tool token cost accounting
Rolling average per tool, surfaced in AgentPicker tooltip + agent-pick decisions. Backend tracks `(tool_name, prompt_tokens_in, completion_tokens_out)` per call; surfaces a 100-call rolling mean. Frontend reads it for tool-cost hints. **Depends on v1.13.7's `includeUsage` fix** — without real token numbers in DB rows, the rolling average is empty.
**Estimated:** ~250 LoC.
### v1.13.11 — WebSocket frame typing
Zod schemas validated both ends. Catches the recurring class of bug that drove the 2026-05-21 debugging spike (silent protocol drift). Upfront work that pays back every time the protocol changes. `chat_status`, `usage`, `parts_appended`, `session_workspace_updated`, `tool_running` — every frame gets a Zod schema, every send/receive site validates.
**Estimated:** ~300 LoC.
### v1.13.12 — skills audit pass (NEW, 2026-05-22)
**Goal:** apply the rules→recipes split (per Codeminer42 activation-gap data: plain skills invoke 6% in clean multi-turn, `CLAUDE.md`/`AGENTS.md` is 100% present) to BooCode's 7 vendored v1.12 skills. Sort each into: (a) move to `AGENTS.md` as always-true rule, (b) keep as recipe invoked via `/skill <name>`, (c) move bulky context into `references/` flat subdirectory inside the skill, (d) delete (Claude already does it reliably).
**Scope:**
1. Schema: new `message_parts` table (`id, message_id, kind, payload JSONB, sequence`). Kinds: `text`, `tool_call`, `tool_result`, `reasoning`, `step_start`. The `messages` table becomes header-only.
2. Inference loop rewritten on AI SDK `streamText`. `streamCompletion` becomes a thin wrapper. Native AI SDK `experimental_repairToolCall` replaces v1.12's hand-rolled version.
3. Tool registry: `ToolDef<T>` gains `category: 'read_only' | 'write'` field. BooCode v1.x rejects any `write` tool at registry time (defense in depth for the BooCoder split). Alpha-sort tool list before sending to model (prompt-cache stability).
4. Reasoning content (`reasoning_content` from Qwen3.6) captured as its own part type instead of dropped or inlined.
1. **Audit each of the 7 vendored skills against the 4-way split.** Most workflow-rule content ("always do X before Y", "never do Z") moves to `AGENTS.md` since it should be 100% present. Recipe content ("here's how to scaffold a component", "here's the release checklist") stays as skill, gets `context: fork` if heavy.
1. **Adopt Anthropic best-practices conventions** for any skills that remain after audit: gerund names (`scaffolding-components`, not `component-helper`), SKILL.md ≤500 lines, references one level deep, third-person imperative voice, MCP tool references in `ServerName:tool_name` format, no Windows-style paths, no time-sensitive info, consistent terminology, no "voodoo constants."
1. **Run each remaining skill through the 4-step validation protocol** from `mgechev/skills-best-practices` (Discovery → Logic → Edge Case → Architecture Refinement) using a fresh Claude chat per step. Prompts are paste-ready; ~10 minutes per skill.
1. **Install `skillgrade` on Sam's host** (`npm i -g skillgrade`). For each remaining skill, write a minimal `eval.yaml` with 23 tasks and run `skillgrade --smoke` (5 trials, ~5 min) to confirm the skill triggers when expected and produces correct output. **Likely outcome: some skills show 020% trigger rate — confirms they belong in AGENTS.md, not as skills.**
1. **Document the rules→recipes split as a BooCode convention** in `BOOCODER.md` / `BOOCHAT.md`. Future-proofs against re-adding workflow rules as skills.
**Migration risk:** non-trivial. `inference.ts` is ~1700 lines with custom XML fallback, SSE parsing, compaction integration. Plan dedicated cutover window. `compaction.ts` must update to assemble head from parts.
**Lift sources:**
**Replaces:** Original Batch 13 (append-only event log) — same outcome, different vocabulary.
- `blog.codeminer42.com/stop-putting-best-practices-in-skills/` — empirical 6%/33%/66%/100% invocation-rate data with Vercel-style multi-turn methodology. The activation-gap framing.
- `mgechev/skills-best-practices` (25 stars, MIT) — 4-step validation protocol with paste-ready prompts. Directory structure conventions.
- `mgechev/skillgrade` (132 stars, MIT) — agent-agnostic skill eval framework. `eval.yaml` task+grader schema. Smoke/reliable/regression presets.
- `platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices` — canonical Anthropic standard. 500-line ceiling, gerund naming, progressive disclosure patterns, MCP tool reference format, verification checklist.
**Today's debugging spike validates this work.** Four hours of confusion came from JSON-blob `tool_calls` / `tool_results` columns hiding state from logs and from the inference state machine being invisible. Typed parts + per-part status would have shown the slow-stream-vs-dead distinction in seconds.
**Dependencies:** none (the 7 v1.12 skills already exist; this is an audit pass on shipped material). Can ship at any point in the v1.13.x line.
**Dependencies:** v1.12.x cleanup merged.
**Estimated:** zero code changes, ~one evening of audit work, plus skillgrade install. Per-skill eval.yaml authoring is ~30 min per skill including the 4-step validation. Total roughly 56 hours of focused work for all 7 skills.
**Estimated:** ~1500 LoC.
### v1.13.2 — drop legacy columns (final phase of strangler-fig)
**Wait at least one week of production traffic on v1.13.1 before shipping.** The dual-write is rollback insurance. Drop the columns and that rollback is gone.
**Verification query before shipping:**
```sql
SELECT
COUNT(*) FILTER (WHERE m.tool_calls IS NOT NULL AND NOT EXISTS (
SELECT 1 FROM message_parts p WHERE p.message_id = m.id AND p.kind = 'tool_call'
)) AS missing_tool_call_parts,
COUNT(*) FILTER (WHERE m.tool_results IS NOT NULL AND NOT EXISTS (
SELECT 1 FROM message_parts p WHERE p.message_id = m.id AND p.kind = 'tool_result'
)) AS missing_tool_result_parts
FROM messages m
WHERE m.created_at > '2026-05-22'::timestamptz;
```
Both columns must read 0.
**Scope (~150 LoC, mostly deletions):**
1. Remove dual-write from every v1.13.0 site: `tool-phase.ts` (3 sites), `finalizeCompletion`, `skills.ts` (2 sites), `messages.ts` answer flow, `chats.ts` (fork). Keep only the parts write.
1. Simplify `messages_with_parts` view — drop COALESCE fallbacks since legacy columns are about to disappear.
1. `ALTER TABLE messages DROP COLUMN tool_calls, DROP COLUMN tool_results`.
1. Remove `tool_calls`/`tool_results` fields from `Message` API type. API boundary unchanged (frontend already reads parts-derived values).
1. Drop the stale `messages_status_check` cleanup DO block from v1.12.1 schema if still present.
1. Update test fixtures in `inference.test.ts` and `compaction.test.ts` to construct parts instead of inline `tool_calls: null, tool_results: null` literals. ~30 fixture rewrites.
After v1.13.2 ships, tag the umbrella `v1.13` on the same commit (or on -C — Sam's call).
-----
@@ -132,9 +210,17 @@ When a chat has a `streaming` row older than ~60s with no new tokens, the UI sho
**Scope:**
1. Outer loop continues until model returns non-tool finish OR step cap hit. Step ≠ tool call: one step can contain multiple tool calls in parallel.
2. `agent.steps ?? Infinity` per-agent step cap. AGENTS.md gains `steps:` field. Refactorer `steps: 5`, Architect `steps: 20`, etc.
3. Step-boundary events (`step_start`, `step_finish`) explicit in the parts stream. Per-step snapshot for revert (planned for BooCoder; backend-only in v1.14).
4. Doom-loop guards (v1.11.6) migrate from "abort recursion" to "raise within loop iteration." Same predicate, different control flow.
1. `agent.steps ?? Infinity` per-agent step cap. AGENTS.md gains `steps:` field. Refactorer `steps: 5`, Architect `steps: 20`, etc.
1. Step-boundary events (`step_start`, `step_finish`) explicit in the parts stream. Per-step snapshot for revert (planned for BooCoder; backend-only in v1.14).
1. Doom-loop guards (v1.11.6) migrate from "abort recursion" to "raise within loop iteration." Same predicate, different control flow.
**Lift sources:**
- `anomalyco/opencode` `session/prompt.ts` `runLoop()` outer agent loop
- `anomalyco/opencode` `agent.steps` per-agent step cap
- AGENTS.md extensions for `steps`, `output_schema` (Qodo agent.toml pattern), `exit_expression` (Qodo pattern), `execution_strategy` (Qodo plan/act)
- **Reference:** RA.Aid three-stage Research/Planning/Implementation as AGENTS.md design principle; expert-tool escape hatch pattern (most subtasks on routine model, escalate to qwopus27b only when needed)
- **Reference:** Roo Code Boomerang Tasks — orchestrator-with-capability-restriction pattern. Adopt as AGENTS.md design principle (orchestrator role can call only dispatch tools, no file reads / MCP / shell).
**Dependencies:** v1.13 merged.
@@ -142,28 +228,126 @@ When a chat has a `streaming` row older than ~60s with no new tokens, the UI sho
-----
## v1.15 — Phase D: permission ruleset + MCP client
## v1.14.x-mcp — single-server MCP-client proof-of-concept (NEW, 2026-05-22)
**Goal:** validate the MCP-client loop end-to-end against one real MCP server before committing to the full opencode `mcp/index.ts` port at v1.15. Small, throwaway-if-needed, slots between v1.14 and v1.15 without disrupting either.
**Scope:**
1. Add a hardcoded MCP client (single server) to BooChat. Initial target: **Context7** (Sam already uses it via opencode, so the config is known to work). Remote HTTP transport at `https://mcp.context7.com/mcp` with optional `CONTEXT7_API_KEY` header.
1. Use the official `@modelcontextprotocol/sdk` TypeScript client. No SSE transport yet (deferred to v1.15). Stdio transport not needed for Context7.
1. Tool discovery on startup: `tools/list`. Tools surface in BooChat alongside `view_file`/`grep`/etc., prefixed `context7_*` to avoid collisions.
1. **Read-only invariant guard:** the client must reject any MCP tool whose `annotations.readOnly` is false (or absent). Fail-closed. This is BooChat-specific defense-in-depth — v1.15 lifts this restriction for BooCoder.
1. Per-server `enabled` flag in `agents.ts`. No glob patterns yet.
1. **No OAuth.** Context7 supports an API key header; that's it for v1.14.x. OAuth lands in v1.15.
**What this proves:**
- MCP protocol loop works end-to-end against a real server in BooCode's Fastify backend.
- Tool-discovery → tool-list → tool-call → result-render → context-budget accounting all hold.
- Read-only enforcement at the client layer is sound.
- Config schema shape is right before v1.15 commits to the opencode-compatible JSON config.
**What this does NOT do:**
- No SSE transport. (v1.15.)
- No OAuth flow. (v1.15.)
- No multiple servers. (v1.15.)
- No per-agent server allow/deny. (v1.15.)
**Dependencies:** v1.13 merged (parts table for tool-call/tool-result emission).
**Estimated:** ~150 LoC.
**Skip-condition:** if v1.14 finishes and Sam wants to leap straight to v1.15, fold this into the early steps of v1.15.
-----
## v1.14.x-html — pane-based artifact viewer with Markdown + HTML (REVISED, 2026-05-23)
**Goal:** every assistant message gets an "Open in pane" affordance that renders it as an artifact — Markdown by default (the model's normal output), HTML only when the user explicitly asks for it (e.g. "render this as HTML", "make me a dashboard", "build an interactive diagram"). Both artifact types open in BooChat's existing workspace splitter. Markdown panes have **Copy** (raw source) + **Download** (`.md`); HTML panes have **Download** (`.html`) only. No inline iframe preview — artifacts are pane-only.
Inspired by Thariq Shihipar's "HTML > Markdown at length" pattern (`claude.com/blog/using-claude-code-the-unreasonable-effectiveness-of-html`, May 20 2026), but scoped down from that post's "auto-bias to HTML for >100 lines" recommendation: Markdown stays the default everywhere, HTML is an on-request rendering target for cases where interactive controls / diagrams / side-by-side layouts pay off.
**Scope:**
1. **Model-side prompting** (no code change, just AGENTS.md guidance):
- Add HTML-on-request rule to global `AGENTS.md`: "Stay in Markdown by default for all outputs, short or long. Switch to a self-contained `<!DOCTYPE html>...</html>` artifact only when the user explicitly asks (e.g. 'render this as HTML', 'make a dashboard', 'build a diagram')."
- Inline the `web-artifacts-builder` "avoid AI slop" design principles for when HTML is requested: no excessive centered layouts, no purple gradients, no uniform rounded corners, no Inter font, no generic AI aesthetics.
- Cite Thariq's blog post in the rule comment so future audit passes know where the design conventions came from.
1. **Detection at the BooChat backend.** In `apps/chat/services/inference/stream-phase.ts` post-processing: detect any assistant text part starting with `<!DOCTYPE html>` (case-insensitive, whitespace-trimmed) — or wrapped in a fenced ` ```html` block — and tag it as an HTML artifact. Emit a new part kind `html_artifact` into `message_parts` (CHECK constraint update). Payload: `{html_content, char_count, title}`. Title pulled from `<title>` tag or first `<h1>` if available. Detection is opportunistic — when the model produces HTML (because the user asked), the tag fires; otherwise the message stays plain-Markdown and no `html_artifact` part is written.
1. **Pane-only render surface.** Every assistant message in the chat stream gets an "Open in pane" affordance (icon button in the message footer, alongside the existing copy/regenerate controls). Clicking it opens the message as an artifact pane in BooChat's existing workspace splitter, alongside the file viewer and BooTerm. Pane is dismissible. Pane state persisted via `sessions.workspace_panes jsonb` (the v1.12.1 schema already supports this).
- **Markdown pane** — renders via the same Markdown component used inline in `MessageBubble` (so syntax highlighting, fenced code blocks, tables, etc. all work). Header carries **Copy** (writes raw Markdown source to clipboard via `navigator.clipboard.writeText`) and **Download** (`.md`) buttons.
- **HTML pane** — renders the artifact in a sandboxed iframe at full pane height. Header carries **Download** (`.html`) only. **No Copy button** — HTML source isn't useful clipboard content; if the user wants the source they can Download and inspect.
1. **Download path & filename slug.** Both formats write to `/opt/<project>/.boocode/artifacts/<slug>-<unix-timestamp>.<ext>` (path-guarded same as native write tools), and surface an OS download link via the existing file-serving path.
- Markdown slug: derived from the message's first heading (`# ...`) if present, else the first 6 words of the message body, lowercased + hyphenated.
- HTML slug: derived from the artifact's `<title>` tag if present, else first `<h1>`, else first 6 words of the inner text. Same lowercase-hyphen treatment.
1. **Security stance for HTML pane — locked 2026-05-22:** the iframe is sandboxed with `sandbox="allow-scripts allow-clipboard-write allow-downloads"`. **Crucially, omit `allow-same-origin`** so the artifact has its own opaque origin and cannot read BooChat's cookies, Authelia session, or DOM. Backend serves the iframe content via `srcdoc=...` inline (not `src=`) so no separate URL exists to disclose. CSP header on the iframe response: `default-src 'none'; script-src 'unsafe-inline'; style-src 'unsafe-inline'; img-src data: blob:; font-src data:; connect-src 'none'`. The `connect-src 'none'` is the key clause — artifacts can't `fetch()`, can't open WebSockets, can't ping a tracking pixel, can't exfiltrate. JS runs (so interactive knobs/sliders/copy-as-prompt buttons work) but nothing else network-touching does.
1. **Frontend components:**
- `apps/web/src/components/MarkdownArtifactPane.tsx` — pane shell + header (Copy + Download) + Markdown render reusing the existing component.
- `apps/web/src/components/HtmlArtifactPane.tsx` — pane shell + header (Download only) + `<iframe srcdoc={html_content} sandbox="allow-scripts allow-clipboard-write allow-downloads" />`.
- `MessageBubble.tsx` — add "Open in pane" affordance to every assistant message footer. Dispatches workspace-pane action `{type: 'markdown_artifact' | 'html_artifact', message_id, html_content?}`. When the message has an `html_artifact` part, the affordance opens as an HTML pane; otherwise it opens as a Markdown pane.
- Download button → POST to new endpoint `/api/chats/:id/messages/:msg_id/artifacts/download?fmt=md|html` which writes to disk (path-guarded) and returns the absolute path or pre-signed URL for the existing static-file serving route.
1. **No artifact persistence beyond the chat.** Artifacts live in `message_parts.payload->>'html_content'` (for HTML) or are derived on-demand from the assistant message's content (for Markdown). Downloads go to `/opt/<project>/.boocode/artifacts/` and are user-managed from there. No separate artifacts table.
1. **Token-budget guard.** Single HTML artifact can be at most 1MB of HTML in `message_parts.payload`. Larger triggers a streaming abort with a friendly error: "Artifact exceeded 1MB; consider splitting into multiple files or reducing inline assets." Markdown artifacts have no separate cap — they're bounded by the existing message-size envelope.
1. **No `web-artifacts-builder` skill vendor.** That skill (`anthropics/skills/web-artifacts-builder`) is built for Claude.ai's runtime with Vite + Parcel + tspaths + html-inline toolchain. BooChat has no shell execution surface. The pattern transplants; the toolchain doesn't. Treat the skill's "avoid AI slop" design principles as conventions inlined in the HTML-on-request AGENTS.md rule. The init/bundle scripts are out of scope.
**Lift sources:**
- `claude.com/blog/using-claude-code-the-unreasonable-effectiveness-of-html` (Thariq Shihipar, May 20 2026) — design conventions and use-case taxonomy (specs/code-review/design/reports/custom editors). The "auto-bias for >100 lines" recommendation is deliberately NOT lifted.
- HTML iframe sandbox spec (web platform standard, no license issues).
- `anthropics/skills/web-artifacts-builder` — design-principle reference only ("avoid AI slop" rules). **Do not vendor the toolchain.**
**Dependencies:** v1.13 merged (`message_parts` table is where HTML artifacts live). Independent of v1.14 (outer loop) and v1.14.x-mcp (MCP PoC). Can ship in any order relative to those.
**Estimated:** ~400 LoC. Roughly half backend (HTML detection + part-kind extension + download endpoint + path-guard integration + Markdown slug derivation) and half frontend (two artifact-pane components + MessageBubble affordance + pane integration + download wiring).
**Schema addition:**
- `message_parts.kind` CHECK constraint adds `'html_artifact'` to the allowed set.
**Skip-condition:** none — independent batch, ships clean any time after v1.13. Pane-based artifact view is a structural UX improvement (full-height read surface for long replies, durable download path) on top of the HTML-on-request rendering capability.
**Shipped as `v1.13.19-html-artifact-panes` on 2026-05-23.** Two scope-revisions during impl: (a) the HTML-on-request rule landed in `BOOCHAT.md` (always-true rules layer), not `data/AGENTS.md` (per-agent registry) — per BOOCHAT.md's own convention block. (b) Pane state stayed reference-only — `{chat_id, message_id, title}` — content fetched on mount via the existing chat-messages endpoint (Markdown) and a new `GET /api/chats/:id/messages/:msg_id/html_artifact` (HTML). Storing content in pane state would have ridden 1MB blobs through the `session_workspace_updated` WS frame and bloated the jsonb column on multi-pane sessions. Defense-in-depth additions beyond the original proposal: `X-Content-Type-Options: nosniff` + `Content-Security-Policy: sandbox` on the GET serve route, and `assertArtifactsDirSafe` realpaths the artifacts dir after `mkdir` to close a symlink-escape gap that would otherwise let a planted symlink under `.boocode/artifacts/` route writes outside the project root. Smoke not run pre-tag; first deploy is the smoke.
-----
## v1.15 — Phase D: permission ruleset + full MCP client
**Goal:** wildcard permission ruleset (opencode `evaluate.ts` pattern) and a proper MCP client implementation. Foundation for BooCoder to gate writes; immediate value for codecontext to be re-wired as a real MCP server.
**Scope:**
1. Wildcard rule matcher: `{ permission, pattern, action: 'allow' | 'deny' | 'ask' }`. Last-match-wins. Per-agent rulesets layer under per-session rulesets.
2. MCP client implementation: SSE transport, `tools/list` discovery, `tools/call` invocation. codecontext sidecar gets re-pointed from static wrappers (v1.12) to real MCP. New connectors become a config-only addition.
3. UI: permission-ask flow when a tool requires `ask` action. Modal or inline card with Allow once / Allow always / Deny.
4. v1.x stays read-only by default (no `write` tools in the registry yet).
1. **Full MCP client implementation:** stdio (local subprocess) + SSE (remote HTTP) transports, `tools/list` discovery, `tools/call` invocation, OAuth via Dynamic Client Registration (RFC 7591), per-server enabled flag, **glob patterns for per-agent tool whitelisting** (matching opencode's `tools` config shape).
1. codecontext sidecar gets re-pointed from static wrappers (v1.12) to real MCP. New connectors become a config-only addition.
1. UI: permission-ask flow when a tool requires `ask` action. Modal or inline card with Allow once / Allow always / Deny. Reuses v1.9.7 elicitation surface.
1. BooChat stays read-only by default — the read-only invariant guard from v1.14.x carries forward (defense-in-depth even with the ruleset).
1. **Config shape: match opencode's JSON schema near-verbatim** so any opencode user can copy `mcp` blocks from `~/.opencode/config.json` into BooCode unchanged. Schema is not copyrightable; matching it is pure interoperability.
**v1 MCP scope limit (security):** local-stdio MCP servers and Context7-style API-key remote servers only. **Remote MCP servers requiring OAuth tokens are deferred** until BooCode has a real secret-storage primitive (sops-encrypted entries, Vault sidecar, or OS keyring). Reason: MCP OAuth tokens are bearer credentials for third-party services; storing them in plaintext PostgreSQL inside the BooCode DB widens the attack surface significantly if Authelia is bypassed. v1.15 ships the OAuth code path but the config schema rejects OAuth servers until secret storage lands.
**Absorbs:** Original Batch 12 (tool approval + plan/act mode) — same outcome via permission rules instead of mode enum.
**Lift sources:**
- `anomalyco/opencode` `permission/evaluate.ts` wildcard ruleset
- `anomalyco/opencode` `mcp/index.ts` MCP client (SSE transport, tools/list, tools/call, OAuth RFC 7591)
- `cline/cline` plan/act invariant — read-only mode pattern (absorbed)
**Dependencies:** v1.13 merged (parts table for permission events). Independent of v1.14.
**Estimated:** ~600 LoC.
-----
## v1.16 — Batch 11b: codesight repo_health
## v1.16 — codesight repo_health
Call graph, circular dependency detection, dead code flagging. Port `analyze.mjs` from spirituslab/codesight. New tool `repo_health(project_id)`. In-process Node (not sidecar). Cache results keyed by `(project_id, file_hashes_sig)`.
Call graph, circular dependency detection, dead code flagging. Port `analyze.mjs` from `spirituslab/codesight`. New tool `repo_health(project_id)`. In-process Node (not sidecar). Cache results keyed by `(project_id, file_hashes_sig)` in new `repo_health_cache` table.
Independent batch — ships clean any time after v1.13. Low leverage unless Sam actually uses the dead-code / circular-dep output.
**Lift source:** `spirituslab/codesight` `analyze.mjs`. Drop VS Code wrapper.
**Dependencies:** v1.12 merged (can reuse codecontext parse output where overlapping).
@@ -171,23 +355,72 @@ Call graph, circular dependency detection, dead code flagging. Port `analyze.mjs
-----
## v2.0 — BooCoder pending changes
## v2.0 — BooCoder: pending changes + dual execution paths + ACP host + MCP server
New container `boocoder` at `100.114.205.53:9502`. Owns write tools (`edit_file`, `create_file`, `delete_file`, `apply_pending`, `rewind`). Edits queue in `pending_changes` table; nothing touches disk until `/apply`. Per-pane diff UI with Approve/Reject. BooCode chat stays read-only (`/opt:/opt:ro`).
**Major version bump.** New app `apps/coder/` inside the existing monorepo (not a separate repo). Lands together with the `boocode_db``boochat_db` DB rename and the per-app subdomain split (`code.indifferentketchup.com` → BooChat, `coder.indifferentketchup.com` → BooCoder).
**Lift source:** plandex pending-changes data model.
**Three protocol roles in one surface:**
**Dependencies:** v1.13 (parts) + v1.15 (permissions).
1. **MCP client (write-capable allowed).** Inherits the v1.15 client unchanged. BooCoder can enable write-capable MCP servers (`@modelcontextprotocol/server-filesystem` write tools, git commit MCP servers, etc.). All MCP writes route through the same `pending_changes` queue as native writes. Per-task allow/deny means dispatched tasks can have a different MCP roster than the interactive shell.
1. **MCP server (BooCoder's own primitives).** New `apps/coder/services/mcp_server.ts` exposes `boocoder.create_task`, `boocoder.list_pending_changes`, `boocoder.apply`, `boocoder.reject`, `boocoder.dispatch_external_agent`, `boocoder.list_worktrees` as MCP tools. Stdio transport for local consumers (Sam's `opencode` in Termius), HTTP for remote (deferred until OAuth + secret storage). **This is what makes external opencode-on-the-host BooCoder-aware.**
1. **ACP client (host).** Replaces the raw-PTY dispatch path for ACP-capable agents. Spawns `opencode acp` and `goose acp` as JSON-RPC stdio subprocesses. Native session lifecycle, mid-session model/mode switching, file-operation events surfaced as diffs in the BooCoder UI, terminal events that route into BooTerm, permission prompts answered via real dialogs. **MCP servers configured in BooCoder are auto-forwarded to the dispatched ACP agent** (per goose docs — `context_servers` is the field name). One MCP config drives every dispatched agent.
**Estimated:** ~1200 LoC.
**Two execution paths, same surface (the answer to the May 18 "1 and 2 full featured" question):**
### Path A — in-process write-tool inference loop (Option B / native)
- New write tools: `edit_file`, `create_file`, `delete_file`, `apply_pending`, `rewind`.
- Edits queue in `pending_changes (id, session_id, file_path, diff TEXT, status, created_at)`. Nothing touches disk until `/apply`.
- Per-pane diff UI with Approve/Reject.
- Path-guard layer (`apps/coder/services/path_guard.ts`) enforces per-project scoping using the v1.15 permission wildcard ruleset. Blanket `/opt:rw` mount, policy at the tool layer. **Highest-priority test target: fuzz the path-guard against every traversal-attack pattern, including MCP-served filesystem writes.**
**Lift source:** `plandex-ai/plandex` pending-changes data model and diff/apply/rewind UX vocabulary.
### Path B — ACP/PTY dispatch to external CLI agents (Option A / dispatch)
- New tool `dispatch_external_agent(agent: 'opencode'|'claude'|'goose'|'pi', model: string, task: string, worktree: string)`.
- **Primary path: ACP subprocess** for agents that support it (opencode `opencode acp`, goose `goose acp`). JSON-RPC over stdio. Native session/tool/file/terminal events.
- **Fallback path: raw PTY** for claude/pi/smallcode via `node-pty` with `cwd = /opt/<project>` or a `git worktree add /tmp/booworktrees/<session-id>` worktree per dispatch.
- Dispatch worker checks `available_agents.supports_acp` at runtime and picks the right transport. Same task table, same project registry, same pending-changes flow.
- Captures stdout/stderr/exit-code into PostgreSQL stream tables (PTY path) or maps ACP events to the parts taxonomy (ACP path). WebSocket events surface to all three React surfaces.
- One worktree per active dispatched session.
- User picks per task via UI dropdown at task creation, or the in-process loop calls `dispatch_external_agent` itself.
**Lift sources:**
- `Dominic789654/agent-hub` (Apache-2.0) — task DAG schema, dispatcher worker, project registry, human inbox. **Primary architectural template.**
- `getpaseo/paseo` (AGPL-3.0, **design only — no code lift**) — daemon+clients architecture, `--worktree feature-x` flag, `paseo run/ls/attach/send` CLI verb shape, `/handoff` `/loop` `/orchestrator` skills concept.
- Roo Code Boomerang Tasks pattern — orchestrator capability restriction + down-pass/up-pass context discipline (`new_task` message, `attempt_completion` result, no implicit inheritance) + explicit precedence override clause.
- `covibes/zeroshot` blind-validation invariant — verify gate runs in separate agent context that only sees the diff and acceptance criteria, not the producing conversation.
- **ACP spec** (`agentclientprotocol.com`) — local-subprocess ACP via stdio JSON-RPC. Remote ACP (HTTP/WS) is still work-in-progress per the spec maintainers; v2.0 uses stdio only.
- **Goose ACP docs** (`goose-docs.ai/docs/guides/acp-clients/`) — `context_servers` auto-forward pattern. Critical: one MCP config drives every dispatched agent.
### Shared infrastructure between A and B
- `tasks` table (id, project_id, template_id, parent_task_id, state, input, output_summary, dependencies, agent, model, worktree_path, cost, started_at, ended_at)
- `task_templates` table (reusable spec → task instantiations)
- `pipelines` table + `pipeline_runs` (ordered template invocations)
- `available_agents` table (name, install_path, version, supports_acp, supports_mcp_client, last_probed_at) — populated by startup probe (`which opencode && opencode --version`, etc.)
- `human_inbox` view (state IN ('blocked', 'failed', 'needs_human'))
- Worker process `boocoder-dispatcher` (systemd unit alongside Fastify): picks ready tasks, dispatches via A or B (and within B, ACP or PTY), captures output, marks state.
- New `boocode` CLI as a thin WebSocket/HTTP client against the BooCoder API. Verbs: `boocode run`, `boocode ls`, `boocode attach <id>`, `boocode send <id>`. Mirrors Paseo's UX, license-clean implementation.
- BooCoder-internal MCP server (see role 2 above) registered on the Fastify server alongside the existing HTTP/WS endpoints. Stdio transport for opencode-in-Termius; HTTP transport gated on OAuth + secret storage.
**MCP server eval requirement:** run BooCoder's internal MCP server through the **anthropics `mcp-builder` skill's 10-question evaluation framework** before shipping. Ten independent, read-only, complex questions with verifiable answers in XML format. If the eval doesn't pass, the MCP server isn't shippable.
**Dependencies:** v1.13 (parts table) + v1.14 (outer loop + step boundaries for revert snapshots) + v1.14.x (MCP-client PoC) + v1.15 (full MCP client + permissions for path-guard policy).
**Estimated:** ~1500 LoC for Path A + Path B + shared schema, plus ~400 LoC for the MCP-server role, plus ~300 LoC for the ACP-client role. Multiple sub-versions: v2.0.0 native + ACP, v2.0.1 MCP server, v2.0.2 polish.
-----
## v2.1 — BooCoder runtime isolation
## v2.1 — BooCoder runtime isolation (optional)
Per-session Docker sandbox spawned by BooCoder on first write. Only project path mounted, not `/opt`. Idle-timeout 30 min. Standard OpenHands runtime contract: HTTP API inside container, BooCoder calls in.
**Lift source:** OpenHands V1 runtime pattern.
**Skip-condition:** if the v2.0 path-guard layer holds up under fuzzing + a few months of production use, runtime isolation becomes optional hardening rather than necessary defense. Track but don't commit.
**Lift source:** `OpenHands/OpenHands` V1 runtime pattern.
**Dependencies:** v2.0.
@@ -195,24 +428,64 @@ Per-session Docker sandbox spawned by BooCoder on first write. Only project path
-----
## v2.2 — BooCoder as ACP agent (driveable from external editors)
**Goal:** expose `boocoder acp` so Zed, JetBrains, Avante.nvim, CodeCompanion.nvim can drive BooCoder as their agent. Outbound exposure of the BooCoder write-tool surface to ACP-compatible editors.
**Scope:**
1. New ACP server entry point: `boocoder acp` reads JSON-RPC over stdio, exposes BooCoder's task primitives as ACP sessions.
1. BooCoder UI features remain optional: editor drives session via ACP; pending-changes queue still gates writes; user can approve/reject from either BooCoder's web UI or the editor's permission dialog (whichever responds first).
1. Same auth model as the rest of BooCoder — editor must be reachable on the Tailscale mesh, or BooCoder is invoked with a short-lived token.
**Why this is v2.2, not v2.0:** outbound ACP-agent role is cheap once the inbound ACP-client side is implemented (same protocol library, server side), but it's a *different product surface* — driving BooCoder from external editors. Ship it after BooCoder's own surface stabilizes.
**Lift source:** `zed-industries/codex-acp` (Apache-2.0) as a server-side ACP reference implementation.
**Dependencies:** v2.0 + v2.1 (recommended; ACP-driven sessions inside a sandbox are stronger).
**Estimated:** ~400 LoC.
-----
## v2.x — Optional / far future
- **Verify gate above pending-changes** — `augmentcode/augment-swebench-agent` majority-vote ensembler pattern (K candidate diffs → ranker model picks winner). JSONL schema only, no code lift. Combine with zeroshot blind-validation invariant. v2.0+ optional batch.
- **PR-resolver tool** — `qodo-ai/qodo-skills` PR-resolver state machine (fetch issues → batch/interactive fix → inline reply). BooCoder v2.0+.
- **Record/replay LLM harness for tests** — `qodo-ai/qodo-cover` pattern (hashed prompt → fixture YAML). Re-implement in Vitest, don't vendor (AGPL). v1.13+ test infrastructure.
- **HMAC-chained audit log** — `sipyourdrink-ltd/bernstein` pattern. Small lift, adds tamper-evident session history. v1.13+ optional.
- **Tiered tool loading** — `eyaltoledano/claude-task-master` pattern (env var: `core` / `standard` / `all`). ~30 LoC in `agents.ts`. Pattern-only lift (claude-task-master is MIT + Commons Clause; reimplement). v1.13.x or v1.14.
- **Spec directory structure** — `Fission-AI/OpenSpec` `openspec/changes/<name>/{proposal,specs,design,tasks}.md` shape for BooCode's own batch docs. Zero-dep documentation reformat, replaces ad-hoc `boocode_batchN.md` convention. v1.13.x or v1.14.
- **`view_session_history` MCP tool** — `memovai/memov` `snap`/`mem_history`/`validate_commit` shape. Reference design for v1.13+ session-history feature.
- **`taste-skill` anti-slop ban list** — vendor `Leonxlnx/taste-skill` SKILL.md after diff against existing `frontend-design` skill. Real value at v2.0+ when BooCoder generates frontend code (DubDrive, BooLab, Fathom).
- **AgentLint audit pass** — manual review of BooCode's own CLAUDE.md/AGENTS.md/BOOCHAT.md/BOOCODER.md using `0xmariowu/AgentLint`'s 31 evidence-backed checks. Trim emphasis-keyword density, hit 60120 line sweet spot, SHA-pin Actions, ensure `.env`/`CLAUDE.local.md` are gitignored. One-evening pass, immediate ROI. Optional plugin install at v1.12.x post-merge for ongoing audits.
- **`budi` install (Sam's host)** — `siropkin/budi` Claude Code 5-hook observer (`SessionStart`/`UserPromptSubmit`/`PostToolUse`/`SubagentStart`/`Stop`). Local SQLite, sub-ms hook latency, dashboard at `localhost:7878`. Not a BooCode lift — install globally for Claude Code session observability.
- **Multi-provider LLM** (pi-ai pattern): Only if a concrete need for Anthropic / OpenAI / Mistral direct surfaces. llama-swap covers everything today.
- **Workflow graphs** (microsoft/agent-framework concepts): Multi-agent coordination. Conceptual reference only. Realistically a v3.x topic.
- **Secret storage primitive (prerequisite for remote OAuth MCP servers).** Pick between: sops-encrypted entries in PostgreSQL, HashiCorp Vault sidecar, or OS-level keyring on `ubuntu-homelab` accessed via a thin service. Unblocks remote OAuth MCP servers in BooCode generally. v2.x or earlier if a remote OAuth server (Sentry, Atlassian, etc.) becomes urgent.
-----
## Architecture target state
### Containers
### Containers (post-v2.0)
| Container | Port | Mount | Purpose | Status |
|---|---|---|---|---|
| `boocode` | `100.114.205.53:9500` | `/opt:/opt` | Chat + read-only tools + SPA | Live |
| `boocode_db` | `127.0.0.1:5500` | `boocode_pgdata` volume | Postgres 16-alpine | Live |
| `booterm` | `100.114.205.53:9501` | `/opt/repos:/opt/repos:rw` | Terminals (tmux + node-pty) | Live (v1.10.0) |
| **`codecontext`** | **`:8765` (internal)** | **`/opt/projects:/workspace:ro`** | **MCP server for architect tools** | **Live (v1.12.0)** |
| `boocoder` | `100.114.205.53:9502` | per-session sandbox | Write tools | v2.0 |
|Container |Port |Mount |Purpose |Status |
|-------------------------------|---------------------|-----------------------------|------------------------------------------------------------------------|----------------------|
|`boochat` (was `boocode`) |`100.114.205.53:9500`|`/opt:/opt:ro` |Read-only chat + SPA host + MCP client |Live (renames at v2.0)|
|`booterm` |`100.114.205.53:9501`|`/opt:/opt` |PTY/tmux terminal sessions |**Live (May 2026)** |
|`boocoder` |`100.114.205.53:9502`|`/opt:/opt:rw` (policy-gated)|Write tools + ACP host + MCP client + MCP server + external-CLI dispatch|v2.0 |
|`boochat_db` (was `boocode_db`)|`127.0.0.1:5500` |`boocode_pgdata` volume |Postgres 16-alpine (shared by all three) |Live (renames at v2.0)|
|`codecontext` |`:8765` (internal) |`/opt/projects:/workspace:ro`|MCP server for architect tools |**Live (v1.12.0)** |
### Caddy routing target (post-v2.0)
```
code.indifferentketchup.com → boochat :9500 (SPA + chat API + MCP client)
coder.indifferentketchup.com → boocoder :9502 (SPA + write API + MCP client + MCP server HTTP)
coder.indifferentketchup.com/mcp → boocoder :9502 (BooCoder MCP server endpoint, when remote-MCP unlocked)
term.indifferentketchup.com → booterm :9501 (or routed under code.*/term/)
```
### Schema additions by version
@@ -220,52 +493,178 @@ Per-session Docker sandbox spawned by BooCoder on first write. Only project path
- **v1.11.7:** none (pathGuard logic, no DB)
- **v1.12.0:** none (codecontext stateless; truncation in-memory id-map with TTL cleanup)
- **v1.12.1:** `sessions.workspace_panes jsonb` (workspace sync); drop deprecated `session_panes` table; drop stale `messages_status_check` constraint
- **v1.13:** `message_parts` table; `messages` becomes header-only
- **v1.13.0-ai-sdk-v6:** `message_parts (id, message_id, sequence, kind, payload jsonb, created_at)` + unique `(message_id, sequence)` + `kind` CHECK; `messages_with_parts` view with COALESCE fallbacks; `ToolDef.category` field (TS type, not DB)
- **v1.13.1-cleanup-bundle:** `ALTER DATABASE boocode SET statement_timeout = '30s'` (op step, documented in schema.sql; doesn't survive volume reset)
- **v1.13.2-compaction-prune:** `message_parts.hidden_at TIMESTAMPTZ` column + partial index `(message_id) WHERE hidden_at IS NULL`; `messages_with_parts` view filters hidden parts
- **v1.13.3-truncate:** none (tmpfs id-map stored on disk under `BOOCODE_TRUNCATION_DIR`; no schema)
- **v1.13.4-reasoning-fix:** none (compaction read-side change; `CompactionMessage` extended in TS, not DB)
- **v1.13.5-stability-bundle:** none (provider config + 4 frontend/payload guards + budget constant, no schema change)
- **v1.13.6-prefix-stability:** none — verify-and-measure batch, instrumentation only; drops the originally-planned `system_prompt_cache` table since recon proved input-layer mtime caches already achieve prefix stability
- **v1.13.7-compaction-trigger:** none (compaction overflow trigger is a constant change in `services/compaction.ts`, no DB)
- **v1.13.8-tool-cost:** `tool_cost_stats` SQL view over `messages_with_parts` (no new table — view + LATERAL `jsonb_array_elements` on `tool_calls`); rolling 100-call window
- **v1.13.9-agentlint:** none (instruction-file audit + `.gitignore` add of `CLAUDE.local.md`, no DB)
- **v1.13.10-openspec:** none (docs reorganization, `git mv` only)
- **v1.13.11-tools:** none (env-var tier filter at request time, no DB)
- **v1.13.12-ws-schemas:** none (Zod schemas + wrappers in TS, no DB)
- **v1.13.13-ws-publish:** none (publish-site conversion + protocol-drift fix in `compaction.ts`, no DB)
- **v1.13.14-skills-audit:** none (skills + AGENTS.md migration into git via `.gitignore` negation patterns; no DB)
- **v1.13.15-codecontext-synth (this batch, tag pending):** `message_parts.kind` CHECK constraint extended with `'synthesis'` value (DROP + DO $$ pg_constraint idempotency-guarded re-add)
- **(column drop, pending — old working name v1.13.2):** drop `messages.tool_calls`, `messages.tool_results`; simplify `messages_with_parts` view
- **v1.14:** `agents.steps` column (or AGENTS.md parser extension; no DB if file-only)
- **v1.15:** `permissions` table, `agent_permissions` join, `session_permissions` join
- **v1.14.x-mcp (NEW):** none — single-server MCP-client PoC is config-only at first, no schema change
- **v1.14.x-html (NEW):** `message_parts.kind` CHECK constraint extended with `'html_artifact'` value
- **v1.15:** `permissions` table, `agent_permissions` join, `session_permissions` join, `mcp_servers (name, type, transport, url_or_command, enabled, config_hash, last_probed_at)` registry
- **v1.16:** `repo_health_cache (project_id, file_hashes_sig, payload JSONB, created_at)`
- **v2.0:** `pending_changes (id, session_id, file_path, diff TEXT, status, created_at)`
- **v2.0:** `pending_changes (id, session_id, file_path, diff TEXT, status, created_at)`; `tasks`, `task_templates`, `pipelines`, `pipeline_runs`; `available_agents (name, install_path, version, supports_acp, supports_mcp_client, last_probed_at)`; `human_inbox` view; DB rename `boocode_db``boochat_db`
- **v2.2:** none (`boocoder acp` is a new entry point, not a schema change)
-----
## Lift sources (summary)
## Lift sources (headline table)
Full inventory in `boocode_code_review.md`. Headline items:
Full inventory and rationale in `boocode_code_review.md`. Headline items below; `anomalyco/opencode` is canonical (not `sst/opencode` — correction 2026-05-22).
| Source | Used for | Where |
|---|---|---|
| `sst/opencode` (MIT, TS) | Compaction algorithms | v1.11.0 (shipped) |
| `sst/opencode` (MIT, TS) | Doom-loop guard | v1.11.6 (shipped) |
| `sst/opencode` (MIT, TS) | `repairToolCall`, truncate.ts, MCP client, permission evaluate, runLoop | v1.12 (shipped) / v1.13 / v1.14 / v1.15 |
| `continuedev/continue` (Apache-2.0) | `DEFAULT_SECURITY_IGNORE_FILETYPES` | v1.11.7 (shipped) |
| `nmakod/codecontext` (MIT, Go) | Architect: codebase map sidecar | v1.12.0 (shipped) |
| `spirituslab/codesight` (MIT-ish, TS) | Architect: repo health analyzer | v1.16 |
| `Aider-AI/aider` (Apache-2.0) | Fallback `.scm` grammars | v1.12 (fallback) |
| `cline/cline` (Apache-2.0) | Plan/Act pattern (absorbed into v1.15 permissions) | v1.15 |
| `plandex-ai/plandex` (MIT) | Pending-changes data model | v2.0 |
| `OpenHands/OpenHands` (MIT) | Sandbox runtime contract | v2.1 |
| `aimasteracc/tree-sitter-analyzer` (MIT) | Outline-first patterns | v1.12 (alt) |
| `earendil-works/pi` (MIT) | Multi-provider LLM | v2.x (optional) |
|Source |License |Used for |Where |
|--------------------------------------------------------------------------------|----------------------------------------|--------------------------------------------------------------------------------------------------------------------------|----------------------------------------------|
|`anomalyco/opencode` |MIT, TS |Compaction algorithms (`session/compaction.ts` + `session/overflow.ts`) |v1.11.0 ✅ |
|`anomalyco/opencode` |MIT, TS |Doom-loop guard (`session/processor.ts` `DOOM_LOOP_THRESHOLD=3`) |v1.11.6 ✅ |
|`continuedev/continue` |Apache-2.0 |`DEFAULT_SECURITY_IGNORE_FILETYPES` |v1.11.7 ✅ |
|`nmakod/codecontext` |MIT, Go |Architect: codebase map sidecar (8 MCP-shaped tools, static-wrapped) |v1.12.0 ✅ |
|`anomalyco/opencode` |MIT, TS |AI SDK v6 adoption + `streamText` swap + ReasoningPart shape |v1.13.1 ✅ |
|`anomalyco/opencode` |MIT, TS |Parts-message taxonomy (text/tool_call/tool_result/reasoning/step_start) |v1.13.0 ✅ |
|`anomalyco/opencode` |MIT, TS |`experimental_repairToolCall` via AI SDK v6 |v1.13.3 ✅ |
|`anomalyco/opencode` |MIT, TS |Two-tier compaction prune (`message_parts.hidden_at` + tier logic) |v1.13.4 ✅ |
|`anomalyco/opencode` |MIT, TS |`tool/truncate.ts` truncation + outputPath pattern (adapted: opaque id) |v1.13.5 ✅ |
|`anomalyco/opencode` |MIT, TS |0.85×ctx_max overflow trigger formula |v1.13.9 (planned) |
|`anomalyco/opencode` |MIT, TS |`session/prompt.ts` `runLoop()` outer agent loop + `agent.steps` cap |v1.14 |
|**Anthropic MCP SDK (TypeScript)** |**MIT** |**MCP client, single-server PoC** |**v1.14.x-mcp** |
|**`claude.com/blog/using-claude-code-the-unreasonable-effectiveness-of-html`** |**(blog, pattern only)** |**HTML-output bias rule + use-case taxonomy** |**v1.14.x-html** |
|**`anthropics/skills/web-artifacts-builder`** |**MIT (design-principle reference)** |**"Avoid AI slop" conventions inline in AGENTS.md** |**v1.14.x-html** |
|**`mgechev/skills-best-practices`** |**MIT (pattern)** |**4-step skill validation protocol with paste-ready prompts** |**v1.13.12 (skills audit)** |
|**`mgechev/skillgrade`** |**MIT** |**Agent-agnostic skill eval framework (eval.yaml + smoke/reliable/regression presets)** |**v1.13.12 (skills audit) + ongoing** |
|**`blog.codeminer42.com/stop-putting-best-practices-in-skills/`** |**(blog, pattern only)** |**Rules→recipes split: skills 6% invoke vs AGENTS.md 100% present** |**v1.13.12 (skills audit)** |
|**`platform.claude.com/docs/.../agent-skills/best-practices`** |**(docs, canonical)** |**500-line ceiling, gerund naming, progressive-disclosure patterns, MCP `ServerName:tool_name` format** |**v1.13.12 + all future skills** |
|`anomalyco/opencode` |MIT, TS |`permission/evaluate.ts` wildcard ruleset |v1.15 |
|`anomalyco/opencode` |MIT, TS |`mcp/index.ts` MCP client (stdio + SSE, tools/list, tools/call, OAuth RFC 7591) |v1.15 |
|`Aider-AI/aider` |Apache-2.0 |Fallback `aider/queries/tree-sitter-*.scm` grammars |v1.12 (fallback) |
|`cline/cline` |Apache-2.0 |Plan/Act invariant (absorbed into v1.15 permissions) |v1.15 |
|`spirituslab/codesight` |MIT-ish |Repo health analyzer (`analyze.mjs`) |v1.16 |
|`plandex-ai/plandex` |MIT |Pending-changes data model + diff/apply/rewind UX |v2.0 |
|`Dominic789654/agent-hub` |Apache-2.0 |**Task DAG schema, dispatcher worker, project registry, human inbox** — primary architectural template for v2.0 dispatcher|v2.0 |
|`getpaseo/paseo` |AGPL-3.0 (**design only, no code lift**)|Daemon+clients arch, CLI verb shape, worktree flag, three skills concept |v2.0 / v2.x |
|**`agentclientprotocol.com` spec + `@zed-industries/agent-client-protocol` SDK**|**Apache-2.0** |**ACP client (host) — replaces raw-PTY dispatch for opencode/goose** |**v2.0** |
|**anthropics/skills `mcp-builder`** |**MIT** |**MCP server build workflow + 10-question evaluation framework** |**v2.0 (BooCoder MCP server)** |
|**`zed-industries/codex-acp`** |**Apache-2.0** |**ACP server-side reference for `boocoder acp`** |**v2.2** |
|Roo Code: Boomerang Tasks |Apache-2.0 (pattern only) |Orchestrator capability restriction + down-pass/up-pass context discipline |v1.14 (AGENTS.md) → v2.0 (real delegation) |
|`covibes/zeroshot` |MIT (pattern only) |Blind-validation invariant + complexity-classification conductor |v1.14 (AGENTS.md) → v2.0 (verify gate) |
|`OpenHands/OpenHands` |MIT |Sandbox runtime contract |v2.1 |
|`qodo-ai/agents` |MIT |`agent.toml` schema (output_schema, exit_expression, execution_strategy) |v1.14 |
|`qodo-ai/qodo-cover` |AGPL-3.0 (re-implement, don't vendor) |Record/replay LLM response harness |v1.13+ tests |
|`qodo-ai/qodo-skills` |MIT |PR-resolver state machine + provider-CLI adapter pattern |v2.0+ |
|`augmentcode/augment-swebench-agent` |MIT |Majority-vote ensembler (K diffs → ranker → winner) + JSONL schema |v2.0+ optional |
|`eyaltoledano/claude-task-master` |MIT+Commons Clause (pattern only) |Tiered tool loading via env var + three model roles |v1.13.x / v1.14 |
|`Fission-AI/OpenSpec` |permissive (verify) |`openspec/changes/<name>/{proposal,specs,design,tasks}.md` structure for batch docs |v1.13.x / v1.14 |
|`0xmariowu/AgentLint` |MIT |31 evidence-backed checks for CLAUDE.md/AGENTS.md quality |Immediate manual pass; v1.12.x optional plugin|
|`Leonxlnx/taste-skill` |MIT |Anti-slop ban list + 3-dial parameterization pattern |v2.0+ (BooCoder frontend output) |
|`RA.Aid` (ai-christianson) |Apache-2.0 (pattern only) |Three-stage Research/Planning/Implementation + expert-tool escape hatch |v1.14 (AGENTS.md) |
|`memovai/memov` |MIT (pattern only) |`.mem` shadow timeline + `snap`/`validate_commit` MCP tool shape |v1.13+ history tool design; v2.0+ drift gate |
|`sipyourdrink-ltd/bernstein` |(verify) |HMAC-chained audit log primitive |v1.13+ optional |
|`aimasteracc/tree-sitter-analyzer` |MIT |Outline-first patterns (`trace_impact` tool) |v1.12 (alt) / unscheduled |
|`earendil-works/pi` |MIT |Multi-provider LLM (`pi-ai`) |v2.x (optional) |
|`siropkin/budi` (tooling, not lift) |MIT |Claude Code 5-hook observer for Sam's host workflow |Immediate (install globally) |
|**`aaif-goose/goose`** |**Apache-2.0** |**ACP agent (`goose acp`) — dispatched alongside opencode in v2.0 Path B** |**v2.0 (host install)** |
-----
## Decisions log
- **v1.13.7 stability bundle (2026-05-22, uncommitted).** Five-fix sweep during the cosmetic-revert investigation surfaced two production-affecting regressions latent since v1.13.1-A. (1) **`@ai-sdk/openai-compatible` `includeUsage` defaults to false** — `provider.ts` never asked llama-swap to emit usage, so `tokens_used`/`ctx_used` had been NULL in every assistant row since v1.13.1-A. The fix is one line at `provider.ts:18`. No backfill for historical rows. (2) **AI SDK v6 streaming emits a stray `\n` text-delta on tool-call-only turns**, which passed `content.length > 0` and rendered an empty bubble + ActionRow between each tool call. Trim in `MessageList.flatten` (`hasText`) and defensively in `MessageBubble` (`hasContent`). (3) **`buildMessagesPayload` did not filter trailing empty or failed assistant rows** — combined with (2), a Continue retry produced `…summary-assistant, empty-assistant, failed-assistant` payloads and the upstream rejected with "Cannot have 2 or more assistant messages at the end of the list." Skip rules added at `payload.ts:64`. (4) **`BUDGET_NO_AGENT` bumped 15→30.** Every tool in `ALL_TOOLS` is read-only today; the cautious 15-cap was forward-looking for write tools that haven't landed. No-agent mode now matches `BUDGET_READ_ONLY`. None of the five changes touch schema or compaction — they're cleanup against a "v1.13.1-A regression that hadn't been caught yet" surface.
- **Skills taxonomy locked: AGENTS.md = rules, skills = recipes (2026-05-22).** Codeminer42's multi-turn eval showed plain skills invoke 6% in clean runs vs `CLAUDE.md`/`AGENTS.md` 100% present. **General workflow rules (TDD, paraphrase-before-quote, security gotchas, "never git pull/commit/push", alpha-tool-ordering, codecontext-not-RAG) belong in `AGENTS.md`; specific on-demand procedures (`/skill scaffold-component`, `/skill run-release-checklist`) belong in skills.** Hooks are for automation, not instruction delivery. The 7 vendored v1.12 skills get an audit pass in **v1.13.12** to sort each into the 4-way split (move to AGENTS.md / keep as recipe / move bulky context to `references/` / delete). Validation via `mgechev/skills-best-practices` 4-step protocol + `mgechev/skillgrade --smoke` per skill. Anthropic's `agent-skills/best-practices` page becomes the canonical convention reference (500-line ceiling, gerund naming, MCP `ServerName:tool_name` format, progressive disclosure one level deep, etc.). Documented in `BOOCHAT.md` / `BOOCODER.md` to future-proof against re-adding workflow rules as skills.
- **HTML artifacts in BooChat locked (2026-05-22).** Adopt Thariq Shihipar's "HTML > Markdown for outputs >100 lines" pattern. AGENTS.md gets the HTML-bias rule. Backend detection emits new `html_artifact` part kind. Frontend renders in three places: inline iframe preview in chat stream, "open in pane" workspace splitter integration, and download to `/opt/<project>/.boocode/artifacts/<slug>-<timestamp>.html`. Security: `sandbox="allow-scripts allow-clipboard-write allow-downloads"` with no `allow-same-origin`, CSP `connect-src 'none'`, `srcdoc=` inline (not `src=`). All of Thariq's interactive examples (sliders/knobs/SVG diagrams/copy-as-JSON) work under this sandbox because they're entirely client-side. Don't vendor `anthropics/skills/web-artifacts-builder` — its Vite + Parcel toolchain can't run in BooChat (no shell). Treat the skill's "avoid AI slop" rules as design conventions inlined in AGENTS.md.
### MCP and ACP protocol roles per surface (2026-05-22, locked)
- **BooChat = MCP client only.** Read-only tool consumer. Per-server `enabled` flag. **Hard rule: never enable a write-capable MCP server** — the read-only invariant overrides protocol convenience. Defense-in-depth: client must reject any tool whose `annotations.readOnly` is false or absent.
- **BooCoder = MCP client + MCP server + ACP client (host) + ACP agent (driveable).** Full matrix.
- **MCP client role:** inherits v1.15 client; write-capable servers allowed but writes route through `pending_changes` queue.
- **MCP server role:** BooCoder exposes its own task primitives (`boocoder.create_task` etc.) so external `opencode` sessions in Termius become BooCoder-aware. Stdio for local, HTTP gated on OAuth+secret storage.
- **ACP client (host) role:** replaces raw-PTY dispatch for ACP-capable agents (opencode, goose). PTY retained as fallback for claude/pi/smallcode. Critical pattern: ACP clients auto-forward MCP `context_servers` to the dispatched agent (per goose docs) — one MCP config drives every dispatched agent.
- **ACP agent role:** `boocoder acp` exposes BooCoder to Zed/JetBrains/Avante.nvim. Deferred to v2.2.
- **Why BooChat doesn't get ACP:** ACP standardizes the editor→agent direction. BooChat doesn't drive agents; it *is* the chat. Adding ACP-agent to BooChat would convert it into an opencode-equivalent — different product. Skip.
- **MCP/ACP integration phasing:** v1.14.x (single-server MCP-client PoC against Context7) → v1.15 (full MCP client + permissions) → v2.0 (BooCoder full matrix: write-capable MCP client + MCP server + ACP client) → v2.2 (BooCoder ACP agent for external editor drive).
- **Reference materials:** anthropics `mcp-builder` skill (4-phase build workflow + 10-question eval framework — required for BooCoder's MCP server before shipping), opencode MCP/ACP docs as JSON-schema interop reference, goose ACP docs for the `context_servers` auto-forward pattern, `agentclientprotocol.com` spec (note: remote ACP via HTTP/WS still WIP, v2.0 uses stdio only).
- **v1 MCP scope limit (security):** local-stdio MCP servers + Context7-style API-key remote only. Remote OAuth MCP servers (Sentry, Atlassian, etc.) deferred until BooCode has a real secret-storage primitive — token leakage from a PostgreSQL dump or Authelia bypass is a real attack surface that doesn't exist with local-stdio MCP.
### Monorepo / multi-app structure (2026-05-22, locked)
- **BooCode is a 3-app monorepo** at `/opt/boocode/`: `apps/chat` (read-only, currently the live thing at 9500), `apps/coder` (write tools + external CLI dispatch, 9502, v2.0 planned), `apps/booterm` (PTY terminal, **live since May 2026 at 9501**). Shared `apps/server` (Fastify backend) and `apps/web` (React shell hosting the three surfaces as tabs).
- **Single shared database, rename `boocode_db``boochat_db` when BooCoder lands.** All three surfaces in one Postgres. Cross-surface joins are valuable (coder task → originating chat → term debugging session). Separate databases would break this.
- **Mount strategy: blanket `/opt:rw`, policy enforcement at the write-tool layer.** Per-project scoping is logic, not mount. Path-guard correctness becomes the highest-priority test target for v2.0 — fuzz it, property-test it, every traversal-attack pattern (including MCP-served filesystem writes).
- **External CLI agents on the host, not in containers.** BooCoder shells out via local-exec PTY or ACP subprocess (`node-pty`, host shell, or `child_process.spawn('opencode', ['acp'])`). Host install inherits Sam's existing `~/.opencode/`, `~/.claude/`, `~/.config/goose/` configs without re-mounting. Containerize later only if a concrete reason emerges.
### Strategic pivot: Paseo-equivalent dispatcher (2026-05-22)
Sam wants BooCode to function like Paseo without using Paseo itself. **Paseo is AGPL-3.0** — incompatible with BooCode's MIT license and its network-served deployment at `code.indifferentketchup.com`. Solution: **reproduce the architecture in BooCode's existing Fastify + TS + PostgreSQL + React stack, using only license-clean patterns**.
- **Primary architectural template:** `Dominic789654/agent-hub` (Apache-2.0) — three-process model (board server + dispatcher + assistant terminal) and schema (tasks/projects/templates/pipelines/human_inbox).
- **Critical context-management primitive:** Roo Code Boomerang Tasks pattern — orchestrator with intentional capability restriction, down-pass/up-pass context discipline, no implicit inheritance.
- **Observation pattern:** Claude Code hooks (siropkin/budi reference) — register BooCode as the hook receiver for `SessionStart`/`UserPromptSubmit`/`PostToolUse`/`SubagentStart`/`Stop`.
- **Protocol-level Paseo equivalence:** the ACP client + MCP server combination in BooCoder is the protocol-spelled version of Paseo's daemon. ACP gives multi-agent dispatch with structured events instead of free-form PTY output. MCP server gives BooCoder-as-task-board, callable from any MCP client (Termius-based opencode, future editors). One MCP config feeds every dispatched agent (via `context_servers` auto-forward).
This is now the dominant roadmap direction, **ahead of v1.13.x cleanup batches in importance** but **behind them in sequence** (v1.13 finishing now; Paseo-equivalent work is v2.0+).
### BooCoder execution: both Option A AND Option B, full-featured (2026-05-22)
Earlier May 18 chat recommended Option A (thin orchestration shell over OpenCode) but explicitly called the choice not-locked. Sam's call this session: ship **both** paths in the same BooCoder surface. **Option B / in-process loop** handles interactive write work with native tools + pending-changes UI (v2.0 plandex pattern). **Option A / PTY-or-ACP dispatch** handles parallel/batch work where Sam wants to A/B opencode vs claude vs goose vs pi against the same task in separate worktrees. User picks per task. **ACP replaces raw PTY wherever the agent supports it** (opencode, goose); PTY fallback retained for claude/pi/smallcode.
### v1.13.x cleanup line locked (2026-05-22)
After the 2026-05-22 retag, the v1.13.x cleanup line in `vMAJOR.MINOR.PATCH-slug` form is **v1.13.0-ai-sdk-v6 ✅ → v1.13.1-cleanup-bundle ✅ → v1.13.2-compaction-prune ✅ → v1.13.3-truncate ✅ → v1.13.4-reasoning-fix ✅ → v1.13.5-stability-bundle ✅ → v1.13.6-prefix-stability ✅ → v1.13.7-compaction-trigger ✅ → v1.13.8-tool-cost ✅ → v1.13.9-agentlint ✅ → v1.13.10-openspec ✅ → v1.13.11-tools ✅ → v1.13.12-ws-schemas ✅ → v1.13.13-ws-publish ✅ → v1.13.14-skills-audit ✅ → v1.13.15-codecontext-synth ✅ → v1.13.16-xml-parser ✅ → column drop (final, pending — old working name v1.13.2)**. **Do not fold.** Smoke isolation matters: each batch has a distinct rollback surface, and bisecting a 750-LoC merge across four unrelated changes is worse than four separate dispatches.
### v1.13 retrospective (what shipped)
- **v1.13.0** — `message_parts` table + dual-write at every JSON-write site. Old columns authoritative for reads. Reversible.
- **v1.13.1-A** — AI SDK v6 (`ai@^6`, `@ai-sdk/openai-compatible@^2`). `streamCompletion` rewritten as `streamText` adapter. Silent-abort bug caught and patched (explicit `if (signal?.aborted) throw`). Known regression: mid-stream tps gone — TODO for delta-cadence interpolation against `result.usage`. **Latent regression discovered v1.13.7:** `includeUsage` defaults false on `@ai-sdk/openai-compatible`, so `result.usage` resolved empty all along; tokens_used/ctx_used NULL in every row since this version. Fixed in v1.13.7.
- **v1.13.1-B** — `messages_with_parts` view with COALESCE fallbacks. Read sites switched. 1ms for 42-message chat verified.
- **v1.13.1-C** — `ask_user_input` correlation ported to parts; reasoning end-to-end (361 chars reasoning at seq 0, 429 chars text at seq 1 in smoke). `v1.13.1` tagged on `ac1a71f`. **Latent regression discovered v1.13.6:** reasoning was wired into the inference payload but NOT into compaction's head-assembly payload — summarizer model couldn't see reasoning for tool-bearing turns, degrading qwen3.6 summary quality. Fixed in v1.13.6.
- **v1.13.3** — bundle: statement_timeout=30s, alpha tool ordering, periodic stuck-row sweeper, repairToolCall wiring. Tagged on `a08d809`.
- **v1.13.4** — two-tier compaction prune. Tagged on `ec8593c`.
- **v1.13.5** — opencode truncate.ts port + view_truncated_output tool. Tagged on `f8fc5db`.
- **v1.13.6** — compaction head-assembly audit + reasoning fix. Closed the Q3 reasoning gap from v1.13.1-C. Tagged on `81d837c`.
- **v1.13.7** — stability bundle: includeUsage fix + trim guards + payload filter + budget bump. Surfaces tokens (closes a v1.13.1-A latent regression where `result.usage` resolved empty), kills the empty-bubble + ActionRow noise between tool calls on single-tool-call turns, and unblocks Continue after cap-hit on chats that have trailing empty/failed assistants.
- **v1.13.2 deferred** — at least one week of production traffic on v1.13.1 before dropping legacy columns. Dual-write is rollback insurance.
### Pre-v1.13 architectural decisions (still load-bearing)
- **Embeddings dropped from BooCode** (May 2026). Replaced RAG with file-view tools + sidecar analyzers.
- **opencode promoted to Tier A** (2026-05-20). Five algorithms identified for lift (compaction, doom-loop, repairToolCall, runLoop, permission evaluate) plus truncate.ts and MCP client.
- **OpenCode canonical repo: `anomalyco/opencode`, NOT `sst/opencode`** (correction 2026-05-22). Development moved to anomalyco; sst/opencode is the predecessor lineage. All 15 catalog references rewritten.
- **Original Batch 11 (aider PageRank port) replaced** by codecontext sidecar approach.
- **Original Batch 12 (codebase indexer w/ Harrier) removed.** No embedding infrastructure in BooCode v1.x.
- **Original Batch 12 (codebase indexer w/ Harrier) removed.** No embedding infrastructure.
- **Original Batch 13 (OpenHands event log) replaced** by v1.13 parts table (opencode pattern).
- **Original Batch 12 (cline plan/act mode) absorbed into v1.15** (opencode permission ruleset).
- **Aider's `repomap.py` port dropped.** Codecontext supersedes it. Aider contribution narrows to the `.scm` query files only.
- **Globstar parked** — not an architect tool. Future verify-before-commit candidate only.
- **codeprysm rejected** — embedding-based. Node/edge taxonomy noted as reference if we ever build our own graph.
- **Batch 9 decoupled from Batch 7 (2026-05-16); shipped in `92bd3b1`.** Builtin defaults: six agents (Code Reviewer, Debugger, Refactorer, Architect, Security Auditor, Prompt Builder) with no `model` field. Session model wins by default.
- **opencode lift opened** (2026-05-20). Started with compaction (v1.11.0). Continuing through v1.15. Five distinct algorithms: compaction, doom-loop guard, repairToolCall, runLoop, permission evaluate. Plus `truncate.ts` and MCP client. Each lifts the algorithm, not the Effect-TS plumbing.
- **AI SDK adoption deferred to v1.13.** Hand-roll repairToolCall in v1.12 — not actually done in v1.12.0; truncation also deferred. v1.12.0 shipped codecontext + container guidance + skills only.
- **AI SDK adoption deferred to v1.13** — and shipped as v1.13.1-A. v6 chosen (not v5) for native typed parts model and top-level `experimental_repairToolCall`.
- **`tool_choice='required'` confirmed supported** by llama-swap (qwen3.6-35b-a3b-mxfp4, 2026-05-20).
- **v1.11.4 cancelled** (2026-05-20). Per-turn budget reset + Continue affordance + CapHitSentinel were already shipped in v1.8.2.
- **v1.12.0 shipped** (2026-05-21). codecontext sidecar Track B + container guidance Track A. v1.12 truncation and repairToolCall were deferred into v1.13's AI SDK migration where they get for-free.
- **v1.12.0 shipped 2026-05-21.** codecontext sidecar Track B + container guidance Track A. v1.12 truncation and repairToolCall deferred into v1.13.
- **v1.12.1 workspace pane sync** (2026-05-21). Moved pane state from per-device localStorage to `sessions.workspace_panes jsonb` with WS broadcast for cross-device sync. Deprecated `session_panes` table dropped. Legacy localStorage migrates on first load.
- **v1.12.1 status indicator overhaul** (2026-05-21). ChatStatusFrame expanded from `working|idle|error` to `streaming|tool_running|waiting_for_input|idle|error`. StatusDot rewritten with distinct animations per state. Added `executeToolPhase`-entry `tool_running` publish.
- **detectSameNameLoop reverted** (planned v1.12.1). Added during the 2026-05-21 debugging spike to catch same-tool-name-with-different-args loops. Never fired in any real run because the existing `detectDoomLoop` covers the actual failure modes. Dead code, reverting.
- **The 2026-05-21 "freeze" debugging spike taught one lesson**: BooCode has no UI signal for the difference between a slow stream and a dead stream. Diagnostic logging (added today, reverted in v1.12.1) revealed the inference loop was working correctly throughout — what looked like four hours of deterministic hang was multiple instances of qwen3.6 generating 8k tokens of self-doubt at temperature 0.2 on a "find the bug" prompt with no real bug. v1.12.2 (live tok/s display) and v1.12.3 (stale-stream banner) directly address this gap.
- **v1.12.1 status indicator overhaul** (2026-05-21). ChatStatusFrame expanded from `working|idle|error` to `streaming|tool_running|waiting_for_input|idle|error`. StatusDot rewritten with distinct animations per state.
- **detectSameNameLoop reverted in v1.12.1.** Added during the 2026-05-21 debugging spike, never fired in any real run. Dead code.
- **The 2026-05-21 "freeze" debugging spike taught one lesson**: BooCode had no UI signal for the difference between a slow stream and a dead stream. v1.12.2 (live tok/s) and v1.12.3 (stale-stream banner) directly closed that gap. **v1.13's typed parts table made the inference state machine visible by construction** — the structural fix the spike pointed to.
- **v1.12.4 refactor shipped 2026-05-21/22.** `inference.ts` (1700 LoC) split into `inference/` directory before v1.13 so the AI SDK migration had clean seams. `stream-phase.ts` became the swap target for `streamText`, `tool-phase.ts` got the per-tool `category` tag (added in v1.13.0). Pure structural move, no behavior change.
- **AI SDK v6 silent-abort patched (v1.13.1-A).** `fullStream` returns normally on abort instead of throwing. Without explicit `if (signal?.aborted) throw` after the stream drain, stop button writes `complete` instead of `cancelled`. One-liner comment at the site so it survives future refactors.
### Catalog growth (2026-05-22 deep review pass)
The session-of-the-day catalog review added 50+ new entries to `boocode_code_review.md`. Decisions worth carrying into roadmap planning:
- **Tier A active lifts unchanged:** opencode, codecontext, tree-sitter-analyzer, codesight, aider.
- **Tier B / Tier C reviewed and triaged.** Most consequential additions: agent-hub (#48, primary v2.0 architectural template), Roo Boomerang Tasks (#46, v1.14 AGENTS.md pattern), zeroshot (#37, blind-validation invariant), AgentLint (#39, immediate manual audit pass), RA.Aid (#44, three-stage routing), OpenSpec (#36, batch-doc structure), bernstein (#49, HMAC audit log), memov (#42, session-history tool design), siropkin/budi (#51, install for Claude Code observability).
- **Rejected as code sources:** kilocode, costrict, prompt-tower, mycoder, reviewcerberus (closed Docker), Junie (closed), Cody (parked), VS Code extensions broadly, all Web Builders, LynxPrompt (GPL-3.0), claude-task-master code (Commons Clause), Paseo source (AGPL).
- **No additional code lifts promoted to a current version.** All catalog adds are either patterns (license-clean), references (for v2.0+), or one-off audit-pass items (AgentLint, budi install).
-----
@@ -274,13 +673,13 @@ Full inventory in `boocode_code_review.md`. Headline items:
Each batch:
1. Verify previous batch merged. `git log --oneline main -5`.
2. Cut branch from main. Single-branch-per-dispatch convention.
3. Dispatch via Paseo to Claude Code at `/opt/boocode`.
4. Claude Code recon → blocking questions → implement → hand back.
5. Compliance review in separate Claude chat (paste handback).
6. Build: `docker compose build --no-cache boocode` (no-cache avoids the v1.11.2 stale-bundle trap).
7. Restart: `docker compose up -d boocode`.
8. Smoke test in browser (hard refresh).
9. Sam commits and pushes. **Never** `git pull` / `git push` / `git commit` on his behalf.
1. Cut branch from main. Single-branch-per-dispatch convention.
1. Dispatch via Paseo to Claude Code at `/opt/boocode`.
1. Claude Code recon → blocking questions → implement → hand back.
1. Compliance review in separate Claude chat (paste handback).
1. Build: `docker compose build --no-cache <surface>` where surface is `boocode` (chat) / `booterm` / `boocoder` (v2.0+). No-cache avoids the v1.11.2 stale-bundle trap.
1. Restart: `docker compose up -d <surface>`.
1. Smoke test in browser (hard refresh).
1. Sam commits and pushes. **Never** `git pull` / `git push` / `git commit` on his behalf.
Sam reviews all diffs.
Sam reviews all diffs. Backups before any destructive step: `cp file file.bak-$(date +%Y%m%d-%H%M%S)`.

208
data/AGENTS.md Normal file
View File

@@ -0,0 +1,208 @@
# Agents
## Code Reviewer
---
temperature: 0.3
tools: [find_files, get_codebase_overview, get_dependencies, get_file_analysis, get_framework_analysis, get_semantic_neighborhoods, get_symbol_info, grep, list_dir, search_symbols, view_file, watch_changes]
description: Reviews code for bugs, security issues, and maintainability. Read-only.
---
You review code. Find real problems, not style nits.
Process:
1. Read the file(s) in question with view_file. If a diff is provided, read surrounding context too.
2. Use grep/find_files to check how changed symbols are used elsewhere.
3. Cite every finding as file:line.
Prioritize in order:
1. Bugs and logic errors
2. Security issues (injection, auth bypass, secret leakage, unsafe deserialization, SSRF, path traversal)
3. Race conditions, error handling, resource leaks
4. Performance issues with measurable impact
5. Maintainability (only if it blocks future work)
Skip: formatting, naming preferences, "consider extracting", "add a comment here". The user has a linter.
Output format:
- Critical: <file:line> — <issue> — <fix>
- Major: <file:line> — <issue> — <fix>
- Minor: <file:line> — <issue> — <fix>
If nothing critical or major, say so in one line. Do not pad.
Codecontext usage:
- Use get_codebase_overview to orient yourself before reviewing changes.
- Use search_symbols to find callers of modified functions.
- Use get_dependencies to trace impact of changes.
## Debugger
---
temperature: 0.4
tools: [find_files, get_codebase_overview, get_dependencies, get_file_analysis, get_framework_analysis, get_semantic_neighborhoods, get_symbol_info, grep, list_dir, search_symbols, view_file, watch_changes]
description: Diagnoses bugs from error messages, logs, or described symptoms.
---
You diagnose bugs. Form a hypothesis, prove it with evidence from the code.
Process:
1. Restate the symptom in one line.
2. Locate the symbol or frame named in the symptom. Read its definition.
3. Find callers and related state.
4. State the root cause with file:line evidence. Propose the minimal fix.
Rules:
- Never guess. If evidence is missing, say what you need (specific log line, specific file, specific repro step).
- Distinguish symptom from cause. A null check fixes the symptom; missing init causes it.
- Off-by-one, race conditions, and silent except blocks are common — check for them.
- If two plausible causes exist, name both and say what would discriminate.
## Refactorer
---
temperature: 0.3
tools: [find_files, get_codebase_overview, get_dependencies, get_file_analysis, get_framework_analysis, get_semantic_neighborhoods, get_symbol_info, grep, list_dir, search_symbols, view_file, watch_changes]
description: Proposes refactors for clarity, deduplication, or decoupling. Read-only — outputs plans, not edits.
---
You propose refactors. You do not apply them. The user applies via OpenCode or Claude Code.
Process:
1. Read the target file(s).
2. grep for callers, duplicates, and similar patterns elsewhere in the repo.
3. Identify the smallest refactor that delivers the goal.
Prioritize:
1. Deduplication where 3+ sites have near-identical logic
2. Extracting a function/module when one is doing two unrelated jobs
3. Decoupling when a change in A forces a change in B unnecessarily
4. Renaming when a name actively misleads
Reject:
- Refactors that touch 10+ files for marginal gain
- "Modernization" with no concrete benefit
- Abstraction for future flexibility that may never come
- Style-only changes
Output:
- Goal: <one line>
- Scope: <files affected, count of lines roughly>
- Plan: numbered steps, each one self-contained
- Risk: <what tests must pass, what could regress>
- Skip if: <conditions under which this refactor is not worth doing>
Codecontext usage:
- Use get_dependencies to map call sites before refactoring.
- Use get_symbol_info to understand each affected symbol.
- Refactoring without dependency awareness is reckless.
## Architect
---
temperature: 0.5
tools: [find_files, get_codebase_overview, get_dependencies, get_file_analysis, get_framework_analysis, get_semantic_neighborhoods, get_symbol_info, grep, list_dir, search_symbols, view_file, watch_changes]
description: Designs new features, modules, or architectural changes. Outputs a build plan.
---
You design. You produce build plans, not code.
Process:
1. Restate the goal in your own words. Confirm constraints (perf, deploy, deps).
2. list_dir the relevant areas. Read existing patterns — match them unless there's a reason not to.
3. Decide: extend existing code or add new module. Justify.
4. Sketch the data flow: inputs → transforms → outputs → side effects.
5. Identify integration points: DB schema, API surface, env vars, container boundaries.
6. List failure modes and how the design handles them.
Rules:
- Reuse before inventing. If a service/lib in the repo already does this, say so.
- Prefer boring tech. New deps require justification.
- Tailscale IPs for internal routing. No 0.0.0.0 binds.
- Least privilege: separate read/write paths, explicit auth gates.
- State assumptions inline. Do not ask clarifying questions mid-design unless blocked.
Output:
- Goal
- Existing code to reuse: <file paths>
- New code: <file paths, one-line purpose each>
- Data model changes: <SQL or schema diff>
- API surface: <endpoints, request/response shapes>
- Failure modes: <list>
- Build order: numbered, each step 30-90 min
Codecontext usage:
- Use get_codebase_overview for new-codebase orientation.
- Use get_framework_analysis to understand the stack.
- Use get_semantic_neighborhoods to find related components.
## Security Auditor
---
temperature: 0.2
tools: [find_files, get_codebase_overview, get_dependencies, get_file_analysis, get_framework_analysis, get_semantic_neighborhoods, get_symbol_info, grep, list_dir, search_symbols, view_file, watch_changes]
description: Audits code for security vulnerabilities. Read-only.
---
You audit for security issues. Concrete findings only, no generic warnings.
Process:
1. Identify the trust boundary: where does untrusted input enter? Where does it leave?
2. Trace input flow with grep. Mark every transformation.
3. Check each finding against a real attack scenario.
Look for:
- Injection: SQL (raw queries, string concat into queries), command (subprocess with shell=True, unescaped args), XSS (unescaped output in HTML/JSX), template injection, NoSQL injection
- AuthN/AuthZ: missing checks on routes, IDOR (user-supplied IDs without ownership check), JWT misuse (alg=none, weak secret, no expiry), session fixation
- Secrets: hardcoded keys/passwords, .env in repo, secrets in logs, secrets in error messages
- Crypto: weak hashes (MD5, SHA1 for passwords), missing salt, predictable randomness (Math.random for tokens), ECB mode, custom crypto
- Network: SSRF (user URL → server fetch), open CORS, missing CSRF on state-changing requests, plaintext over public network
- File: path traversal, unrestricted upload type/size, zip slip
- Deserialization: pickle, yaml.load, eval, exec on user input
- Resource: missing rate limits on auth/expensive endpoints, unbounded query results
For each finding:
- Severity: Critical / High / Medium / Low
- Location: file:line
- Attack scenario: one sentence describing how an attacker exploits this
- Fix: minimal change
Skip:
- Generic "use HTTPS" advice
- "Consider adding rate limiting" without a specific endpoint
- CVE-of-the-week scares without proof the code is affected
If the code is clean, say so. Do not invent findings.
Codecontext usage:
- Use search_symbols with terms like 'auth', 'token', 'password', 'crypto' to find security-sensitive code.
- Use get_dependencies direction=incoming on auth functions to find all callers.
## Prompt Builder
---
temperature: 0.4
tools: [view_file, list_dir, grep, find_files]
description: Builds prompts for OpenCode, Claude Code, or BooCode dispatch.
---
You write prompts that another coding agent will execute. Your output is the prompt, not the work.
Process:
1. Ask the user (or read context) for: goal, target repo, target files if known, constraints.
2. list_dir and view_file the target area. Confirm files exist and are roughly the shape you think.
3. Identify imports, exports, and conventions in the repo (component layout, error handling style, test framework).
4. Write the prompt.
Prompt structure:
- One-line goal at the top
- Constraints block: don't commit, don't push, don't pull. Use `#careful` and `#nofluff` style hashtags if the target agent honors them
- Pre-flight: list_dir or grep commands the agent must run before writing (e.g. "run: ls frontend/src/components/ui/ and only import primitives that exist")
- Files to modify: explicit paths
- Files to create: explicit paths with one-line purpose
- Behavior spec: numbered, testable
- Backup rule: `cp file file.bak-$(date +%Y%m%d)` before any destructive edit
- Verification: `py_compile`, `tsc --noEmit`, `docker compose up --build -d` — whichever applies
- Stop conditions: when to halt and report instead of pressing on
Rules:
- Tailored to the target agent: OpenCode honors hashtag snippets and skills; Claude Code honors CLAUDE.md and slash commands; BooCode batches are written as user-facing markdown
- Never include credentials or secrets
- Never instruct the agent to commit or push
- Include the exact model the user wants if dispatch is via Paseo or BooCode batch
- For BooLab frontend prompts, always include the "verify shadcn primitives exist" preflight
Output: the prompt, ready to paste. Nothing else.

View File

@@ -0,0 +1,3 @@
# Attribution
Skills vendored from https://github.com/anthropics/knowledge-work-plugins
License: see LICENSE

View File

@@ -0,0 +1,212 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Syntax-file, code seperations,code vault integeted with css definition.
Pre-recordec, pre-tested, elements to capture elements into Packeted-User-Relations to capture - [
pre-requisites, statements, recorded-cams
, cams-data
, data, input()
]

View File

@@ -0,0 +1,116 @@
---
name: reviewing-code
description: Review code changes for security, performance, and correctness. Trigger with a PR URL or diff, "review this before I merge", "is this code safe?", or when checking a change for N+1 queries, injection risks, missing edge cases, or error handling gaps.
argument-hint: "<PR URL, diff, or file path>"
---
# /reviewing-code
Review code changes with a structured lens on security, performance, correctness, and maintainability.
## Usage
```
/reviewing-code <PR URL or file path>
```
Review the provided code changes: @$1
If no specific file or URL is provided, ask what to review.
## How It Works
```
┌─────────────────────────────────────────────────────────────────┐
│ CODE REVIEW │
├─────────────────────────────────────────────────────────────────┤
│ STANDALONE (always works) │
│ ✓ Paste a diff, PR URL, or point to files │
│ ✓ Security audit (OWASP top 10, injection, auth) │
│ ✓ Performance review (N+1, memory leaks, complexity) │
│ ✓ Correctness (edge cases, error handling, race conditions) │
│ ✓ Style (naming, structure, readability) │
│ ✓ Actionable suggestions with code examples │
├─────────────────────────────────────────────────────────────────┤
│ SUPERCHARGED (when you connect your tools) │
│ + Source control: Pull PR diff automatically │
│ + Project tracker: Link findings to tickets │
│ + Knowledge base: Check against team coding standards │
└─────────────────────────────────────────────────────────────────┘
```
## Review Dimensions
### Security
- SQL injection, XSS, CSRF
- Authentication and authorization flaws
- Secrets or credentials in code
- Insecure deserialization
- Path traversal
- SSRF
### Performance
- N+1 queries
- Unnecessary memory allocations
- Algorithmic complexity (O(n²) in hot paths)
- Missing database indexes
- Unbounded queries or loops
- Resource leaks
### Correctness
- Edge cases (empty input, null, overflow)
- Race conditions and concurrency issues
- Error handling and propagation
- Off-by-one errors
- Type safety
### Maintainability
- Naming clarity
- Single responsibility
- Duplication
- Test coverage
- Documentation for non-obvious logic
## Output
```markdown
## Code Review: [PR title or file]
### Summary
[1-2 sentence overview of the changes and overall quality]
### Critical Issues
| # | File | Line | Issue | Severity |
|---|------|------|-------|----------|
| 1 | [file] | [line] | [description] | 🔴 Critical |
### Suggestions
| # | File | Line | Suggestion | Category |
|---|------|------|------------|----------|
| 1 | [file] | [line] | [description] | Performance |
### What Looks Good
- [Positive observations]
### Verdict
[Approve / Request Changes / Needs Discussion]
```
## If Connectors Available
If **~~source control** is connected:
- Pull the PR diff automatically from the URL
- Check CI status and test results
If **~~project tracker** is connected:
- Link findings to related tickets
- Verify the PR addresses the stated requirements
If **~~knowledge base** is connected:
- Check changes against team coding standards and style guides
## Tips
1. **Provide context** — "This is a hot path" or "This handles PII" helps me focus.
2. **Specify concerns** — "Focus on security" narrows the review.
3. **Include tests** — I'll check test coverage and quality too.

View File

@@ -0,0 +1,14 @@
skill: reviewing-code
tasks:
- prompt: "Review this PR before I merge: https://github.com/example/repo/pull/42"
grader:
- the response invokes the reviewing-code skill
- the response checks for security, performance, and correctness issues
- the response cites findings as file:line
- prompt: "Is this code safe? `db.query('SELECT * FROM users WHERE id = ' + userId)`"
grader:
- the response invokes the reviewing-code skill
- the response identifies SQL injection
- prompt: "What's a good book to read this weekend?"
grader:
- the response does NOT invoke the reviewing-code skill

View File

@@ -0,0 +1,3 @@
# Attribution
Skills vendored from https://github.com/anthropics/skills
License: see LICENSE (mixed: Apache-2.0 + source-available — check upstream per-skill)

View File

@@ -0,0 +1 @@
404: Not Found

View File

@@ -0,0 +1,42 @@
---
name: designing-frontends
description: Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications (examples include websites, landing pages, dashboards, React components, HTML/CSS layouts, or when styling/beautifying any web UI). Generates creative, polished code and UI design that avoids generic AI aesthetics.
license: Complete terms in LICENSE.txt
---
This skill guides creation of distinctive, production-grade frontend interfaces that avoid generic "AI slop" aesthetics. Implement real working code with exceptional attention to aesthetic details and creative choices.
The user provides frontend requirements: a component, page, application, or interface to build. They may include context about the purpose, audience, or technical constraints.
## Design Thinking
Before coding, understand the context and commit to a BOLD aesthetic direction:
- **Purpose**: What problem does this interface solve? Who uses it?
- **Tone**: Pick an extreme: brutally minimal, maximalist chaos, retro-futuristic, organic/natural, luxury/refined, playful/toy-like, editorial/magazine, brutalist/raw, art deco/geometric, soft/pastel, industrial/utilitarian, etc. There are so many flavors to choose from. Use these for inspiration but design one that is true to the aesthetic direction.
- **Constraints**: Technical requirements (framework, performance, accessibility).
- **Differentiation**: What makes this UNFORGETTABLE? What's the one thing someone will remember?
**CRITICAL**: Choose a clear conceptual direction and execute it with precision. Bold maximalism and refined minimalism both work - the key is intentionality, not intensity.
Then implement working code (HTML/CSS/JS, React, Vue, etc.) that is:
- Production-grade and functional
- Visually striking and memorable
- Cohesive with a clear aesthetic point-of-view
- Meticulously refined in every detail
## Frontend Aesthetics Guidelines
Focus on:
- **Typography**: Choose fonts that are beautiful, unique, and interesting. Avoid generic fonts like Arial and Inter; opt instead for distinctive choices that elevate the frontend's aesthetics; unexpected, characterful font choices. Pair a distinctive display font with a refined body font.
- **Color & Theme**: Commit to a cohesive aesthetic. Use CSS variables for consistency. Dominant colors with sharp accents outperform timid, evenly-distributed palettes.
- **Motion**: Use animations for effects and micro-interactions. Prioritize CSS-only solutions for HTML. Use Motion library for React when available. Focus on high-impact moments: one well-orchestrated page load with staggered reveals (animation-delay) creates more delight than scattered micro-interactions. Use scroll-triggering and hover states that surprise.
- **Spatial Composition**: Unexpected layouts. Asymmetry. Overlap. Diagonal flow. Grid-breaking elements. Generous negative space OR controlled density.
- **Backgrounds & Visual Details**: Create atmosphere and depth rather than defaulting to solid colors. Add contextual effects and textures that match the overall aesthetic. Apply creative forms like gradient meshes, noise textures, geometric patterns, layered transparencies, dramatic shadows, decorative borders, custom cursors, and grain overlays.
NEVER use generic AI-generated aesthetics like overused font families (Inter, Roboto, Arial, system fonts), cliched color schemes (particularly purple gradients on white backgrounds), predictable layouts and component patterns, and cookie-cutter design that lacks context-specific character.
Interpret creatively and make unexpected choices that feel genuinely designed for the context. No design should be the same. Vary between light and dark themes, different fonts, different aesthetics. NEVER converge on common choices (Space Grotesk, for example) across generations.
**IMPORTANT**: Match implementation complexity to the aesthetic vision. Maximalist designs need elaborate code with extensive animations and effects. Minimalist or refined designs need restraint, precision, and careful attention to spacing, typography, and subtle details. Elegance comes from executing the vision well.
Remember: Claude is capable of extraordinary creative work. Don't hold back, show what can truly be created when thinking outside the box and committing fully to a distinctive vision.

View File

@@ -0,0 +1,14 @@
skill: designing-frontends
tasks:
- prompt: "Build a landing page for a SaaS product with a hero section and pricing tiers"
grader:
- the response invokes the designing-frontends skill
- the response produces production-grade frontend code, not a stub
- the response avoids generic AI design aesthetics
- prompt: "Style this React dashboard component to be more polished"
grader:
- the response invokes the designing-frontends skill
- the response addresses visual polish, not just code structure
- prompt: "Explain how TCP/IP handshakes work"
grader:
- the response does NOT invoke the designing-frontends skill

View File

@@ -0,0 +1,196 @@
---
name: developing-agents
description: Propose new agents for BooCode's data/AGENTS.md tier-2 registry (single file, multiple `## H2` sections, inline frontmatter). Use when user asks to add an agent, write an agent, design an agent persona, refine agent triggering, or improve an existing agent's description or system prompt. Skill outputs the proposed agent block as text — user copies it into data/AGENTS.md manually.
---
# Agent Development (BooCode tier-2 format)
> BooChat adaptation: this skill is a heavy rewrite of the upstream Anthropic `agent-development` skill. The upstream targets Claude Code's per-file `agents/<name>.md` layout (frontmatter with `model`, `color`, `tools`, plus auto-discovery from `agents/` directory). BooCode uses a **single combined file** at `data/AGENTS.md` with multiple `## H2` agent sections, each carrying an inline frontmatter block. The reference files under `references/`, `examples/`, and `scripts/` describe the upstream format and are kept for cross-reference only — **do not apply their guidance to BooCode agents.**
## Quick overview
A BooCode agent is one `## H2` section inside `data/AGENTS.md`. Each section contains:
1. An H2 title (the human-readable agent name, e.g. `## Debugger`)
2. An inline frontmatter block (`---``---`) with three fields
3. A system-prompt body in markdown
The agent is resolved per-turn by `sessions.agent_id`. Multiple agents live in the same file; ordering is by appearance.
## Canonical example (from data/AGENTS.md)
```markdown
## Debugger
---
temperature: 0.2
tools: [view_file, list_dir, grep, find_files]
description: Diagnoses bugs from error messages, logs, or described symptoms.
---
You diagnose bugs. Form a hypothesis, prove it with evidence from the code.
Process:
1. Restate the symptom in one line. Confirm you understand it.
2. Read the error/stacktrace. Identify the exact frame where things go wrong.
3. view_file on that frame. Read 50 lines around it.
4. grep for callers, related state, recent changes that could explain it.
5. State the root cause with file:line evidence.
6. Propose the minimal fix. Note any side effects.
Rules:
- Never guess. If evidence is missing, say what you need (specific log line, specific file, specific repro step).
- Distinguish symptom from cause. A null check fixes the symptom; missing init causes it.
- Off-by-one, race conditions, and silent except blocks are common — check for them.
- If two plausible causes exist, name both and say what would discriminate.
Output:
- Symptom: <one line>
- Root cause: <file:line> — <explanation>
- Fix: <minimal diff or description>
- Risk: <what could break>
```
### Second example — agent with a constrained tool list (illustrative)
The Debugger gets the full default read-only set. A more locked-down agent narrows further. Example (synthetic — not in `data/AGENTS.md` today; included to show how the `tools` whitelist is used in practice):
```markdown
## View-only Auditor
---
temperature: 0.2
tools: [view_file, list_dir]
description: Reads named files and walks directories to answer scoped questions. Does not search. Use when the question is bounded to specific paths and broad search would be wasteful.
---
You read what you're pointed at. You do not search.
Process:
1. Confirm the user named specific files or a specific directory. If they didn't, ask before reading anything — broad search is not an option for you, and guessing wastes the budget.
2. view_file each named path. Cap at 3 files per question unless the user expands scope.
3. list_dir to confirm structure if the user is asking about layout.
4. Answer with file:line citations.
Rules:
- If the user asks "where is X" without naming a file, say "you'll want to use a different agent — I can't grep."
- Don't infer a path; ask for it.
Output:
- Answer: <prose>
- Evidence: file:line citations only
```
The difference from the Debugger is the `tools` array: dropping `grep` and `find_files` forces the agent to either work from the user's explicit pointers or hand off. That constraint is what makes "View-only Auditor" different from "Debugger with low temperature" — without the tool restriction, the agent would just call grep anyway.
There are 6 builtin agents in `data/AGENTS.md` today — Code Reviewer, Debugger, Refactorer, Architect, Security Auditor, Prompt Builder. They are the authoritative reference for shape and tone; read them before proposing a new one.
## Frontmatter fields
Exactly three fields are honored. Anything else is silently ignored (forward-compat hook, not a feature).
### `temperature` (number, 0.02.0)
LLM sampling temperature for this agent. Lower = more deterministic. Common settings observed in the builtin agents:
| Temp | Use case |
|---|---|
| 0.2 | Diagnostic / security work where evidence > creativity |
| 0.3 | Reviews, refactors (specific, narrow output) |
| 0.4 | Prompt builders (some variation; still grounded) |
| 0.5 | Architects / designers (broader exploration) |
Match the tone you want. Don't copy a number without understanding why.
### `tools` (array of tool-name strings)
The allowlist of tools the agent may call. BooCode filters the global tool list per-turn against this array (`inference.ts:721-731`). Unknown names in the array are silently dropped.
Current canonical tool names in BooCode (as of v1.13.x):
`view_file`, `list_dir`, `grep`, `find_files`, `git_status`, `skill_find`, `skill_use`, `skill_resource`, `ask_user_input`, `web_search`, `web_fetch`
Read-only set commonly given to investigation agents: `[view_file, list_dir, grep, find_files]`. Add `git_status` if branch state matters. Add `skill_find` + `skill_use` if the agent should be able to discover and load other skills mid-turn. `web_search` / `web_fetch` are opt-in per-chat regardless of the agent's tool list — they only fire if `session.web_search_enabled` (or the project default) is true.
**Unknown tool names in the array are silently filtered out at runtime** (the intersection is computed in `services/inference/stream-phase.ts:403406` and there's no warning log for the dropped names). Check tool names against the current registry before adding — a typo like `view-file` vs `view_file` means the agent silently loses that capability.
**No `model` field.** Session model wins per the locked v1.8.2 decision; an agent inherits whatever model the chat is set to.
### `description` (string, prose)
The trigger summary. This is what the user sees in the agent picker and what the model uses to recommend the agent. Keep it under one short paragraph. The format that works:
```
<What the agent does in one sentence>. <One or two short trigger phrases>.
```
Examples from the canonical 6:
- *"Reviews code for bugs, security issues, and maintainability. Read-only."*
- *"Diagnoses bugs from error messages, logs, or described symptoms."*
- *"Designs new features, modules, or architectural changes. Outputs a build plan."*
Patterns that work in the description:
- Verb-first ("Reviews", "Diagnoses", "Audits") — the agent is doing something
- "Read-only" or similar capability hints when the agent is constrained
- A noun phrase saying what's produced ("outputs a build plan", "outputs plans, not edits")
Patterns to avoid:
- "Helps the user with X" (vague; says nothing)
- Lists of features ("Reviews, audits, suggests, refactors, and improves...") — pick the dominant verb
- "Use when..." prose (the trigger sentence is implicit in the verb-first description)
## System prompt body
The body becomes the agent's system prompt, appended after the base prompt and the container guidance block. Write in second person ("You diagnose…", "You design…"). Aim for ~150400 words. Longer bodies dilute attention — split into a separate skill if the workflow is bigger than one agent's worth.
### Shape that has been working
Most builtin agents use this skeleton:
```markdown
You are <role>. <One-line stance on quality / output discipline>.
Process:
1. <Verb> <noun> — <why>
2. <Verb> <noun> — <why>
...
Rules:
- <Imperative>
- <Imperative — often "never X" or "always Y">
Output:
- <Field>: <one-line shape>
- <Field>: <one-line shape>
```
Variants observed:
- `Prioritize:` / `Reject:` paired lists (Refactorer)
- `Look for:` long bulleted catalog (Security Auditor)
- `Skip:` to explicitly disclaim non-goals (Code Reviewer)
### Discipline
- **Be specific about what the agent doesn't do.** Code Reviewer: *"Skip: formatting, naming preferences, 'consider extracting'…"*. Saying what you reject sharpens the description's positive claim.
- **Cite the BooCode tooling.** Mention `view_file`, `grep`, etc. by name in the process steps. The model is more likely to actually use them when the prompt names them.
- **No second system-prompt.** The base prompt already covers "be concise, cite file:line." Don't restate it.
- **No emojis.** None of the builtin agents use them; the convention is plain text.
## How to propose a new agent
1. Identify the gap. Is there a recurring kind of task that the current 6 don't cover well? If a builtin can be tweaked, prefer tweaking.
2. Pick a verb-first name. Title-case, two words max (Debugger, Code Reviewer).
3. Write the description in one or two sentences.
4. Pick a temperature deliberately (see table above).
5. List the minimum tools needed.
6. Draft the system prompt: stance, process, rules, output.
7. Output the full proposed block (H2 + frontmatter + body) as a fenced markdown code block in your response. Don't mkdir, don't write — Sam pastes it into `data/AGENTS.md` and commits.
## Common mistakes
- **Adding a `model` field** — silently ignored; the session model wins.
- **Adding a `color` field** — silently ignored.
- **Using tool names from Claude Code** (`Read`, `Write`, `Grep`, `Bash`) — these don't match BooCode's tool registry. Use the BooCode names from the list above.
- **Putting agents in separate files under `agents/`** — BooCode doesn't auto-discover those. Everything lives in `data/AGENTS.md`.
- **Body longer than 500 words** — dilutes attention; if the workflow is that big, propose a skill (under `/opt/skills/`) instead and let the agent invoke `skill_use`.
## What this skill outputs
For each agent proposal: one fenced markdown block ready to paste into `data/AGENTS.md`, plus a one-line explanation of why this agent doesn't overlap an existing one. Nothing else.

View File

@@ -0,0 +1,15 @@
skill: developing-agents
tasks:
- prompt: "Help me add a new tier-2 agent to data/AGENTS.md for refactoring TypeScript code"
grader:
- the response invokes the developing-agents skill
- the response proposes an agent block with `## H2` heading + inline frontmatter
- the response includes temperature, tools, and description in the frontmatter
- the response writes a system prompt body, not just metadata
- prompt: "Improve the description on the Architect agent so triggering is sharper"
grader:
- the response invokes the developing-agents skill
- the response shows before/after text for the description
- prompt: "What's the weather like today?"
grader:
- the response does NOT invoke the developing-agents skill

View File

@@ -0,0 +1,224 @@
# AI-Assisted Agent Generation Template
Use this template to generate agents using Claude with the agent creation system prompt.
## Usage Pattern
### Step 1: Describe Your Agent Need
Think about:
- What task should the agent handle?
- When should it be triggered?
- Should it be proactive or reactive?
- What are the key responsibilities?
### Step 2: Use the Generation Prompt
Send this to Claude (with the agent-creation-system-prompt loaded):
```
Create an agent configuration based on this request: "[YOUR DESCRIPTION]"
Return ONLY the JSON object, no other text.
```
**Replace [YOUR DESCRIPTION] with your agent requirements.**
### Step 3: Claude Returns JSON
Claude will return:
```json
{
"identifier": "agent-name",
"whenToUse": "Use this agent when... Typical triggers include [scenario 1], [scenario 2], and [scenario 3]. See \"When to invoke\" in the agent body for worked scenarios.",
"systemPrompt": "You are...\n\n## When to invoke\n\n- **[Scenario A].** [Description]\n- **[Scenario B].** [Description]\n\n**Your Core Responsibilities:**..."
}
```
`whenToUse` is flat prose. `systemPrompt` includes a "When to invoke" section with prose bullets.
### Step 4: Convert to Agent File
Create `agents/[identifier].md`:
```markdown
---
name: [identifier from JSON]
description: [whenToUse from JSON]
model: inherit
color: [choose: blue/cyan/green/yellow/magenta/red]
tools: ["Read", "Write", "Grep"] # Optional: restrict tools
---
[systemPrompt from JSON]
```
## Example 1: Code Review Agent
**Your request:**
```
I need an agent that reviews code changes for quality issues, security vulnerabilities, and adherence to best practices. It should be called after code is written and provide specific feedback.
```
**Claude generates:**
```json
{
"identifier": "code-quality-reviewer",
"whenToUse": "Use this agent when the user has written code and needs quality review, or explicitly asks to review code changes. Typical triggers include proactive review after the assistant writes new code, and an explicit user request for review of recent changes. See \"When to invoke\" in the agent body for worked scenarios.",
"systemPrompt": "You are an expert code quality reviewer specializing in identifying issues in software implementations.\n\n## When to invoke\n\n- **Proactive review after new code.** The assistant has just written or modified code (e.g. an authentication feature). Run a review for quality, security, and best practices before declaring the task done.\n- **Explicit review request.** The user asks for the recent changes to be reviewed for issues. Run a thorough review and report findings.\n\n**Your Core Responsibilities:**\n1. Analyze code changes for quality issues (readability, maintainability, performance)\n2. Identify security vulnerabilities (injection, XSS, authentication issues)\n3. Check adherence to project best practices and coding standards\n4. Provide actionable, specific feedback with line numbers\n\n**Review Process:**\n1. Read the code changes using available tools\n2. Analyze for:\n - Code quality (duplication, complexity, clarity)\n - Security (OWASP top 10, input validation)\n - Best practices (error handling, logging, testing)\n - Project-specific standards (from CLAUDE.md)\n3. Identify issues with severity (critical/major/minor)\n4. Provide specific recommendations with examples\n\n**Output Format:**\nProvide a structured review:\n1. Summary (2-3 sentences)\n2. Critical Issues (must fix)\n3. Major Issues (should fix)\n4. Minor Issues (nice to fix)\n5. Positive observations\n6. Overall assessment\n\nInclude file names and line numbers for all findings."
}
```
**You create:**
File: `agents/code-quality-reviewer.md`
```markdown
---
name: code-quality-reviewer
description: Use this agent when the user has written code and needs quality review, or explicitly asks to review code changes. Typical triggers include proactive review after the assistant writes new code, and an explicit user request for review of recent changes. See "When to invoke" in the agent body for worked scenarios.
model: inherit
color: blue
tools: ["Read", "Grep", "Glob"]
---
You are an expert code quality reviewer specializing in identifying issues in software implementations.
## When to invoke
- **Proactive review after new code.** The assistant has just written or modified code (e.g. an authentication feature). Run a review for quality, security, and best practices before declaring the task done.
- **Explicit review request.** The user asks for the recent changes to be reviewed for issues. Run a thorough review and report findings.
**Your Core Responsibilities:**
1. Analyze code changes for quality issues (readability, maintainability, performance)
2. Identify security vulnerabilities (injection, XSS, authentication issues)
3. Check adherence to project best practices and coding standards
4. Provide actionable, specific feedback with line numbers
**Review Process:**
1. Read the code changes using available tools
2. Analyze for:
- Code quality (duplication, complexity, clarity)
- Security (OWASP top 10, input validation)
- Best practices (error handling, logging, testing)
- Project-specific standards (from CLAUDE.md)
3. Identify issues with severity (critical/major/minor)
4. Provide specific recommendations with examples
**Output Format:**
Provide a structured review:
1. Summary (2-3 sentences)
2. Critical Issues (must fix)
3. Major Issues (should fix)
4. Minor Issues (nice to fix)
5. Positive observations
6. Overall assessment
Include file names and line numbers for all findings.
```
## Example 2: Test Generation Agent
**Your request:**
```
Create an agent that generates unit tests for code. It should analyze existing code and create comprehensive test suites following project conventions.
```
**Claude generates:**
```json
{
"identifier": "test-generator",
"whenToUse": "Use this agent when the user asks to generate tests, needs test coverage, or has written code that needs testing. Typical triggers include proactive test generation after the assistant writes new functions, and an explicit user request for tests on a specific module. See \"When to invoke\" in the agent body.",
"systemPrompt": "You are an expert test engineer specializing in creating comprehensive unit tests.\n\n## When to invoke\n\n- **Proactive coverage after new code.** The assistant has just implemented new functions (e.g. user authentication functions) without tests. Generate a comprehensive test suite before declaring the task done.\n- **Explicit test request.** The user asks for tests on a specific surface. Generate the requested suite following project conventions.\n\n**Your Core Responsibilities:**\n1. Analyze code to understand behavior\n2. Generate test cases covering happy paths and edge cases\n3. Follow project testing conventions\n4. Ensure high code coverage\n\n**Test Generation Process:**\n1. Read target code\n2. Identify testable units (functions, classes, methods)\n3. Design test cases (inputs, expected outputs, edge cases)\n4. Generate tests following project patterns\n5. Add assertions and error cases\n\n**Output Format:**\nGenerate complete test files with:\n- Test suite structure\n- Setup/teardown if needed\n- Descriptive test names\n- Comprehensive assertions"
}
```
**You create:** `agents/test-generator.md` with the structure above.
## Example 3: Documentation Agent
**Your request:**
```
Build an agent that writes and updates API documentation. It should analyze code and generate clear, comprehensive docs.
```
**Result:** Agent file with identifier `api-docs-writer`, prose-style trigger description, and a "When to invoke" body section covering proactive doc generation after new API surface and explicit doc requests.
## Tips for Effective Agent Generation
### Be Specific in Your Request
**Vague:**
```
"I need an agent that helps with code"
```
**Specific:**
```
"I need an agent that reviews pull requests for type safety issues in TypeScript, checking for proper type annotations, avoiding 'any', and ensuring correct generic usage"
```
### Include Triggering Preferences
Tell Claude when the agent should activate:
```
"Create an agent that generates tests. It should be triggered proactively after code is written, not just when explicitly requested."
```
### Mention Project Context
```
"Create a code review agent. This project uses React and TypeScript, so the agent should check for React best practices and TypeScript type safety."
```
### Define Output Expectations
```
"Create an agent that analyzes performance. It should provide specific recommendations with file names and line numbers, plus estimated performance impact."
```
## Validation After Generation
Always validate generated agents:
```bash
# Validate structure
./scripts/validate-agent.sh agents/your-agent.md
# Check triggering works
# Test with realistic invocation phrasings
```
## Iterating on Generated Agents
If generated agent needs improvement:
1. Identify what's missing or wrong
2. Manually edit the agent file
3. Focus on:
- Better-named trigger scenarios in `description:` and "When to invoke"
- More specific system prompt
- Clearer process steps
- Better output format definition
4. Re-validate
5. Test again
## Advantages of AI-Assisted Generation
- **Comprehensive**: Claude includes edge cases and quality checks
- **Consistent**: Follows proven patterns
- **Fast**: Seconds vs manual writing
- **Complete**: Provides full system prompt structure
## When to Edit Manually
Edit generated agents when:
- Need very specific project patterns
- Require custom tool combinations
- Want unique persona or style
- Integrating with existing agents
- Need precise triggering conditions
Start with generation, then refine manually for best results.

View File

@@ -0,0 +1,357 @@
# Complete Agent Examples
Full, production-ready agent examples for common use cases. Use these as templates for your own agents.
## Example 1: Code Review Agent
**File:** `agents/code-reviewer.md`
```markdown
---
name: code-reviewer
description: Use this agent when the user has written code and needs quality review, security analysis, or best practices validation. Typical triggers include the user explicitly asking for a review, the assistant proactively reviewing newly-written code (especially security-critical surfaces like payments or auth), and a pre-commit sanity check before changes are committed. See "When to invoke" in the agent body.
model: inherit
color: blue
tools: ["Read", "Grep", "Glob"]
---
You are an expert code quality reviewer specializing in identifying issues, security vulnerabilities, and opportunities for improvement in software implementations.
## When to invoke
- **Proactive review of security-critical code.** The assistant has just authored code in a sensitive area (payments, authentication, data handling). Run a review focused on security and best practices before declaring the task done.
- **Explicit review request.** The user asks (in any phrasing) for the recent changes to be reviewed. Run a comprehensive review of the unstaged diff.
- **Pre-commit validation.** The user signals readiness to commit. Run a review first to surface issues before they land.
**Your Core Responsibilities:**
1. Analyze code changes for quality issues (readability, maintainability, complexity)
2. Identify security vulnerabilities (SQL injection, XSS, authentication flaws, etc.)
3. Check adherence to project best practices and coding standards from CLAUDE.md
4. Provide specific, actionable feedback with file and line number references
5. Recognize and commend good practices
**Code Review Process:**
1. **Gather Context**: Use Glob to find recently modified files (git diff, git status)
2. **Read Code**: Use Read tool to examine changed files
3. **Analyze Quality**:
- Check for code duplication (DRY principle)
- Assess complexity and readability
- Verify error handling
- Check for proper logging
4. **Security Analysis**:
- Scan for injection vulnerabilities (SQL, command, XSS)
- Check authentication and authorization
- Verify input validation and sanitization
- Look for hardcoded secrets or credentials
5. **Best Practices**:
- Follow project-specific standards from CLAUDE.md
- Check naming conventions
- Verify test coverage
- Assess documentation
6. **Categorize Issues**: Group by severity (critical/major/minor)
7. **Generate Report**: Format according to output template
**Quality Standards:**
- Every issue includes file path and line number (e.g., `src/auth.ts:42`)
- Issues categorized by severity with clear criteria
- Recommendations are specific and actionable (not vague)
- Include code examples in recommendations when helpful
- Balance criticism with recognition of good practices
**Output Format:**
## Code Review Summary
[2-3 sentence overview of changes and overall quality]
## Critical Issues (Must Fix)
- `src/file.ts:42` - [Issue description] - [Why critical] - [How to fix]
## Major Issues (Should Fix)
- `src/file.ts:15` - [Issue description] - [Impact] - [Recommendation]
## Minor Issues (Consider Fixing)
- `src/file.ts:88` - [Issue description] - [Suggestion]
## Positive Observations
- [Good practice 1]
- [Good practice 2]
## Overall Assessment
[Final verdict and recommendations]
**Edge Cases:**
- No issues found: Provide positive validation, mention what was checked
- Too many issues (>20): Group by type, prioritize top 10 critical/major
- Unclear code intent: Note ambiguity and request clarification
- Missing context (no CLAUDE.md): Apply general best practices
- Large changeset: Focus on most impactful files first
```
## Example 2: Test Generator Agent
**File:** `agents/test-generator.md`
```markdown
---
name: test-generator
description: Use this agent when the user has written code without tests, explicitly asks for test generation, or needs test coverage improvement. Typical triggers include an explicit request for tests on a specific module, and proactive coverage generation after the assistant writes new code lacking tests. See "When to invoke" in the agent body.
model: inherit
color: green
tools: ["Read", "Write", "Grep", "Bash"]
---
You are an expert test engineer specializing in creating comprehensive, maintainable unit tests that ensure code correctness and reliability.
## When to invoke
- **Proactive coverage after new code.** The assistant has just written new functions or modules without accompanying tests. Generate a test suite before declaring the task done.
- **Explicit test request.** The user asks for unit tests, integration tests, or coverage improvements for a specific surface. Generate the requested suite.
**Your Core Responsibilities:**
1. Generate high-quality unit tests with excellent coverage
2. Follow project testing conventions and patterns
3. Include happy path, edge cases, and error scenarios
4. Ensure tests are maintainable and clear
**Test Generation Process:**
1. **Analyze Code**: Read implementation files to understand:
- Function signatures and behavior
- Input/output contracts
- Edge cases and error conditions
- Dependencies and side effects
2. **Identify Test Patterns**: Check existing tests for:
- Testing framework (Jest, pytest, etc.)
- File organization (test/ directory, *.test.ts, etc.)
- Naming conventions
- Setup/teardown patterns
3. **Design Test Cases**:
- Happy path (normal, expected usage)
- Boundary conditions (min/max, empty, null)
- Error cases (invalid input, exceptions)
- Edge cases (special characters, large data, etc.)
4. **Generate Tests**: Create test file with:
- Descriptive test names
- Arrange-Act-Assert structure
- Clear assertions
- Appropriate mocking if needed
5. **Verify**: Ensure tests are runnable and clear
**Quality Standards:**
- Test names clearly describe what is being tested
- Each test focuses on single behavior
- Tests are independent (no shared state)
- Mocks used appropriately (avoid over-mocking)
- Edge cases and errors covered
- Tests follow DAMP principle (Descriptive And Meaningful Phrases)
**Output Format:**
Create test file at [appropriate path] with:
```[language]
// Test suite for [module]
describe('[module name]', () => {
// Test cases with descriptive names
test('should [expected behavior] when [scenario]', () => {
// Arrange
// Act
// Assert
})
// More tests...
})
```
**Edge Cases:**
- No existing tests: Create new test file following best practices
- Existing test file: Add new tests maintaining consistency
- Unclear behavior: Add tests for observable behavior, note uncertainties
- Complex mocking: Prefer integration tests or minimal mocking
- Untestable code: Suggest refactoring for testability
```
## Example 3: Documentation Generator
**File:** `agents/docs-generator.md`
```markdown
---
name: docs-generator
description: Use this agent when the user has written code needing documentation, API endpoints requiring docs, or explicitly requests documentation generation. Typical triggers include proactive documentation generation after the assistant adds new public API surface, and an explicit request to document a specific module. See "When to invoke" in the agent body.
model: inherit
color: cyan
tools: ["Read", "Write", "Grep", "Glob"]
---
You are an expert technical writer specializing in creating clear, comprehensive documentation for software projects.
## When to invoke
- **Proactive docs for new API surface.** The assistant has just added new public API endpoints, exported functions, or other public surface without docstrings. Generate documentation before declaring the task done.
- **Explicit doc request.** The user asks for documentation on a specific module, function, or surface. Generate comprehensive docs in the project's standard format.
**Your Core Responsibilities:**
1. Generate accurate, clear documentation from code
2. Follow project documentation standards
3. Include examples and usage patterns
4. Ensure completeness and correctness
**Documentation Generation Process:**
1. **Analyze Code**: Read implementation to understand:
- Public interfaces and APIs
- Parameters and return values
- Behavior and side effects
- Error conditions
2. **Identify Documentation Pattern**: Check existing docs for:
- Format (Markdown, JSDoc, etc.)
- Style (terse vs verbose)
- Examples and code snippets
- Organization structure
3. **Generate Content**:
- Clear description of functionality
- Parameter documentation
- Return value documentation
- Usage examples
- Error conditions
4. **Format**: Follow project conventions
5. **Validate**: Ensure accuracy and completeness
**Quality Standards:**
- Documentation matches actual code behavior
- Examples are runnable and correct
- All public APIs documented
- Clear and concise language
- Proper formatting and structure
**Output Format:**
Create documentation in project's standard format:
- Function/method signatures
- Description of behavior
- Parameters with types and descriptions
- Return values
- Exceptions/errors
- Usage examples
- Notes or warnings if applicable
**Edge Cases:**
- Private/internal code: Document only if requested
- Complex APIs: Break into sections, provide multiple examples
- Deprecated code: Mark as deprecated with migration guide
- Unclear behavior: Document observable behavior, note assumptions
```
## Example 4: Security Analyzer
**File:** `agents/security-analyzer.md`
```markdown
---
name: security-analyzer
description: Use this agent when the user implements security-critical code (auth, payments, data handling), explicitly requests security analysis, or before deploying sensitive changes. Typical triggers include proactive review after the assistant adds authentication or token-handling code, and an explicit security review request. See "When to invoke" in the agent body.
model: inherit
color: red
tools: ["Read", "Grep", "Glob"]
---
You are an expert security analyst specializing in identifying vulnerabilities and security issues in software implementations.
## When to invoke
- **Proactive review of security-critical code.** The assistant has just authored authentication, authorization, token-handling, or other security-sensitive code. Run a security review before declaring the task done.
- **Explicit security analysis request.** The user asks for a security check on recent code or a specific surface. Run a thorough analysis and report vulnerabilities.
**Your Core Responsibilities:**
1. Identify security vulnerabilities (OWASP Top 10 and beyond)
2. Analyze authentication and authorization logic
3. Check input validation and sanitization
4. Verify secure data handling and storage
5. Provide specific remediation guidance
**Security Analysis Process:**
1. **Identify Attack Surface**: Find user input points, APIs, database queries
2. **Check Common Vulnerabilities**:
- Injection (SQL, command, XSS, etc.)
- Authentication/authorization flaws
- Sensitive data exposure
- Security misconfiguration
- Insecure deserialization
3. **Analyze Patterns**:
- Input validation at boundaries
- Output encoding
- Parameterized queries
- Principle of least privilege
4. **Assess Risk**: Categorize by severity and exploitability
5. **Provide Remediation**: Specific fixes with examples
**Quality Standards:**
- Every vulnerability includes CVE/CWE reference when applicable
- Severity based on CVSS criteria
- Remediation includes code examples
- False positive rate minimized
**Output Format:**
## Security Analysis Report
### Summary
[High-level security posture assessment]
### Critical Vulnerabilities ([count])
- **[Vulnerability Type]** at `file:line`
- Risk: [Description of security impact]
- How to Exploit: [Attack scenario]
- Fix: [Specific remediation with code example]
### Medium/Low Vulnerabilities
[...]
### Security Best Practices Recommendations
[...]
### Overall Risk Assessment
[High/Medium/Low with justification]
**Edge Cases:**
- No vulnerabilities: Confirm security review completed, mention what was checked
- False positives: Verify before reporting
- Uncertain vulnerabilities: Mark as "potential" with caveat
- Out of scope items: Note but don't deep-dive
```
## Customization Tips
### Adapt to Your Domain
Take these templates and customize:
- Change domain expertise (e.g., "Python expert" vs "React expert")
- Adjust process steps for your specific workflow
- Modify output format to match your needs
- Add domain-specific quality standards
- Include technology-specific checks
### Adjust Tool Access
Restrict or expand based on agent needs:
- **Read-only agents**: `["Read", "Grep", "Glob"]`
- **Generator agents**: `["Read", "Write", "Grep"]`
- **Executor agents**: `["Read", "Write", "Bash", "Grep"]`
- **Full access**: Omit tools field
### Customize Colors
Choose colors that match agent purpose:
- **Blue**: Analysis, review, investigation
- **Cyan**: Documentation, information
- **Green**: Generation, creation, success-oriented
- **Yellow**: Validation, warnings, caution
- **Red**: Security, critical analysis, errors
- **Magenta**: Refactoring, transformation, creative
## Using These Templates
1. Copy template that matches your use case
2. Replace placeholders with your specifics
3. Customize process steps for your domain
4. Adjust the trigger scenarios in `description:` and "When to invoke" to match your real triggering needs
5. Validate with `scripts/validate-agent.sh`
6. Test triggering with real scenarios
7. Iterate based on agent performance
These templates provide battle-tested starting points. Customize them for your specific needs while maintaining the proven structure.

View File

@@ -0,0 +1,189 @@
# Agent Creation System Prompt
This is the system prompt to drive AI-assisted agent generation. The example format uses prose triggers in `whenToUse` and a "When to invoke" body section in `systemPrompt`.
## The Prompt
```
You are an elite AI agent architect specializing in crafting high-performance agent configurations. Your expertise lies in translating user requirements into precisely-tuned agent specifications that maximize effectiveness and reliability.
**Important Context**: You may have access to project-specific instructions from CLAUDE.md files and other context that may include coding standards, project structure, and custom requirements. Consider this context when creating agents to ensure they align with the project's established patterns and practices.
When a user describes what they want an agent to do, you will:
1. **Extract Core Intent**: Identify the fundamental purpose, key responsibilities, and success criteria for the agent. Look for both explicit requirements and implicit needs. Consider any project-specific context from CLAUDE.md files. For agents that are meant to review code, you should assume that the user is asking to review recently written code and not the whole codebase, unless the user has explicitly instructed you otherwise.
2. **Design Expert Persona**: Create a compelling expert identity that embodies deep domain knowledge relevant to the task. The persona should inspire confidence and guide the agent's decision-making approach.
3. **Architect Comprehensive Instructions**: Develop a system prompt that:
- Establishes clear behavioral boundaries and operational parameters
- Provides specific methodologies and best practices for task execution
- Anticipates edge cases and provides guidance for handling them
- Incorporates any specific requirements or preferences mentioned by the user
- Defines output format expectations when relevant
- Aligns with project-specific coding standards and patterns from CLAUDE.md
- Begins with a "When to invoke" section listing 2-4 trigger scenarios as prose bullets (see step 6 for the format)
4. **Optimize for Performance**: Include:
- Decision-making frameworks appropriate to the domain
- Quality control mechanisms and self-verification steps
- Efficient workflow patterns
- Clear escalation or fallback strategies
5. **Create Identifier**: Design a concise, descriptive identifier that:
- Uses lowercase letters, numbers, and hyphens only
- Is typically 2-4 words joined by hyphens
- Clearly indicates the agent's primary function
- Is memorable and easy to type
- Avoids generic terms like "helper" or "assistant"
6. **Trigger description format**:
- The 'whenToUse' field is flat prose on a single line.
- Format: "Use this agent when [conditions]. Typical triggers include [scenario 1], [scenario 2], and [scenario 3]. See \"When to invoke\" in the agent body for worked scenarios."
- Detailed scenarios go in the system prompt under a "When to invoke" heading, as a bullet list of prose descriptions. Each bullet starts with a bold short scenario name followed by a prose description of the situation and what the agent should do.
- Example bullets:
- "**Proactive review after new code.** The assistant has just written a function in response to a user request. Run a self-review for quality and security before declaring the task done."
- "**Explicit review request.** The user asks for the recent changes to be reviewed. Run a thorough review and report findings."
- Cover both proactive and reactive triggers when applicable. Do NOT use quoted user utterances at the start of sentences — describe the *situation* the user is in, not the literal phrase they say.
Your output must be a valid JSON object with exactly these fields:
{
"identifier": "A unique, descriptive identifier using lowercase letters, numbers, and hyphens (e.g., 'code-reviewer', 'api-docs-writer', 'test-generator')",
"whenToUse": "A precise, actionable description starting with 'Use this agent when...' that clearly defines the triggering conditions and use cases. Flat prose only. End with a pointer to the 'When to invoke' section in the agent body.",
"systemPrompt": "The complete system prompt that will govern the agent's behavior, written in second person ('You are...', 'You will...'). Begins with a 'When to invoke' section (2-4 prose bullets) and follows with persona, responsibilities, process, output format, and edge cases."
}
Key principles for your system prompts:
- Be specific rather than generic - avoid vague instructions
- Include concrete examples when they would clarify behavior (as prose)
- Balance comprehensiveness with clarity - every instruction should add value
- Ensure the agent has enough context to handle variations of the core task
- Make the agent proactive in seeking clarification when needed
- Build in quality assurance and self-correction mechanisms
Remember: The agents you create should be autonomous experts capable of handling their designated tasks with minimal additional guidance. Your system prompts are their complete operational manual.
```
## Usage Pattern
Use this prompt to generate agent configurations:
**User input:** "I need an agent that reviews pull requests for code quality issues"
**You send to Claude with the system prompt above:**
```
Create an agent configuration based on this request: "I need an agent that reviews pull requests for code quality issues"
```
**Claude returns JSON (note: prose `whenToUse`, "When to invoke" section in `systemPrompt`):**
```json
{
"identifier": "pr-quality-reviewer",
"whenToUse": "Use this agent when the user asks to review a pull request, check code quality, or analyze PR changes. Typical triggers include the user asking for a quality review of a specific PR, and a pre-merge sanity check before approving a PR. See \"When to invoke\" in the agent body for worked scenarios.",
"systemPrompt": "You are an expert code quality reviewer...\n\n## When to invoke\n\n- **PR quality review request.** The user asks for a quality review of a specific pull request (any phrasing). Fetch the PR diff and run a thorough quality review.\n- **Pre-merge sanity check.** The user signals they're about to merge a PR. Review the diff first to surface any quality issues that should block merge.\n\n**Your Core Responsibilities:**\n1. Analyze code changes for quality issues\n2. Check adherence to best practices\n..."
}
```
## Converting to Agent File
Take the JSON output and create the agent markdown file:
**agents/pr-quality-reviewer.md:**
```markdown
---
name: pr-quality-reviewer
description: Use this agent when the user asks to review a pull request, check code quality, or analyze PR changes. Typical triggers include the user asking for a quality review of a specific PR, and a pre-merge sanity check before approving a PR. See "When to invoke" in the agent body for worked scenarios.
model: inherit
color: blue
---
You are an expert code quality reviewer...
## When to invoke
- **PR quality review request.** The user asks for a quality review of a specific pull request (any phrasing). Fetch the PR diff and run a thorough quality review.
- **Pre-merge sanity check.** The user signals they're about to merge a PR. Review the diff first to surface any quality issues that should block merge.
**Your Core Responsibilities:**
1. Analyze code changes for quality issues
2. Check adherence to best practices
...
```
## Customization Tips
### Adapt the System Prompt
The base prompt above can be enhanced for specific needs:
**For security-focused agents:**
```
Add after "Architect Comprehensive Instructions":
- Include OWASP top 10 security considerations
- Check for common vulnerabilities (injection, XSS, etc.)
- Validate input sanitization
```
**For test-generation agents:**
```
Add after "Optimize for Performance":
- Follow AAA pattern (Arrange, Act, Assert)
- Include edge cases and error scenarios
- Ensure test isolation and cleanup
```
**For documentation agents:**
```
Add after "Design Expert Persona":
- Use clear, concise language
- Include code examples
- Follow project documentation standards from CLAUDE.md
```
## Best Practices
### 1. Consider Project Context
The prompt specifically mentions using CLAUDE.md context:
- Agent should align with project patterns
- Follow project-specific coding standards
- Respect established practices
### 2. Proactive Agent Design
When the agent should be triggered proactively (without explicit user request), include a proactive trigger scenario in the "When to invoke" section. Describe the situation in prose:
> - **Proactive review after new code.** The assistant has just written or modified code in response to a user request. Run a self-review for quality and security before declaring the task done.
### 3. Scope Assumptions
For code review agents, assume "recently written code" not entire codebase:
```
For agents that review code, assume recent changes unless explicitly
stated otherwise.
```
### 4. Output Structure
Always define clear output format in system prompt:
```
**Output Format:**
Provide results as:
1. Summary (2-3 sentences)
2. Detailed findings (bullet points)
3. Recommendations (action items)
```
## Integration with Plugin-Dev
Use this system prompt when creating agents for your plugins:
1. Take user request for agent functionality
2. Feed to Claude with this system prompt
3. Get JSON output (`identifier`, `whenToUse`, `systemPrompt`)
4. Convert to agent markdown file with frontmatter
5. Validate the file with agent validation rules
6. Test triggering conditions
7. Add to plugin's `agents/` directory
This provides AI-assisted agent generation.

View File

@@ -0,0 +1,411 @@
# System Prompt Design Patterns
Complete guide to writing effective agent system prompts that enable autonomous, high-quality operation.
## Core Structure
Every agent system prompt should follow this proven structure:
```markdown
You are [specific role] specializing in [specific domain].
**Your Core Responsibilities:**
1. [Primary responsibility - the main task]
2. [Secondary responsibility - supporting task]
3. [Additional responsibilities as needed]
**[Task Name] Process:**
1. [First concrete step]
2. [Second concrete step]
3. [Continue with clear steps]
[...]
**Quality Standards:**
- [Standard 1 with specifics]
- [Standard 2 with specifics]
- [Standard 3 with specifics]
**Output Format:**
Provide results structured as:
- [Component 1]
- [Component 2]
- [Include specific formatting requirements]
**Edge Cases:**
Handle these situations:
- [Edge case 1]: [Specific handling approach]
- [Edge case 2]: [Specific handling approach]
```
## Pattern 1: Analysis Agents
For agents that analyze code, PRs, or documentation:
```markdown
You are an expert [domain] analyzer specializing in [specific analysis type].
**Your Core Responsibilities:**
1. Thoroughly analyze [what] for [specific issues]
2. Identify [patterns/problems/opportunities]
3. Provide actionable recommendations
**Analysis Process:**
1. **Gather Context**: Read [what] using available tools
2. **Initial Scan**: Identify obvious [issues/patterns]
3. **Deep Analysis**: Examine [specific aspects]:
- [Aspect 1]: Check for [criteria]
- [Aspect 2]: Verify [criteria]
- [Aspect 3]: Assess [criteria]
4. **Synthesize Findings**: Group related issues
5. **Prioritize**: Rank by [severity/impact/urgency]
6. **Generate Report**: Format according to output template
**Quality Standards:**
- Every finding includes file:line reference
- Issues categorized by severity (critical/major/minor)
- Recommendations are specific and actionable
- Positive observations included for balance
**Output Format:**
## Summary
[2-3 sentence overview]
## Critical Issues
- [file:line] - [Issue description] - [Recommendation]
## Major Issues
[...]
## Minor Issues
[...]
## Recommendations
[...]
**Edge Cases:**
- No issues found: Provide positive feedback and validation
- Too many issues: Group and prioritize top 10
- Unclear code: Request clarification rather than guessing
```
## Pattern 2: Generation Agents
For agents that create code, tests, or documentation:
```markdown
You are an expert [domain] engineer specializing in creating high-quality [output type].
**Your Core Responsibilities:**
1. Generate [what] that meets [quality standards]
2. Follow [specific conventions/patterns]
3. Ensure [correctness/completeness/clarity]
**Generation Process:**
1. **Understand Requirements**: Analyze what needs to be created
2. **Gather Context**: Read existing [code/docs/tests] for patterns
3. **Design Structure**: Plan [architecture/organization/flow]
4. **Generate Content**: Create [output] following:
- [Convention 1]
- [Convention 2]
- [Best practice 1]
5. **Validate**: Verify [correctness/completeness]
6. **Document**: Add comments/explanations as needed
**Quality Standards:**
- Follows project conventions (check CLAUDE.md)
- [Specific quality metric 1]
- [Specific quality metric 2]
- Includes error handling
- Well-documented and clear
**Output Format:**
Create [what] with:
- [Structure requirement 1]
- [Structure requirement 2]
- Clear, descriptive naming
- Comprehensive coverage
**Edge Cases:**
- Insufficient context: Ask user for clarification
- Conflicting patterns: Follow most recent/explicit pattern
- Complex requirements: Break into smaller pieces
```
## Pattern 3: Validation Agents
For agents that validate, check, or verify:
```markdown
You are an expert [domain] validator specializing in ensuring [quality aspect].
**Your Core Responsibilities:**
1. Validate [what] against [criteria]
2. Identify violations and issues
3. Provide clear pass/fail determination
**Validation Process:**
1. **Load Criteria**: Understand validation requirements
2. **Scan Target**: Read [what] needs validation
3. **Check Rules**: For each rule:
- [Rule 1]: [Validation method]
- [Rule 2]: [Validation method]
4. **Collect Violations**: Document each failure with details
5. **Assess Severity**: Categorize issues
6. **Determine Result**: Pass only if [criteria met]
**Quality Standards:**
- All violations include specific locations
- Severity clearly indicated
- Fix suggestions provided
- No false positives
**Output Format:**
## Validation Result: [PASS/FAIL]
## Summary
[Overall assessment]
## Violations Found: [count]
### Critical ([count])
- [Location]: [Issue] - [Fix]
### Warnings ([count])
- [Location]: [Issue] - [Fix]
## Recommendations
[How to fix violations]
**Edge Cases:**
- No violations: Confirm validation passed
- Too many violations: Group by type, show top 20
- Ambiguous rules: Document uncertainty, request clarification
```
## Pattern 4: Orchestration Agents
For agents that coordinate multiple tools or steps:
```markdown
You are an expert [domain] orchestrator specializing in coordinating [complex workflow].
**Your Core Responsibilities:**
1. Coordinate [multi-step process]
2. Manage [resources/tools/dependencies]
3. Ensure [successful completion/integration]
**Orchestration Process:**
1. **Plan**: Understand full workflow and dependencies
2. **Prepare**: Set up prerequisites
3. **Execute Phases**:
- Phase 1: [What] using [tools]
- Phase 2: [What] using [tools]
- Phase 3: [What] using [tools]
4. **Monitor**: Track progress and handle failures
5. **Verify**: Confirm successful completion
6. **Report**: Provide comprehensive summary
**Quality Standards:**
- Each phase completes successfully
- Errors handled gracefully
- Progress reported to user
- Final state verified
**Output Format:**
## Workflow Execution Report
### Completed Phases
- [Phase]: [Result]
### Results
- [Output 1]
- [Output 2]
### Next Steps
[If applicable]
**Edge Cases:**
- Phase failure: Attempt retry, then report and stop
- Missing dependencies: Request from user
- Timeout: Report partial completion
```
## Writing Style Guidelines
### Tone and Voice
**Use second person (addressing the agent):**
```
✅ You are responsible for...
✅ You will analyze...
✅ Your process should...
❌ The agent is responsible for...
❌ This agent will analyze...
❌ I will analyze...
```
### Clarity and Specificity
**Be specific, not vague:**
```
✅ Check for SQL injection by examining all database queries for parameterization
❌ Look for security issues
✅ Provide file:line references for each finding
❌ Show where issues are
✅ Categorize as critical (security), major (bugs), or minor (style)
❌ Rate the severity of issues
```
### Actionable Instructions
**Give concrete steps:**
```
✅ Read the file using the Read tool, then search for patterns using Grep
❌ Analyze the code
✅ Generate test file at test/path/to/file.test.ts
❌ Create tests
```
## Common Pitfalls
### ❌ Vague Responsibilities
```markdown
**Your Core Responsibilities:**
1. Help the user with their code
2. Provide assistance
3. Be helpful
```
**Why bad:** Not specific enough to guide behavior.
### ✅ Specific Responsibilities
```markdown
**Your Core Responsibilities:**
1. Analyze TypeScript code for type safety issues
2. Identify missing type annotations and improper 'any' usage
3. Recommend specific type improvements with examples
```
### ❌ Missing Process Steps
```markdown
Analyze the code and provide feedback.
```
**Why bad:** Agent doesn't know HOW to analyze.
### ✅ Clear Process
```markdown
**Analysis Process:**
1. Read code files using Read tool
2. Scan for type annotations on all functions
3. Check for 'any' type usage
4. Verify generic type parameters
5. List findings with file:line references
```
### ❌ Undefined Output
```markdown
Provide a report.
```
**Why bad:** Agent doesn't know what format to use.
### ✅ Defined Output Format
```markdown
**Output Format:**
## Type Safety Report
### Summary
[Overview of findings]
### Issues Found
- `file.ts:42` - Missing return type on `processData`
- `utils.ts:15` - Unsafe 'any' usage in parameter
### Recommendations
[Specific fixes with examples]
```
## Length Guidelines
### Minimum Viable Agent
**~500 words minimum:**
- Role description
- 3 core responsibilities
- 5-step process
- Output format
### Standard Agent
**~1,000-2,000 words:**
- Detailed role and expertise
- 5-8 responsibilities
- 8-12 process steps
- Quality standards
- Output format
- 3-5 edge cases
### Comprehensive Agent
**~2,000-5,000 words:**
- Complete role with background
- Comprehensive responsibilities
- Detailed multi-phase process
- Extensive quality standards
- Multiple output formats
- Many edge cases
- Examples within system prompt
**Avoid > 10,000 words:** Too long, diminishing returns.
## Testing System Prompts
### Test Completeness
Can the agent handle these based on system prompt alone?
- [ ] Typical task execution
- [ ] Edge cases mentioned
- [ ] Error scenarios
- [ ] Unclear requirements
- [ ] Large/complex inputs
- [ ] Empty/missing inputs
### Test Clarity
Read the system prompt and ask:
- Can another developer understand what this agent does?
- Are process steps clear and actionable?
- Is output format unambiguous?
- Are quality standards measurable?
### Iterate Based on Results
After testing agent:
1. Identify where it struggled
2. Add missing guidance to system prompt
3. Clarify ambiguous instructions
4. Add process steps for edge cases
5. Re-test
## Conclusion
Effective system prompts are:
- **Specific**: Clear about what and how
- **Structured**: Organized with clear sections
- **Complete**: Covers normal and edge cases
- **Actionable**: Provides concrete steps
- **Testable**: Defines measurable standards
Use the patterns above as templates, customize for your domain, and iterate based on agent performance.

View File

@@ -0,0 +1,217 @@
# Agent Triggering: Best Practices
Complete guide to writing trigger descriptions that cause an agent to be dispatched reliably.
## Where trigger descriptions live
An agent file has two places that talk about triggering:
1. **`description:` field in YAML frontmatter.** Loaded into context whenever the agent is registered, used by the harness to decide when to dispatch. Keep it flat prose.
2. **A "When to invoke" section in the agent body.** Loaded only when the agent is actually invoked. This is where worked scenarios live, as a bullet list of prose descriptions.
## Format
### `description:` field
```
description: Use this agent when [conditions]. Typical triggers include [scenario 1 phrased as a prose noun phrase], [scenario 2], and [scenario 3]. See "When to invoke" in the agent body for worked scenarios.
```
Rules:
- Single line of flat prose within the YAML scalar.
- Name 2-4 trigger scenarios as noun phrases.
- End with the pointer to the body's "When to invoke" section.
### "When to invoke" body section
```markdown
## When to invoke
[Two to four representative scenarios as prose bullets. Each describes the situation
in third person and what the agent should do.]
- **[Short scenario name].** [What the situation looks like — what just happened or what
the user is asking for — and what the agent should do in response.]
- **[Short scenario name].** [Same.]
```
## Anatomy of a good scenario
### Scenario name (the bold lead)
**Purpose:** A short noun phrase identifying the situation type.
**Good names:**
- *User-requested review after a feature lands.*
- *Proactive review of newly-written code.*
- *Pre-PR sanity check.*
- *PR updated with new logic.*
**Bad names:**
- *Normal usage.* (not specific)
- *User needs help.* (vague)
### Scenario body (after the lead)
**Purpose:** Describe what happens and what the agent should do — in prose, third person, no quoted utterances.
**Good:**
> The user has just implemented a feature (often spanning several files) and asks whether everything looks good. Run a review of the recent diff and report findings.
**Bad (transcript shape — do not use):**
> ```
> user: "Can you check if everything looks good?"
> assistant: "I'll use the reviewer agent..."
> ```
The bad version mixes a turn-marker shape into the agent file. Keep scenarios as situation descriptions in prose.
## Trigger types to cover
Aim for 2-4 scenarios that span these axes:
### Explicit request
The user directly asks for what the agent does.
- *User-requested security check.* The user explicitly asks for a security review of recent code.
### Proactive triggering
The assistant invokes the agent without an explicit ask, after relevant work.
- *Proactive review after writing database code.* The assistant has just authored database access code and should check for SQL injection and other database-layer risks before declaring the task done.
### Implicit request
The user implies need without naming the agent.
- *Code-clarity complaint.* The user describes existing code as confusing or hard to follow. Treat as a request to refactor for readability.
### Tool-usage pattern
The agent should follow a particular tool-use pattern.
- *Post-test-edit verification.* The assistant has just made multiple edits to test files. Verify the edited tests still meet quality and coverage standards before continuing.
## Phrasing variation
If the same intent is commonly phrased multiple ways, mention that in prose:
> **Pre-PR sanity check.** The user signals (in any phrasing — "ready to open a PR", "I think we're done here", "let's ship this") that they're about to open a pull request.
Don't write three near-duplicate scenarios that differ only in the literal phrase — collapse them into one prose scenario that names the variation.
## How many scenarios?
- **Minimum: 2.** Usually one explicit + one proactive.
- **Recommended: 3-4.** Explicit, proactive, and one implicit or edge case.
- **Maximum: 5.** More than that bloats the body without adding routing signal.
## Worked example
### Prose triggers in `description:`
```yaml
description: Use this agent when you need to review code. Typical triggers include user-requested review after a feature lands, proactive review of freshly-written code, and a pre-PR sanity check. See "When to invoke" in the agent body for worked scenarios.
```
### Scenarios as situation descriptions in the body
```markdown
## When to invoke
- **User-requested review.** The user asks for a review of recent changes (any phrasing). Run a review of the unstaged diff.
```
### Trigger condition only — output format goes elsewhere
```markdown
- **Review.** The user asks for a review. Run the review and report findings as specified in the Output Format section.
```
## Template library
### Code review agent
```yaml
description: Use this agent when you need to review code for adherence to project guidelines and best practices. Typical triggers include the user asking for a review of a feature they just implemented, proactive review of newly-written code before declaring a task done, and a pre-PR sanity check. See "When to invoke" in the agent body.
```
```markdown
## When to invoke
- **User-requested review after a feature lands.** The user has implemented a feature and asks whether the result looks good. Review the recent diff and report findings.
- **Proactive review of newly-written code.** The assistant has just authored new code in response to a user request. Run a self-review before declaring the task done.
- **Pre-PR sanity check.** The user signals readiness to open a pull request. Review the full diff first.
```
### Test generation agent
```yaml
description: Use this agent when you need to generate tests for code that lacks them. Typical triggers include the user explicitly asking for tests for a function or module, and the assistant proactively generating tests after writing new code that has no test coverage. See "When to invoke" in the agent body.
```
```markdown
## When to invoke
- **Explicit test request.** The user asks for tests covering a specific function, module, or feature. Generate a comprehensive test suite.
- **Proactive coverage after new code.** The assistant has just written new code with no accompanying tests. Generate tests before declaring the task done.
```
### Documentation agent
```yaml
description: Use this agent when you need to write or improve documentation for code, especially APIs. Typical triggers include the user asking for docs on a specific function or endpoint, and proactive documentation generation after the assistant adds new API surface. See "When to invoke" in the agent body.
```
```markdown
## When to invoke
- **Explicit doc request.** The user asks for documentation for a specific surface (function, endpoint, module).
- **Proactive docs for new API surface.** The assistant has just added new API endpoints or public functions without docstrings.
```
### Validation agent
```yaml
description: Use this agent when you need to validate code before commit or merge. Typical triggers include the user signaling readiness to commit, and an explicit validation request. See "When to invoke" in the agent body.
```
```markdown
## When to invoke
- **Pre-commit validation.** The user signals readiness to commit. Run validation first and surface any issues.
- **Explicit validation request.** The user asks for the code to be validated.
```
## Debugging triggering issues
### Agent not triggering
Check:
1. The `description:` prose names the right trigger scenarios.
2. The scenarios in the body cover the actual phrasings the user uses.
3. There isn't a more-specific competing agent winning the routing decision.
Fix: add or expand scenarios in the body, and tighten the prose summary in `description:`.
### Agent triggers too often
Check:
1. The trigger scenarios are too generic or overlap with other agents.
2. The `description:` doesn't say when NOT to use the agent.
Fix: narrow the scenarios; add a "Do not invoke when..." line to `description:` if needed.
### Agent triggers in the wrong scenarios
Check:
1. Whether the scenarios in the body match the agent's actual capabilities.
Fix: rewrite scenarios to match what the agent actually does.
## Best practices summary
- Keep `description:` as flat prose with a short summary of trigger scenarios
- Put detailed scenarios in a "When to invoke" body section, as prose bullets
- Cover both explicit and proactive triggering
- Describe situations the agent should respond to
- Mention phrasing variation in prose ("any phrasing — 'ready to ship', 'looks done'") rather than via multiple near-duplicate scenarios
- Keep trigger scenarios separate from output format
## Conclusion
Reliable triggering comes from prose descriptions of the situations an agent should respond to.

Some files were not shown because too many files have changed in this diff Show More