Compare commits

..

12 Commits

Author SHA1 Message Date
59cf082e06 feat: normalized external-agent status (#10 scoped) (v2.7.6)
Scoped half of boocode_code_review_v2 §1 #10 — publish the agent status
BooCoder already observes (the config-injection notify-hook is the documented
follow-on, clean-room from superset ELv2).

- agent_status_updated WS frame (working|blocked|idle|error), server+web parity.
- Published from the dispatcher's turn boundaries (warm-acp/opencode/sdk/pty:
  working at start, idle/error at end) + the permission flow (blocked/working).
  Best-effort, never breaks a turn.
- Clean-room normalizeAgentEvent helper (superset's vendor-event -> Start/blocked
  /Stop collapse, event names as facts) + 25 tests — reused by the follow-on.
- AgentComposerBar status dot (distinct from the WS-liveness dot), tracked per
  (chat,agent) by a useAgentStatus map in CoderPane.

Built by 2 parallel agents vs a pinned frame contract. Server 545 + coder 294
tests passing (25 new); web tsc + builds clean; ws-frames parity green. Clears
the actionable review backlog (#1/#3/#4/#6-#12). Builds on v2.7.5.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 14:04:04 +00:00
6fc3175730 Merge claude-sdk-backend: v2.7.5 Claude SDK backend + clean-room PostgresSessionStore 2026-06-01 13:38:05 +00:00
f3a0197d6a feat: Claude Agent SDK backend + clean-room PostgresSessionStore (v2.7.5)
Lands the lean-SDK direction (boocode_code_review_v2 §1 #9) behind a flag.
Adds @anthropic-ai/claude-agent-sdk@0.3.159 (Commercial Terms, runtime dep).

- PostgresSessionStore: clean-room impl of the SDK's real SessionStore type
  over a new claude_session_entries table. Typechecks against the SDK type;
  8 DB-integration tests.
- ClaudeSdkBackend (implements AgentBackend): one warm query() per (chat,claude)
  in streaming-input mode via a pushable async-iterable pump, sessionStore +
  resume continuity, pure mapSdkMessage->AgentEvent, session_id from init,
  usage/cost onto agent_sessions (backend CHECK gains 'claude_sdk').
- Routing env-gated by CLAUDE_SDK_BACKEND (default off) -> PTY path UNCHANGED.
- Built against real SDK 0.3.159 types (install paid off: partial=stream_event
  needing includePartialMessages, MessageParam, result error arm).
- Fix latent test-infra deadlock: serialize DB suites (fileParallelism:false).

Coder 269 passing default / 290 with DB; tsc clean vs SDK types; builds clean.
LIVE pump + resume + actual claude turn need a host smoke (CLAUDE_SDK_BACKEND=1
+ claude binary + auth). zod peer-dep wants ^4 (workspace 3.25). Builds on v2.7.4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 13:37:57 +00:00
7e0ecde83d Merge mistake-tracker-ledger: v2.7.4 heterogeneous-failure recovery + file-read ledger 2026-06-01 13:05:19 +00:00
bcc89d8adc feat: MistakeTracker + file-provenance ledger (v2.7.4)
Two native-inference hardening features from boocode_code_review_v2 §1 #12.

MistakeTracker: new pure mistake-tracker.ts tracks consecutive heterogeneous
tool failures (kinds surfaced per tool from tool-phase.ts). On 3 in a row the
turn loop soft-nudges (model-facing recovery guidance + mistake_recovery
sentinel + reset), then escalates to stopping the turn (cap-hit-style, Continue
affordance) on a re-trip. Complements doom-loop (identical repeats) + cap-hit.

File-provenance ledger: compaction.ts derives a deterministic ## Files Read list
from the head messages' read-tool calls and injects it into the rolling-summary
prompt so provenance survives compaction (no new table; read-only).

mistake_recovery sentinel: MessageMetadata arm (server + web) + MessageBubble
render branch. Built by 2 parallel agents. Server 545 tests passing (23 new);
build + web tsc clean. Native-inference only. Builds on v2.7.3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 13:05:03 +00:00
f53d6a8afd Merge sampling-knobs-streamjson: v2.7.3 sampling knobs + live PTY stream-json + token UI 2026-06-01 12:47:31 +00:00
a584dd16b0 feat: sampling knobs + live PTY stream-json + token UI (v2.7.3)
Three small wins from boocode_code_review_v2 §1 #11/#7/#8.

#11 sampling knobs: top_n_sigma + dry_* family as first-class Agent fields,
threaded into the request body via providerOptions.openaiCompatible. Fixes a
latent bug — top_k (rejected by the AI-SDK provider) and min_p (never passed to
streamText) were dead on the wire; both now route through the same channel.
--reasoning-budget documented in data/AGENTS.md.

#7 live PTY stream-json: new stream-json-parser.ts line-buffers qwen/claude
NDJSON and emits text/reasoning/tool frames live + persists, with a fallback to
the old opaque slice. claude gets --output-format stream-json --verbose.

#8 token UI: agent_sessions input/output_tokens/cost now flow through the route
+ type and render beside the AgentComposerBar session chip.

Built by 3 parallel agents. Server 523 + coder 245 tests passing; builds + web
tsc clean. Builds on v2.7.2. openspec sampling-streamjson-tokens.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 12:47:17 +00:00
5651f56039 Merge checkpoint-idor-fix: v2.7.2 close 2 checkpoint IDOR holes 2026-06-01 12:16:08 +00:00
9c7d80e2d8 fix(security): scope checkpoint routes to session — close 2 IDORs (v2.7.2)
Flagged by the automated push security review on v2.7.1.

- GET /checkpoints?chat_id= : the chat_id branch filtered by chat_id alone
  (any session's chat_id read its checkpoints). Now joins chats and gates on
  chats.session_id.
- restoreCheckpoint scope guard was fail-open: `cp.session_id && cp.session_id
  !== sessionId` fell through on a null denormalized session_id, allowing a
  cross-session restore (worktree reset + transcript trim). Now resolves the
  owning session via the checkpoint's chat and denies on missing/mismatch.
- Adds a DB-integration regression for the null-session_id cross-session case.

Both scope authoritatively through chats.session_id (checkpoints.session_id is
a nullable hint). Coder suite 234 passing; 7/7 checkpoint tests (incl. the
regression) against live postgres+git; typecheck clean. Hotfix on v2.7.1.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 12:15:54 +00:00
a41a02a62b Merge fuzzy-checkpoints: v2.7.1 write/edit robustness (fuzzy applier + worktree checkpoints) 2026-06-01 12:02:06 +00:00
59f07e8cb8 feat: write/edit robustness — fuzzy patch applier + worktree checkpoints (v2.7.1)
#3 Fuzzy patch applier: new pure fuzzy-match.ts (locateMatch, exact→trim→
unicode-canon→Levenshtein≥0.66, refuse-on-ambiguous) wired into pending_changes
applyOne/rewindOne so local-model whitespace/unicode drift in old_string no
longer loses the edit.

#4 Worktree checkpoint + conversation-trim: checkpoints table + checkpoints.ts
(shadow-commit of tracked+untracked into refs/boocode/checkpoints, hooked into
the 3 external-agent dispatcher paths) + POST restore route (reset --hard +
clean -fd -> transcript trim -> backend-session reset) + "Restore to here" UI.

Built by 3 parallel agents; DB-integration testing caught a created_at
self-deletion bug. Coder suite 234 passing; server+coder build + web tsc clean.
Builds on v2.7.0-mit. openspec write-edit-robustness.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 12:01:57 +00:00
1108d07fb2 Merge relicense-agpl-to-mit: v2.7.0 AGPL-3.0 → MIT relicense 2026-06-01 08:16:25 +00:00
61 changed files with 5637 additions and 61 deletions

View File

@@ -2,6 +2,30 @@
All notable changes per release tag. Most recent on top, ordered by tag creation date (which matches the git history). Tag names follow `vMAJOR.MINOR.PATCH-slug` — the slug describes what shipped, so the tag name alone is enough to recall the batch. All notable changes per release tag. Most recent on top, ordered by tag creation date (which matches the git history). Tag names follow `vMAJOR.MINOR.PATCH-slug` — the slug describes what shipped, so the tag name alone is enough to recall the batch.
## v2.7.6-agent-status-normalize — 2026-06-01
The scoped half of `boocode_code_review_v2.md` §1 #10 — normalized external-agent status, surfaced from BooCoder's own dispatch observation (the heavier config-injection notify-hook, clean-room from superset's ELv2 `agent-setup`, is documented as the follow-on). The review's premise ("PTY agents have no status") had partly aged out — warm-ACP/opencode/SDK already carry working/done — so the real gap was that BooCoder never *published* a normalized per-`(chat,agent)` status (blocked-on-permission was invisible; crash/idle weren't pushed). Adds an `agent_status_updated` WS frame (`working|blocked|idle|error`, server+web parity) published from the dispatcher's turn boundaries across all four external paths (warm-acp/opencode/sdk/pty — `working` at start, `idle`/`error` at end) and the permission flow (`blocked` on request, `working` on resolve), best-effort so it never breaks a turn. A clean-room `normalizeAgentEvent` helper (superset's ~30-vendor-event → Start/blocked/Stop collapse, reimplemented with the event names as facts) ships now with 25 tests so the deferred notify-hook injection reuses it verbatim. The `AgentComposerBar` gains a normalized status dot (working=spinner, blocked=amber, idle=gray, error=red) distinct from the WS-liveness dot, fed by a `useAgentStatus` map `CoderPane` tracks per `(chat,agent)`. Built by two parallel agents (data plane + view plane) against a pinned frame contract; server 545 + coder 294 tests passing (25 new), web tsc + builds clean, ws-frames parity green. Clears the actionable review backlog (#1/#3/#4/#6#12). Builds on `v2.7.5-claude-sdk-sessionstore`; openspec `agent-status-normalize`.
## v2.7.5-claude-sdk-sessionstore — 2026-06-01
Lands the Claude Agent SDK direction (`boocode_code_review_v2.md` §1 #9, §6.2 "lean SDK") behind a flag. Adds `@anthropic-ai/claude-agent-sdk@0.3.159` (Commercial Terms — runtime dep, code reference-only) and builds a warm, resumable claude backend to supersede one-shot PTY dispatch — env-gated (`CLAUDE_SDK_BACKEND`, default off) so production claude stays on the unchanged PTY path until a host smoke. **Clean-room `PostgresSessionStore`** implements the SDK's real `SessionStore` type (`append`/`load`/`listSessions`/`delete`/`listSubkeys`) over a new `claude_session_entries` table — typechecked against the installed SDK type, 8 DB-integration tests. **`ClaudeSdkBackend`** (`implements AgentBackend`, mirroring warm-acp/opencode-server) drives one persistent `query()` per `(chat,'claude')` in streaming-input mode via a pushable async-iterable pump, with `sessionStore` + `resume` for cross-turn/cross-restart continuity, a pure `mapSdkMessage``AgentEvent` mapper, `session_id` captured from the `init` message, and `result.usage`/`total_cost_usd` accumulated onto `agent_sessions` (backend CHECK gains `'claude_sdk'`). Built against the REAL SDK 0.3.159 types after installing it — surfacing shapes a blind build would have missed (`SDKPartialAssistantMessage` is `type:'stream_event'` needing `includePartialMessages`; `SDKUserMessage.message` is `MessageParam`; the `SDKResultMessage` error arm). Also fixes a latent test-infra deadlock — three DB-integration suites applying the full schema in parallel under `DATABASE_URL` deadlocked, now serialized via `fileParallelism:false`. ~32 new tests (8 store + 10 mapper + 8 pushable + 6 routing); coder suite 269 passing default / 290 with DB; tsc clean against the SDK types; builds clean. **The live streaming pump + resume + an actual claude turn need a host smoke (`CLAUDE_SDK_BACKEND=1` + claude binary + ANTHROPIC auth) — cannot run from the dev container.** The zod peer-dep wants `^4` (workspace `3.25`) — watch at runtime. Builds on `v2.7.4-mistake-tracker-ledger`; openspec `claude-sdk-sessionstore`.
## v2.7.4-mistake-tracker-ledger — 2026-06-01
Two native-inference hardening features from `boocode_code_review_v2.md` §1 #12 (cline, algorithm-reimplemented). **MistakeTracker:** complements the doom-loop guard (identical repeats) and cap-hit (budget) by catching a run of consecutive tool *failures*. A new pure `mistake-tracker.ts` tracks heterogeneous failure kinds (`zod_reject`/`tool_not_found`/`exec_error`/`api_error`/`permission_denied`, surfaced per tool from `tool-phase.ts`); after 3 consecutive failures the `turn.ts` loop does a **soft nudge** — injects model-facing recovery guidance into the next step + drops a `mistake_recovery` UI sentinel + resets — then **escalates** to stopping the turn (cap-hit-style, with a Continue affordance) if it re-trips without an intervening success, so heterogeneous failures can't burn the whole step budget. **File-provenance ledger:** `compaction.ts` now derives a deterministic, sorted `## Files Read` list from the head messages' read-tool calls (`view_file`/`grep`/`find_files`/`list_dir`) and injects it into the rolling-summary prompt so file provenance survives compaction (no new table; prompt-driven merge, read-only since BooChat has no write tools). The `mistake_recovery` sentinel adds an arm to `MessageMetadata` in both server + web type copies plus a `MessageBubble` render branch. Built by two parallel agents (backend + frontend sentinel) over disjoint apps; server 545 tests passing (23 new: 12 mistake-tracker + 11 compaction), build + web tsc clean. Native-inference only (external agents run their own loops). Builds on `v2.7.3-sampling-streamjson-tokens`; openspec `mistake-tracker-file-ledger`.
## v2.7.3-sampling-streamjson-tokens — 2026-06-01
Three small BooCode wins from `boocode_code_review_v2.md` §1 #11/#7/#8. **Sampling knobs:** per-agent `top_n_sigma` + the `dry_*` repetition family (`dry_multiplier`/`dry_base`/`dry_allowed_length`/`dry_penalty_last_n`) are now first-class Agent frontmatter fields, parsed in `agents.ts` and threaded into the llama-swap chat-completion body via `providerOptions.openaiCompatible` (the `@ai-sdk/openai-compatible` extra-body channel). This surfaced and fixed a **latent bug**: `top_k` (rejected by the AI-SDK provider as unsupported) and `min_p` (never passed to `streamText` at all) had been dead on the wire — no agent's `top_k`/`min_p` ever affected sampling; both now route through the same channel, so agents that set them will start using them. `--reasoning-budget` is documented in `data/AGENTS.md` (already works via `llama_extra_args`, permitted by the deny-list validator). **Live PTY stream-json:** qwen/claude PTY dispatch sliced stdout opaque; a new `stream-json-parser.ts` line-buffers the Claude-Code-compatible NDJSON and emits text/reasoning/tool frames live as they arrive (mirroring the ACP/opencode paths) + persists the structured parts, with a clean fallback to the old opaque slice when output isn't NDJSON (claude now runs `--output-format stream-json --verbose`). **Token UI:** the per-`(chat,agent)` `agent_sessions.input_tokens`/`output_tokens`/`cost` columns (accumulated since `v2.6.8` but dropped by the read route + wire type) now flow through and render condensed beside the AgentComposerBar session chip. Built by three parallel agents over disjoint subsystems; server 523 + coder 245 tests passing (incl. 11 new stream-json-parser + new agent-parse tests), all builds + web tsc clean. Builds on `v2.7.2-checkpoint-idor`; openspec `sampling-streamjson-tokens`. The qwen-vs-claude `usage` field names in #7 are best-guess pending a live smoke.
## v2.7.2-checkpoint-idor — 2026-06-01
Closes two IDOR authorization holes in the `v2.7.1-write-edit-robustness` checkpoint routes, flagged by the automated push security review. The `GET /api/sessions/:id/checkpoints?chat_id=` list route scoped its `chat_id` branch by `chat_id` alone — any session's `chat_id` would read its checkpoints; it now joins through `chats` and gates on `chats.session_id` (authoritative; `checkpoints.session_id` is a nullable denormalized hint). The `restoreCheckpoint` scope guard was fail-open — `cp.session_id && cp.session_id !== sessionId` fell through whenever the checkpoint's denormalized `session_id` was null, allowing a cross-session restore (worktree reset + transcript trim) — it now resolves the owning session via the checkpoint's chat and denies on any missing-or-mismatched row. A DB-integration regression covers the exact null-`session_id` cross-session case. Real-world blast radius is small (BooCoder is single-user behind Authelia on loopback), but both are genuine authorization bugs. Coder suite 234 passing (7/7 checkpoint tests incl. the regression against live postgres+git), typecheck clean. Hotfix on `v2.7.1-write-edit-robustness`.
## v2.7.1-write-edit-robustness — 2026-06-01
Two BooCoder hardening features for local quantized models, algorithm-reimplemented (not vendored) from the cline findings in `boocode_code_review_v2.md` §1 #3/#4. **Fuzzy patch applier:** `edit_file`'s apply path was exact-`.includes`-or-throw + first-occurrence `.replace` (`pending_changes.ts`), so a qwen3.6 whitespace/indentation/unicode drift in `old_string` lost the edit; a new pure `fuzzy-match.ts` (`locateMatch`) now runs an exact → per-line-trim → unicode-canon (curly quotes/dashes/nbsp) → Levenshtein-≥0.66 ladder and returns the real file span, refusing multi-exact matches as ambiguous rather than silently editing the first. `applyOne`/`rewindOne` both use it. **Worktree checkpoints + conversation-trim:** `rewind` only reversed BooCode's own `pending_changes`, blind to what external agents (opencode/goose/qwen/claude) write directly into the session worktree — so a new `checkpoints` table + `checkpoints.ts` shadow-commit (tracked **and** untracked, captured via a temp-index `read-tree`/`add`/`write-tree`/`commit-tree` into a GC-safe `refs/boocode/checkpoints/<id>`) snapshots the worktree before each external-agent turn (hooked into all three dispatcher paths), anchored to the turn's assistant message. A new `POST /api/sessions/:id/checkpoints/:cid/restore` resets the worktree (`reset --hard` + `clean -fd`), trims the transcript past that message, and resets the `(chat,agent)` backend session so files, transcript, and agent context land consistent at the restore point; a per-message "Restore to here" affordance in `CoderMessageList` drives it. Built by three parallel agents over disjoint files; DB-integration testing caught a microsecond-`created_at` self-deletion bug in the later-checkpoint cleanup. Full coder suite 234 passing (incl. 17 fuzzy-match + 6 checkpoint tests), server+coder build + web tsc clean. Builds on `v2.7.0-mit`; openspec `write-edit-robustness`. Live host smoke (dispatcher hook + restore UI end-to-end) still to run.
## v2.7.0-mit — 2026-06-01 ## v2.7.0-mit — 2026-06-01
Relicenses BooCode from AGPL-3.0 back to MIT by clearing the three Unsloth-Studio-derived files the `v2.4.0`/`v2.4.1` lifts pulled in — the root `LICENSE` and all five `package.json` had been `AGPL-3.0-only`, making the network-served work AGPL §13-encumbered. The enabling finding decoupled the relicense from the long-planned native-llama-server-parsing retirement: `tool-call-parser.ts`'s Unsloth-ported algorithm (`parseToolCallsFromText`/`scanBalancedBraces` + unused nudge constants) was **dead code** with no production import, so it was simply deleted while the load-bearing `extractToolCallBlocks`/`stripToolMarkup` (BooCode-authored streaming helpers) were kept byte-identical — no behavior change to the live tool-call path. `html-to-md.ts` was swapped to the MIT `node-html-markdown` library (`parse5` dropped; the only behavior delta is column-aligned tables, GFM hard-break `<br>`, and `<ol start>` renumbering, all feeding the LLM via `web_fetch`), and `llama-args-validator.ts` was clean-room rewritten with the managed-flag denylist re-derived from the public llama-server flag list (facts, not copyrightable). The license flip set `LICENSE` to MIT (`Copyright (c) 2026 indifferentketchup`), the five `package.json` to `MIT`, removed every AGPL SPDX header, added a README License section, and added a `license-mit` guard test that fails if AGPL provenance returns. Built by three parallel agents over the disjoint files; full server suite 519 passing (incl. 9 new guard tests), server build + coder typecheck clean. Resolves `boocode_code_review_v2.md` §1 #1 / §5k and the roadmap's `License-debt` batch (openspec `license-debt-mit`); supersedes that batch's original staged plan, which had entangled the flip with a live qwen3.6 validation window. Relicenses BooCode from AGPL-3.0 back to MIT by clearing the three Unsloth-Studio-derived files the `v2.4.0`/`v2.4.1` lifts pulled in — the root `LICENSE` and all five `package.json` had been `AGPL-3.0-only`, making the network-served work AGPL §13-encumbered. The enabling finding decoupled the relicense from the long-planned native-llama-server-parsing retirement: `tool-call-parser.ts`'s Unsloth-ported algorithm (`parseToolCallsFromText`/`scanBalancedBraces` + unused nudge constants) was **dead code** with no production import, so it was simply deleted while the load-bearing `extractToolCallBlocks`/`stripToolMarkup` (BooCode-authored streaming helpers) were kept byte-identical — no behavior change to the live tool-call path. `html-to-md.ts` was swapped to the MIT `node-html-markdown` library (`parse5` dropped; the only behavior delta is column-aligned tables, GFM hard-break `<br>`, and `<ol start>` renumbering, all feeding the LLM via `web_fetch`), and `llama-args-validator.ts` was clean-room rewritten with the managed-flag denylist re-derived from the public llama-server flag list (facts, not copyrightable). The license flip set `LICENSE` to MIT (`Copyright (c) 2026 indifferentketchup`), the five `package.json` to `MIT`, removed every AGPL SPDX header, added a README License section, and added a `license-mit` guard test that fails if AGPL provenance returns. Built by three parallel agents over the disjoint files; full server suite 519 passing (incl. 9 new guard tests), server build + coder typecheck clean. Resolves `boocode_code_review_v2.md` §1 #1 / §5k and the roadmap's `License-debt` batch (openspec `license-debt-mit`); supersedes that batch's original staged plan, which had entangled the flip with a live qwen3.6 validation window.

View File

@@ -14,11 +14,12 @@
}, },
"dependencies": { "dependencies": {
"@agentclientprotocol/sdk": "^0.22.1", "@agentclientprotocol/sdk": "^0.22.1",
"@anthropic-ai/claude-agent-sdk": "^0.3.159",
"@boocode/server": "workspace:*", "@boocode/server": "workspace:*",
"@fastify/static": "^7.0.4", "@fastify/static": "^7.0.4",
"@opencode-ai/sdk": "~1.15.0",
"@fastify/websocket": "^10.0.1", "@fastify/websocket": "^10.0.1",
"@modelcontextprotocol/sdk": "^1.29.0", "@modelcontextprotocol/sdk": "^1.29.0",
"@opencode-ai/sdk": "~1.15.0",
"fastify": "^4.28.1", "fastify": "^4.28.1",
"postgres": "^3.4.4", "postgres": "^3.4.4",
"ws": "^8.18.0", "ws": "^8.18.0",

View File

@@ -25,6 +25,7 @@ import { setInferenceContext, clearInferenceContext } from './services/tools/inf
import { registerMessageRoutes } from './routes/messages.js'; import { registerMessageRoutes } from './routes/messages.js';
import { registerSkillRoutes } from './routes/skills.js'; import { registerSkillRoutes } from './routes/skills.js';
import { registerPendingRoutes } from './routes/pending.js'; import { registerPendingRoutes } from './routes/pending.js';
import { registerCheckpointRoutes } from './routes/checkpoints.js';
import { registerAgentSessionRoutes } from './routes/agent-sessions.js'; import { registerAgentSessionRoutes } from './routes/agent-sessions.js';
import { registerTaskRoutes } from './routes/tasks.js'; import { registerTaskRoutes } from './routes/tasks.js';
import { registerInboxRoutes } from './routes/inbox.js'; import { registerInboxRoutes } from './routes/inbox.js';
@@ -41,6 +42,7 @@ import { createOrphanWorktreeReaper } from './services/orphan-worktree-reaper.js
import { probeAgents } from './services/agent-probe.js'; import { probeAgents } from './services/agent-probe.js';
import { getProviderSnapshot, persistProbedModels } from './services/provider-snapshot.js'; import { getProviderSnapshot, persistProbedModels } from './services/provider-snapshot.js';
import { setPermissionHooks } from './services/permission-waiter.js'; import { setPermissionHooks } from './services/permission-waiter.js';
import { publishAgentStatus } from './services/agent-status-publish.js';
import { homedir } from 'node:os'; import { homedir } from 'node:os';
async function main() { async function main() {
@@ -81,6 +83,21 @@ async function main() {
// Broker: in-memory pub/sub for session + user channel streaming. // Broker: in-memory pub/sub for session + user channel streaming.
const broker = createBroker(app.log); const broker = createBroker(app.log);
// agent-status-normalize (#10): the permission hooks carry only taskId +
// sessionId, but the tasks row holds the (chat_id, agent) pair the status frame
// is keyed on. Resolve it best-effort so a blocked/working status accompanies
// every permission_requested/permission_resolved. Returns null when the task
// lacks a chat_id or agent (sessionless creators) — we simply skip the status.
const resolveChatAgent = async (
taskId: string,
): Promise<{ chatId: string; agent: string } | null> => {
const [row] = await sql<{ chat_id: string | null; agent: string | null }[]>`
SELECT chat_id, agent FROM tasks WHERE id = ${taskId}
`;
if (!row?.chat_id || !row.agent) return null;
return { chatId: row.chat_id, agent: row.agent };
};
setPermissionHooks({ setPermissionHooks({
onPrompt: async (prompt) => { onPrompt: async (prompt) => {
await sql` await sql`
@@ -95,6 +112,18 @@ async function main() {
...(prompt.input ? { input: prompt.input } : {}), ...(prompt.input ? { input: prompt.input } : {}),
options: prompt.options.map((o) => ({ option_id: o.optionId, label: o.label })), options: prompt.options.map((o) => ({ option_id: o.optionId, label: o.label })),
} as WsFrame); } as WsFrame);
// #10: agent is blocked on a human decision.
const ca = await resolveChatAgent(prompt.taskId).catch(() => null);
if (ca) {
publishAgentStatus(
broker.publishFrame,
prompt.sessionId,
ca.chatId,
ca.agent,
'blocked',
'permission_request',
);
}
}, },
onResolved: async (taskId, sessionId) => { onResolved: async (taskId, sessionId) => {
await sql` await sql`
@@ -105,6 +134,18 @@ async function main() {
task_id: taskId, task_id: taskId,
session_id: sessionId, session_id: sessionId,
} as WsFrame); } as WsFrame);
// #10: human responded — agent resumes work.
const ca = await resolveChatAgent(taskId).catch(() => null);
if (ca) {
publishAgentStatus(
broker.publishFrame,
sessionId,
ca.chatId,
ca.agent,
'working',
'permission_resolved',
);
}
}, },
}); });
@@ -214,6 +255,7 @@ async function main() {
registerMessageRoutes(app, sql, broker, inferenceApi); registerMessageRoutes(app, sql, broker, inferenceApi);
registerSkillRoutes(app, sql, broker, inferenceApi); registerSkillRoutes(app, sql, broker, inferenceApi);
registerPendingRoutes(app, sql); registerPendingRoutes(app, sql);
registerCheckpointRoutes(app, sql);
registerAgentSessionRoutes(app, sql); registerAgentSessionRoutes(app, sql);
registerTaskRoutes(app, sql, inferenceApi); registerTaskRoutes(app, sql, inferenceApi);
registerInboxRoutes(app, sql); registerInboxRoutes(app, sql);

View File

@@ -16,6 +16,11 @@ export interface AgentSessionRow {
status: string; status: string;
has_session: boolean; has_session: boolean;
last_active_at: string | null; last_active_at: string | null;
// v2.6.8 per-(chat,agent) running token/cost totals (sampling-streamjson-tokens
// #8). BIGINT columns arrive as strings over the wire; the frontend coerces.
input_tokens: number;
output_tokens: number;
cost: number;
} }
export function registerAgentSessionRoutes(app: FastifyInstance, sql: Sql): void { export function registerAgentSessionRoutes(app: FastifyInstance, sql: Sql): void {
@@ -39,7 +44,10 @@ export function registerAgentSessionRoutes(app: FastifyInstance, sql: Sql): void
a.agent AS agent, a.agent AS agent,
a.status AS status, a.status AS status,
(a.agent_session_id IS NOT NULL) AS has_session, (a.agent_session_id IS NOT NULL) AS has_session,
a.last_active_at AS last_active_at a.last_active_at AS last_active_at,
a.input_tokens AS input_tokens,
a.output_tokens AS output_tokens,
a.cost AS cost
FROM agent_sessions a FROM agent_sessions a
JOIN chats c ON c.id = a.chat_id JOIN chats c ON c.id = a.chat_id
WHERE c.session_id = ${sessionId} WHERE c.session_id = ${sessionId}

View File

@@ -0,0 +1,73 @@
/**
* write-edit-robustness #4 — checkpoint restore + list routes (coder side).
*
* Proxied through the apps/server `/api/coder/*` blanket forwarder (no server-side
* change needed for new routes). Restore rewinds the session worktree to the
* checkpoint's shadow commit, trims the transcript from the anchor message forward,
* and resets the agent backend — see services/checkpoints.ts.
*/
import type { FastifyInstance } from 'fastify';
import type { Sql } from '../db.js';
import { restoreCheckpoint, CheckpointNotFoundError } from '../services/checkpoints.js';
export function registerCheckpointRoutes(app: FastifyInstance, sql: Sql): void {
// GET /api/sessions/:sessionId/checkpoints?chat_id= — list a chat's checkpoints
// so the frontend can mark which messages have a restore point. When chat_id is
// omitted, returns every checkpoint for the session's chats.
app.get<{ Params: { sessionId: string }; Querystring: { chat_id?: string } }>(
'/api/sessions/:sessionId/checkpoints',
async (req, reply) => {
const sessionId = req.params.sessionId;
const chatId = req.query.chat_id;
const session = await sql<{ id: string }[]>`SELECT id FROM sessions WHERE id = ${sessionId}`;
if (session.length === 0) {
reply.code(404);
return { error: 'session not found' };
}
// Scope authoritatively through chats.session_id (always set) — NOT the
// denormalized checkpoints.session_id (nullable). The chat_id branch must
// still be session-gated or it's an IDOR (any session's chat_id reads its
// checkpoints).
const rows = chatId
? await sql<{ id: string; chat_id: string; message_id: string | null; label: string | null; created_at: Date }[]>`
SELECT cp.id, cp.chat_id, cp.message_id, cp.label, cp.created_at
FROM checkpoints cp
JOIN chats c ON c.id = cp.chat_id
WHERE cp.chat_id = ${chatId} AND c.session_id = ${sessionId}
ORDER BY cp.created_at
`
: await sql<{ id: string; chat_id: string; message_id: string | null; label: string | null; created_at: Date }[]>`
SELECT cp.id, cp.chat_id, cp.message_id, cp.label, cp.created_at
FROM checkpoints cp
JOIN chats c ON c.id = cp.chat_id
WHERE c.session_id = ${sessionId}
ORDER BY cp.created_at
`;
return rows;
},
);
// POST /api/sessions/:sessionId/checkpoints/:checkpointId/restore — restore.
app.post<{ Params: { sessionId: string; checkpointId: string } }>(
'/api/sessions/:sessionId/checkpoints/:checkpointId/restore',
async (req, reply) => {
const { sessionId, checkpointId } = req.params;
try {
const result = await restoreCheckpoint(sql, checkpointId, {
sessionId,
log: app.log,
});
return result;
} catch (err) {
if (err instanceof CheckpointNotFoundError) {
reply.code(404);
return { error: err.message };
}
throw err;
}
},
);
}

View File

@@ -240,6 +240,55 @@ END $$;
-- v2.6: attribution for DiffPanel badges (Phase 1 UX reads this). -- v2.6: attribution for DiffPanel badges (Phase 1 UX reads this).
ALTER TABLE pending_changes ADD COLUMN IF NOT EXISTS agent TEXT; ALTER TABLE pending_changes ADD COLUMN IF NOT EXISTS agent TEXT;
-- write-edit-robustness #4: worktree checkpoints. A pre-turn shadow-commit of the
-- session worktree (tracked + untracked, captured without disturbing the real
-- index/working tree) stored in a private GC-safe ref refs/boocode/checkpoints/<id>.
-- Created best-effort before each external-agent turn (opencode / warm-ACP / one-shot
-- ACP+PTY); restore resets the worktree to commit_sha, trims the transcript from
-- message_id forward, and resets the backend session. chat_id CASCADEs from chats
-- (like agent_sessions); worktree_id SET NULL so a checkpoint outlives a reaped
-- worktree row. session_id / message_id are informational (no FK — message rows are
-- trimmed by a checkpoint restore and we must not block that on a dangling ref).
CREATE TABLE IF NOT EXISTS checkpoints (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
chat_id UUID NOT NULL REFERENCES chats(id) ON DELETE CASCADE,
session_id UUID,
worktree_id UUID REFERENCES worktrees(id) ON DELETE SET NULL,
message_id UUID, -- anchor: the assistant turn row this checkpoint precedes
commit_sha TEXT NOT NULL, -- shadow-commit capturing the pre-turn worktree tree
label TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp()
);
CREATE INDEX IF NOT EXISTS checkpoints_chat_created_idx ON checkpoints(chat_id, created_at);
-- claude-sdk-sessionstore #9 (Part 1): append-only mirror of Claude Agent SDK
-- session transcripts. The SDK's SessionStore adapter writes one JSONL line per
-- entry; PostgresSessionStore (services/backends/claude-session-store.ts) inserts
-- one row per entry and replays them ORDER BY id on resume. The store is generic
-- per the SDK's SessionKey (project_key, session_id, subpath) — chat↔session
-- ownership lives in agent_sessions, not here. subpath '' is the main transcript
-- (the SDK's undefined subpath maps to '' in the column).
CREATE TABLE IF NOT EXISTS claude_session_entries (
id BIGSERIAL PRIMARY KEY,
project_key TEXT NOT NULL,
session_id TEXT NOT NULL,
subpath TEXT NOT NULL DEFAULT '', -- '' = main transcript (SDK's undefined subpath maps here)
entry JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp()
);
CREATE INDEX IF NOT EXISTS claude_session_entries_key_idx ON claude_session_entries (project_key, session_id, subpath, id);
-- claude-sdk-sessionstore #9 (Part 2): the warm Claude-SDK backend persists its
-- agent_sessions rows with backend='claude_sdk'. Widen the named CHECK to accept
-- it. Idempotent: DROP the named constraint (the inline CREATE TABLE check above
-- carries this explicit name, so DROP IF EXISTS targets it) + re-ADD the widened
-- list. Re-runs/fresh deploys land on the same final constraint (the table-level
-- CREATE already includes only the old two values on a fresh DB; this block then
-- replaces it with the three-value list).
ALTER TABLE agent_sessions DROP CONSTRAINT IF EXISTS agent_sessions_backend_chk;
ALTER TABLE agent_sessions ADD CONSTRAINT agent_sessions_backend_chk
CHECK (backend IN ('opencode_server', 'acp_warm', 'claude_sdk'));
-- LISTEN/NOTIFY fast path: every tasks INSERT (from any call site — routes, -- LISTEN/NOTIFY fast path: every tasks INSERT (from any call site — routes,
-- new_task tool, arena, MCP server) fires pg_notify('tasks_new') in the same -- new_task tool, arena, MCP server) fires pg_notify('tasks_new') in the same
-- transaction, so the dispatcher reacts immediately instead of waiting for the -- transaction, so the dispatcher reacts immediately instead of waiting for the

View File

@@ -0,0 +1,252 @@
import { describe, it, expect, beforeAll, afterAll } from 'vitest';
import { readFileSync } from 'node:fs';
import { rm, mkdir } from 'node:fs/promises';
import { resolve } from 'node:path';
import postgres from 'postgres';
import {
buildShadowCommitCommand,
createCheckpoint,
restoreCheckpoint,
CheckpointNotFoundError,
} from '../checkpoints.js';
import { ensureSessionWorktree } from '../worktrees.js';
import { hostExec } from '../host-exec.js';
/**
* write-edit-robustness #4 — worktree checkpoint tests.
*
* Pure-helper coverage (no DB / no host) for the shadow-commit command builder,
* plus a DB+git integration block (DB-opt-in via DATABASE_URL, skips cleanly
* otherwise; mirrors reconnect_integration.test.ts) that exercises the real
* create → restore round trip against a worktree on the host fs.
*/
describe('buildShadowCommitCommand (pure)', () => {
it('parks the commit under refs/boocode/checkpoints/<id> and prints only the SHA', () => {
const cmd = buildShadowCommitCommand('/tmp/booworktrees/sess-abc', 'cp-id-123');
// Uses a temp index so the real working tree/index is untouched.
expect(cmd).toContain('TMP=$(mktemp)');
expect(cmd).toContain('GIT_INDEX_FILE="$TMP" git read-tree HEAD');
expect(cmd).toContain('GIT_INDEX_FILE="$TMP" git add -A');
expect(cmd).toContain('git write-tree');
expect(cmd).toContain("git commit-tree \"$TREE\" -p HEAD -m \"boocode checkpoint\"");
// Ref name matches the row id, and stdout is ONLY the SHA (printf, no newline).
expect(cmd).toContain("update-ref 'refs/boocode/checkpoints/cp-id-123'");
expect(cmd).toContain("printf '%s' \"$SHA\"");
expect(cmd).not.toContain('echo "$SHA"');
});
it('shell-escapes the worktree path and the id', () => {
const cmd = buildShadowCommitCommand("/tmp/it's a path", "id'; rm -rf /");
// Single quotes inside the path/id are escaped via the '\'' wrapping idiom — no
// bare interpolation that could break out of the quoting.
expect(cmd).toContain("cd '/tmp/it'\\''s a path'");
expect(cmd).toContain("refs/boocode/checkpoints/id'\\''; rm -rf /");
});
});
describe.runIf(!!process.env.DATABASE_URL)('checkpoint create + restore (DB + git)', () => {
let sql: ReturnType<typeof postgres>;
const stamp = Date.now();
const projectDir = `/tmp/boocode-checkpoint-proj-${stamp}`;
let projectId: string;
let sessionId: string;
let chatId: string;
let worktreePath: string;
beforeAll(async () => {
sql = postgres(process.env.DATABASE_URL!, { max: 3 });
// Server schema first (FK targets), then coder schema (worktrees + checkpoints).
const serverSchema = resolve(__dirname, '../../../../server/src/schema.sql');
const coderSchema = resolve(__dirname, '../../schema.sql');
await sql.unsafe(readFileSync(serverSchema, 'utf8'));
await sql.unsafe(readFileSync(coderSchema, 'utf8'));
await mkdir(projectDir, { recursive: true });
await hostExec(
`cd ${projectDir} && git init -q && git config user.email t@t && git config user.name t ` +
`&& echo hello > README.md && git add -A && git commit -qm init`,
{ timeoutMs: 20_000 },
);
const [project] = await sql<{ id: string }[]>`
INSERT INTO projects (name, path, status) VALUES ('checkpoint-test', ${projectDir}, 'open') RETURNING id
`;
projectId = project!.id;
const [session] = await sql<{ id: string }[]>`
INSERT INTO sessions (project_id, name, model, status)
VALUES (${projectId}, 'cp', 'm', 'open') RETURNING id
`;
sessionId = session!.id;
const [chat] = await sql<{ id: string }[]>`
INSERT INTO chats (session_id, name, status) VALUES (${sessionId}, 'tab', 'open') RETURNING id
`;
chatId = chat!.id;
const wt = await ensureSessionWorktree(sql, projectDir, sessionId);
worktreePath = wt.worktreePath;
});
afterAll(async () => {
if (sql) {
const rows = await sql<{ path: string }[]>`SELECT path FROM worktrees WHERE session_id = ${sessionId}`.catch(() => []);
for (const r of rows) {
await hostExec(`git -C ${projectDir} worktree remove ${r.path} --force`, { timeoutMs: 10_000 }).catch(() => {});
}
await sql`DELETE FROM checkpoints WHERE chat_id = ${chatId}`.catch(() => {});
await sql`DELETE FROM agent_sessions WHERE chat_id = ${chatId}`.catch(() => {});
await sql`DELETE FROM worktrees WHERE session_id = ${sessionId}`.catch(() => {});
await sql`DELETE FROM chats WHERE id = ${chatId}`.catch(() => {});
await sql`DELETE FROM sessions WHERE id = ${sessionId}`.catch(() => {});
await sql`DELETE FROM projects WHERE id = ${projectId}`.catch(() => {});
await sql.end({ timeout: 5 });
}
await rm(projectDir, { recursive: true, force: true });
});
it('createCheckpoint inserts a row + a private ref capturing tracked + untracked', async () => {
const [wt] = await sql<{ id: string }[]>`SELECT id FROM worktrees WHERE session_id = ${sessionId} AND status = 'active'`;
const worktreeId = wt!.id;
// Pre-turn untracked + tracked-edit state the agent will start from.
await hostExec(`cd ${worktreePath} && echo edited >> README.md && echo new > extra.txt`, { timeoutMs: 10_000 });
const [assistantMsg] = await sql<{ id: string }[]>`
INSERT INTO messages (session_id, chat_id, role, content, status)
VALUES (${sessionId}, ${chatId}, 'assistant', '', 'streaming') RETURNING id
`;
const messageId = assistantMsg!.id;
const cp = await createCheckpoint(sql, {
chatId,
sessionId,
worktreeId,
worktreePath,
messageId,
});
expect(cp).not.toBeNull();
expect(cp!.commit_sha).toMatch(/^[0-9a-f]{40}$/);
const [row] = await sql<{ commit_sha: string; worktree_id: string; message_id: string }[]>`
SELECT commit_sha, worktree_id, message_id FROM checkpoints WHERE id = ${cp!.id}
`;
expect(row!.commit_sha).toBe(cp!.commit_sha);
expect(row!.worktree_id).toBe(worktreeId);
expect(row!.message_id).toBe(messageId);
// The ref exists and the captured tree carries the untracked file (proves the
// temp-index `git add -A` snapshotted untracked content).
const refLs = await hostExec(
`git -C ${worktreePath} ls-tree -r --name-only ${cp!.commit_sha}`,
{ timeoutMs: 10_000 },
);
expect(refLs.exitCode).toBe(0);
expect(refLs.stdout).toContain('extra.txt');
// The shadow commit did NOT disturb the real working tree: extra.txt is still
// present + still untracked (status shows it).
const status = await hostExec(`git -C ${worktreePath} status --porcelain`, { timeoutMs: 10_000 });
expect(status.stdout).toContain('extra.txt');
});
it('restoreCheckpoint resets the worktree, trims the transcript, and drops later checkpoints', async () => {
// Clean slate for this test: reset the worktree to HEAD, clear prior rows.
await hostExec(`git -C ${worktreePath} reset --hard HEAD && git -C ${worktreePath} clean -fd`, { timeoutMs: 10_000 });
await sql`DELETE FROM checkpoints WHERE chat_id = ${chatId}`;
await sql`DELETE FROM messages WHERE chat_id = ${chatId}`;
const [wt] = await sql<{ id: string }[]>`SELECT id FROM worktrees WHERE session_id = ${sessionId} AND status = 'active'`;
const worktreeId = wt!.id;
// Turn 1: a user msg, then the assistant turn the checkpoint anchors. The
// worktree is pristine (matches HEAD) when this checkpoint is captured.
await sql`INSERT INTO messages (session_id, chat_id, role, content, status) VALUES (${sessionId}, ${chatId}, 'user', 'do it', 'complete')`;
const [a1] = await sql<{ id: string }[]>`
INSERT INTO messages (session_id, chat_id, role, content, status)
VALUES (${sessionId}, ${chatId}, 'assistant', 'turn 1', 'complete') RETURNING id
`;
const cp1 = await createCheckpoint(sql, { chatId, sessionId, worktreeId, worktreePath, messageId: a1!.id });
expect(cp1).not.toBeNull();
// The agent (turn 1) writes a file into the worktree.
await hostExec(`cd ${worktreePath} && echo agent-wrote > agent.txt`, { timeoutMs: 10_000 });
// Turn 2: another user msg + assistant turn, AND a second (later) checkpoint.
await sql`INSERT INTO messages (session_id, chat_id, role, content, status) VALUES (${sessionId}, ${chatId}, 'user', 'more', 'complete')`;
const [a2] = await sql<{ id: string }[]>`
INSERT INTO messages (session_id, chat_id, role, content, status)
VALUES (${sessionId}, ${chatId}, 'assistant', 'turn 2', 'complete') RETURNING id
`;
const cp2 = await createCheckpoint(sql, { chatId, sessionId, worktreeId, worktreePath, messageId: a2!.id });
expect(cp2).not.toBeNull();
// An agent_sessions row that restore should mark 'crashed'.
await sql`
INSERT INTO agent_sessions (chat_id, session_id, worktree_id, agent, backend, agent_session_id, status, last_active_at)
VALUES (${chatId}, ${sessionId}, ${worktreeId}, 'goose', 'acp_warm', 'sess-1', 'active', clock_timestamp())
ON CONFLICT (chat_id, agent) DO UPDATE SET status = 'active'
`;
const before = await sql<{ id: string }[]>`SELECT id FROM messages WHERE chat_id = ${chatId} ORDER BY created_at`;
expect(before.length).toBe(4); // user, a1, user, a2
// Restore to cp1 (before turn 1's assistant message).
const result = await restoreCheckpoint(sql, cp1!.id, { sessionId });
expect(result.checkpoint_id).toBe(cp1!.id);
expect(result.worktree_reset).toBe(true);
expect(result.backend_reset).toBe(true);
// a1, user(turn2), a2 deleted (created_at >= a1) → 3 trimmed.
expect(result.messages_deleted).toBe(3);
// Transcript trimmed to just the first user message.
const after = await sql<{ role: string; content: string }[]>`SELECT role, content FROM messages WHERE chat_id = ${chatId} ORDER BY created_at`;
expect(after.length).toBe(1);
expect(after[0]!.role).toBe('user');
// Worktree reset: the agent's file is gone (it was written after cp1).
const ls = await hostExec(`ls ${worktreePath}/agent.txt`, { timeoutMs: 10_000 });
expect(ls.exitCode).not.toBe(0);
// The agent_sessions row was reset to 'crashed'.
const [as] = await sql<{ status: string }[]>`SELECT status FROM agent_sessions WHERE chat_id = ${chatId} AND agent = 'goose'`;
expect(as!.status).toBe('crashed');
// cp1 survives (re-restorable); cp2 (later) was dropped.
const cps = await sql<{ id: string }[]>`SELECT id FROM checkpoints WHERE chat_id = ${chatId}`;
expect(cps.map((c) => c.id)).toEqual([cp1!.id]);
});
it('restoreCheckpoint throws CheckpointNotFoundError for an unknown id', async () => {
await expect(
restoreCheckpoint(sql, '00000000-0000-0000-0000-000000000000', { sessionId }),
).rejects.toBeInstanceOf(CheckpointNotFoundError);
});
it('restoreCheckpoint throws when the checkpoint is not in the requested session', async () => {
// A checkpoint whose session_id differs from the route's sessionId.
const [wt] = await sql<{ id: string }[]>`SELECT id FROM worktrees WHERE session_id = ${sessionId} AND status = 'active'`;
const cp = await createCheckpoint(sql, { chatId, sessionId, worktreeId: wt!.id, worktreePath, messageId: null });
expect(cp).not.toBeNull();
await expect(
restoreCheckpoint(sql, cp!.id, { sessionId: '11111111-1111-1111-1111-111111111111' }),
).rejects.toBeInstanceOf(CheckpointNotFoundError);
await sql`DELETE FROM checkpoints WHERE id = ${cp!.id}`;
});
it('restoreCheckpoint denies a NULL-session_id checkpoint from another session (no fail-open IDOR)', async () => {
// Regression for the fail-open authorization bug: a checkpoint row whose
// denormalized session_id is NULL must STILL be scoped via its chat's owning
// session (chats.session_id), not skipped. The old guard `cp.session_id &&
// cp.session_id !== sessionId` fell through on NULL → cross-session restore.
const [row] = await sql<{ id: string }[]>`
INSERT INTO checkpoints (chat_id, session_id, message_id, commit_sha)
VALUES (${chatId}, NULL, NULL, 'deadbeef')
RETURNING id
`;
await expect(
restoreCheckpoint(sql, row!.id, { sessionId: '22222222-2222-2222-2222-222222222222' }),
).rejects.toBeInstanceOf(CheckpointNotFoundError);
await sql`DELETE FROM checkpoints WHERE id = ${row!.id}`;
});
});

View File

@@ -0,0 +1,173 @@
import { describe, it, expect } from 'vitest';
import { locateMatch, SIMILARITY_THRESHOLD } from '../fuzzy-match.js';
// Helper: assert a resolved span and slice it back out of the content so the
// test pins the EXACT file text the caller would replace.
function span(result: ReturnType<typeof locateMatch>): { start: number; end: number } {
if (result.kind !== 'exact' && result.kind !== 'fuzzy') {
throw new Error(`expected a located span, got ${result.kind}`);
}
return { start: result.start, end: result.end };
}
describe('locateMatch — strategy 1: exact', () => {
it('returns an exact unique span', () => {
const content = 'alpha\nbeta\ngamma\n';
const result = locateMatch(content, 'beta');
expect(result.kind).toBe('exact');
const { start, end } = span(result);
expect(content.slice(start, end)).toBe('beta');
});
it('returns the right offsets for a multi-line exact needle', () => {
const content = 'one\ntwo\nthree\nfour\n';
const needle = 'two\nthree';
const result = locateMatch(content, needle);
expect(result.kind).toBe('exact');
const { start, end } = span(result);
expect(content.slice(start, end)).toBe(needle);
});
it('refuses when the exact needle occurs more than once', () => {
const content = 'foo\nbar\nfoo\nbar\nfoo\n';
const result = locateMatch(content, 'foo');
expect(result).toEqual({ kind: 'ambiguous', count: 3 });
});
});
describe('locateMatch — strategy 2: per-line whitespace', () => {
it('matches across trailing-whitespace drift at the real span', () => {
// File has trailing spaces the model dropped from a TWO-line copy. A
// single-line needle would be located by exact indexOf (it's a substring),
// so use two lines where line 1's trailing ws breaks an exact substring run.
const content = 'function f() {\n setup(); \n return 1;\n}\n';
const needle = ' setup();\n return 1;'; // line 1 missing trailing spaces
const result = locateMatch(content, needle);
expect(result.kind).toBe('fuzzy');
const { start, end } = span(result);
// The returned span covers the ORIGINAL lines including the trailing spaces.
expect(content.slice(start, end)).toBe(' setup(); \n return 1;');
});
it('matches across indentation drift (multi-line block)', () => {
// File indents with 4 spaces; model emitted 2-space indentation. trimEnd
// alone does not normalize LEADING whitespace, so this exercises... actually
// leading-indent drift is a Levenshtein-tier fallback. Here we keep the
// leading indent identical and drift only trailing whitespace per line.
const content = ['if (x) {', ' doThing(); ', ' doOther();', '}'].join('\n');
const needle = [' doThing();', ' doOther();'].join('\n');
const result = locateMatch(content, needle);
expect(result.kind).toBe('fuzzy');
const { start, end } = span(result);
expect(content.slice(start, end)).toBe(' doThing(); \n doOther();');
});
it('ignores leading/trailing blank needle lines', () => {
const content = 'header\nbody line\nfooter\n';
const needle = '\n\nbody line\n\n';
const result = locateMatch(content, needle);
expect(result.kind).toBe('fuzzy');
const { start, end } = span(result);
expect(content.slice(start, end)).toBe('body line');
});
it('reports ambiguous when a whitespace-window matches twice', () => {
// Both line 1 and line 4 differ from the needle only by trailing whitespace,
// so exact indexOf fails (no exact substring) and the whitespace tier finds
// two equivalent windows → ambiguous.
const content = 'x = 1; \ny = 2;\nz = 3;\nx = 1;\t\n';
const needle = 'x = 1;'; // no trailing ws → not an exact substring of either line
const result = locateMatch(content, needle);
expect(result).toEqual({ kind: 'ambiguous', count: 2 });
});
});
describe('locateMatch — strategy 3: unicode canonicalization', () => {
it('matches across curly quotes', () => {
const content = "const s = 'hello';\n";
const needle = 'const s = hello;'; // hello
const result = locateMatch(content, needle);
expect(result.kind).toBe('fuzzy');
const { start, end } = span(result);
// Span maps back to ORIGINAL (straight-quote) text.
expect(content.slice(start, end)).toBe("const s = 'hello';");
});
it('matches across curly double-quotes', () => {
const content = 'log("done");\n';
const needle = 'log(“done”);'; // “done”
const result = locateMatch(content, needle);
expect(result.kind).toBe('fuzzy');
const { start, end } = span(result);
expect(content.slice(start, end)).toBe('log("done");');
});
it('matches across an em-dash drift', () => {
const content = 'range 1-10 inclusive\n';
const needle = 'range 1—10 inclusive'; // em-dash
const result = locateMatch(content, needle);
expect(result.kind).toBe('fuzzy');
const { start, end } = span(result);
expect(content.slice(start, end)).toBe('range 1-10 inclusive');
});
it('matches across a non-breaking space drift', () => {
const content = 'a b c\n'; // plain spaces
const needle = 'a b c'; // nbsp between words
const result = locateMatch(content, needle);
expect(result.kind).toBe('fuzzy');
const { start, end } = span(result);
expect(content.slice(start, end)).toBe('a b c');
});
});
describe('locateMatch — strategy 4: Levenshtein', () => {
it('matches a >= threshold near-miss (small typo drift)', () => {
// Needle has a one-char typo ('totals' vs 'total') so it is NOT an exact
// substring and the whitespace/canonical tiers (which require equality) both
// miss; Levenshtein similarity stays well above the 0.66 floor.
const content = 'const total = sum + tax;\n';
const needle = 'const totals = sum + tax;';
const result = locateMatch(content, needle);
expect(result.kind).toBe('fuzzy');
const { start, end } = span(result);
// Span maps to the real (correctly-spelled) file line.
expect(content.slice(start, end)).toBe('const total = sum + tax;');
});
it('matches a multi-line block with indentation drift via Levenshtein', () => {
const content = ['function g() {', ' return compute(a, b);', '}'].join('\n');
// 6-space indent vs file's 2-space; trimEnd does not fix leading indent, so
// this lands on the Levenshtein tier (joined-trim makes it identical → ~1.0).
const needle = [' return compute(a, b);'].join('\n');
const result = locateMatch(content, needle);
expect(result.kind).toBe('fuzzy');
const { start, end } = span(result);
expect(content.slice(start, end)).toBe(' return compute(a, b);');
});
it('returns not_found for a below-threshold miss', () => {
const content = 'the quick brown fox jumps over the lazy dog\n';
const needle = 'completely unrelated string of text here xyz';
const result = locateMatch(content, needle);
expect(result).toEqual({ kind: 'not_found' });
});
it('returns not_found for a genuinely-absent needle', () => {
const content = 'alpha\nbeta\ngamma\n';
const needle = 'this content does not exist anywhere at all';
const result = locateMatch(content, needle);
expect(result).toEqual({ kind: 'not_found' });
});
});
describe('locateMatch — edge cases', () => {
it('returns not_found for an empty needle', () => {
expect(locateMatch('anything', '')).toEqual({ kind: 'not_found' });
});
it('exposes a sane similarity threshold', () => {
expect(SIMILARITY_THRESHOLD).toBeGreaterThan(0);
expect(SIMILARITY_THRESHOLD).toBeLessThanOrEqual(1);
});
});

View File

@@ -0,0 +1,83 @@
import { describe, it, expect } from 'vitest';
import { normalizeAgentEvent } from '../normalize-agent-status.js';
describe('normalizeAgentEvent', () => {
describe('working bucket', () => {
const cases = [
'SessionStart',
'UserPromptSubmit',
'UserPromptSubmitted',
'PostToolUse',
'PostToolUseFailure',
'BeforeAgent',
'AfterTool',
'task_started',
];
for (const name of cases) {
it(`maps ${name} → working`, () => {
expect(normalizeAgentEvent(name)).toBe('working');
});
}
});
describe('blocked bucket', () => {
const cases = [
'PreToolUse',
'Notification',
'PermissionRequest',
'exec_approval_request',
'apply_patch_approval_request',
'request_user_input',
];
for (const name of cases) {
it(`maps ${name} → blocked`, () => {
expect(normalizeAgentEvent(name)).toBe('blocked');
});
}
});
describe('done bucket', () => {
const cases = [
'Stop',
'AfterAgent',
'SessionEnd',
'task_complete',
'agent-turn-complete',
];
for (const name of cases) {
it(`maps ${name} → done`, () => {
expect(normalizeAgentEvent(name)).toBe('done');
});
}
});
describe('unknown / nullish → null', () => {
it('returns null for an unrecognized event', () => {
expect(normalizeAgentEvent('SomeRandomEvent')).toBeNull();
});
it('returns null for empty string', () => {
expect(normalizeAgentEvent('')).toBeNull();
});
it('returns null for undefined', () => {
expect(normalizeAgentEvent(undefined)).toBeNull();
});
});
describe('case- and separator-insensitive matching', () => {
it('matches snake_case spelling of a PascalCase event', () => {
expect(normalizeAgentEvent('session_start')).toBe('working');
expect(normalizeAgentEvent('post_tool_use')).toBe('working');
expect(normalizeAgentEvent('pre_tool_use')).toBe('blocked');
});
it('matches camelCase spelling', () => {
expect(normalizeAgentEvent('userPromptSubmitted')).toBe('working');
expect(normalizeAgentEvent('postToolUse')).toBe('working');
expect(normalizeAgentEvent('preToolUse')).toBe('blocked');
expect(normalizeAgentEvent('sessionEnd')).toBe('done');
});
it('matches arbitrary case', () => {
expect(normalizeAgentEvent('STOP')).toBe('done');
expect(normalizeAgentEvent('notification')).toBe('blocked');
});
});
});

View File

@@ -0,0 +1,189 @@
import { describe, it, expect } from 'vitest';
import {
makeStreamJsonParser,
makeStreamJsonState,
parseStreamJsonLine,
type AgentEventList,
} from '../stream-json-parser.js';
import type { AgentEvent } from '../agent-backend.js';
import type { AcpToolSnapshot } from '../acp-tool-snapshot.js';
// Helpers to JSON-encode the representative Claude-Code stream-json lines.
const sys = (sessionId: string) =>
JSON.stringify({ type: 'system', subtype: 'init', session_id: sessionId, tools: ['read', 'edit'] });
const streamEvent = (event: unknown) => JSON.stringify({ type: 'stream_event', event });
const textDelta = (index: number, text: string) =>
streamEvent({ type: 'content_block_delta', index, delta: { type: 'text_delta', text } });
const thinkingDelta = (index: number, thinking: string) =>
streamEvent({ type: 'content_block_delta', index, delta: { type: 'thinking_delta', thinking } });
const toolStart = (index: number, id: string, name: string) =>
streamEvent({ type: 'content_block_start', index, content_block: { type: 'tool_use', id, name } });
const inputJsonDelta = (index: number, partial: string) =>
streamEvent({ type: 'content_block_delta', index, delta: { type: 'input_json_delta', partial_json: partial } });
const blockStop = (index: number) => streamEvent({ type: 'content_block_stop', index });
const resultLine = (input: number, output: number, sessionId?: string) =>
JSON.stringify({ type: 'result', subtype: 'success', session_id: sessionId, usage: { input_tokens: input, output_tokens: output } });
describe('parseStreamJsonLine (pure per-line mapping)', () => {
it('captures session_id from the system init line and emits no events', () => {
const state = makeStreamJsonState();
const events = parseStreamJsonLine(sys('sess-abc'), state);
expect(events).toEqual([]);
expect(state.sessionId).toBe('sess-abc');
});
it('maps a text_delta stream_event → a text event', () => {
const state = makeStreamJsonState();
expect(parseStreamJsonLine(textDelta(0, 'Hello'), state)).toEqual([{ type: 'text', text: 'Hello' }]);
});
it('maps a thinking_delta stream_event → a reasoning event', () => {
const state = makeStreamJsonState();
expect(parseStreamJsonLine(thinkingDelta(0, 'pondering'), state)).toEqual([
{ type: 'reasoning', text: 'pondering' },
]);
});
it('tolerates a garbage / non-JSON line (returns [], no throw)', () => {
const state = makeStreamJsonState();
expect(parseStreamJsonLine('not json at all {{{', state)).toEqual([]);
expect(parseStreamJsonLine('', state)).toEqual([]);
expect(parseStreamJsonLine(' ', state)).toEqual([]);
// A truncated/partial JSON object also yields [] rather than throwing.
expect(parseStreamJsonLine('{"type":"stream_event","eve', state)).toEqual([]);
});
it('ignores unknown top-level line types and the user (tool-result) line', () => {
const state = makeStreamJsonState();
expect(parseStreamJsonLine(JSON.stringify({ type: 'user', message: {} }), state)).toEqual([]);
expect(parseStreamJsonLine(JSON.stringify({ type: 'whatever' }), state)).toEqual([]);
});
it('assembles a tool call across input_json_delta chunks (split across lines)', () => {
const state = makeStreamJsonState();
// start → tool_call (running, empty args)
const start = parseStreamJsonLine(toolStart(1, 'toolu_1', 'edit_file'), state);
expect(start).toHaveLength(1);
expect(start[0]!.type).toBe('tool_call');
const startSnap = (start[0] as { type: 'tool_call'; toolCall: AcpToolSnapshot }).toolCall;
expect(startSnap.toolCallId).toBe('toolu_1');
expect(startSnap.title).toBe('edit_file');
expect(startSnap.status).toBe('in_progress');
expect(startSnap.rawInput).toEqual({});
// args streamed in fragments — no events until stop
expect(parseStreamJsonLine(inputJsonDelta(1, '{"path":"a'), state)).toEqual([]);
expect(parseStreamJsonLine(inputJsonDelta(1, '.ts","content":'), state)).toEqual([]);
expect(parseStreamJsonLine(inputJsonDelta(1, '"hi"}'), state)).toEqual([]);
// stop → tool_update with the parsed, fully-assembled input
const stop = parseStreamJsonLine(blockStop(1), state);
expect(stop).toHaveLength(1);
expect(stop[0]!.type).toBe('tool_update');
const stopSnap = (stop[0] as { type: 'tool_update'; toolCall: AcpToolSnapshot }).toolCall;
expect(stopSnap.toolCallId).toBe('toolu_1');
expect(stopSnap.status).toBe('completed');
expect(stopSnap.rawInput).toEqual({ path: 'a.ts', content: 'hi' });
});
it('falls back to {_raw} when accumulated tool args are not valid JSON', () => {
const state = makeStreamJsonState();
parseStreamJsonLine(toolStart(0, 'toolu_x', 'run'), state);
parseStreamJsonLine(inputJsonDelta(0, '{"broken'), state);
const stop = parseStreamJsonLine(blockStop(0), state);
const snap = (stop[0] as { type: 'tool_update'; toolCall: AcpToolSnapshot }).toolCall;
expect(snap.rawInput).toEqual({ _raw: '{"broken' });
});
it('captures usage from message_delta and result lines', () => {
const state = makeStreamJsonState();
parseStreamJsonLine(streamEvent({ type: 'message_delta', usage: { output_tokens: 42 } }), state);
expect(state.usage.outputTokens).toBe(42);
parseStreamJsonLine(resultLine(100, 250, 'sess-z'), state);
expect(state.usage.inputTokens).toBe(100);
expect(state.usage.outputTokens).toBe(250);
expect(state.sessionId).toBe('sess-z');
});
it('maps a terminal assistant message (fallback) → text + reasoning + tool events', () => {
const state = makeStreamJsonState();
const line = JSON.stringify({
type: 'assistant',
session_id: 'sess-asst',
message: {
content: [
{ type: 'thinking', thinking: 'let me think' },
{ type: 'text', text: 'Here is the answer' },
{ type: 'tool_use', id: 'toolu_9', name: 'view_file', input: { path: 'x.ts' } },
],
usage: { input_tokens: 5, output_tokens: 7 },
},
});
const events = parseStreamJsonLine(line, state);
expect(events).toEqual([
{ type: 'reasoning', text: 'let me think' },
{ type: 'text', text: 'Here is the answer' },
{
type: 'tool_update',
toolCall: { toolCallId: 'toolu_9', title: 'view_file', kind: null, status: 'completed', rawInput: { path: 'x.ts' } },
},
]);
expect(state.usage).toEqual({ inputTokens: 5, outputTokens: 7 });
expect(state.sessionId).toBe('sess-asst');
});
});
describe('makeStreamJsonParser (stateful wrapper over a full turn)', () => {
it('streams a representative turn: init → text → thinking → tool → result', () => {
const parser = makeStreamJsonParser();
const all: AgentEvent[] = [];
const feed = (line: string): AgentEventList => {
const evs = parser.push(line);
all.push(...evs);
return evs;
};
feed(sys('sess-1'));
feed(textDelta(0, 'Reading '));
feed(textDelta(0, 'the file. '));
feed(thinkingDelta(0, 'I should edit it'));
feed(toolStart(1, 'toolu_a', 'edit_file'));
feed(inputJsonDelta(1, '{"path":'));
feed(inputJsonDelta(1, '"main.ts"}'));
feed(blockStop(1));
feed(textDelta(0, 'Done.'));
feed(resultLine(120, 80, 'sess-1'));
expect(all).toEqual([
{ type: 'text', text: 'Reading ' },
{ type: 'text', text: 'the file. ' },
{ type: 'reasoning', text: 'I should edit it' },
{
type: 'tool_call',
toolCall: { toolCallId: 'toolu_a', title: 'edit_file', kind: null, status: 'in_progress', rawInput: {} },
},
{
type: 'tool_update',
toolCall: { toolCallId: 'toolu_a', title: 'edit_file', kind: null, status: 'completed', rawInput: { path: 'main.ts' } },
},
{ type: 'text', text: 'Done.' },
]);
expect(parser.usage()).toEqual({ inputTokens: 120, outputTokens: 80 });
expect(parser.sessionId()).toBe('sess-1');
});
it('a garbage line interleaved mid-turn does not derail subsequent parsing', () => {
const parser = makeStreamJsonParser();
expect(parser.push(textDelta(0, 'a'))).toEqual([{ type: 'text', text: 'a' }]);
expect(parser.push('>>> not json <<<')).toEqual([]);
expect(parser.push(textDelta(0, 'b'))).toEqual([{ type: 'text', text: 'b' }]);
});
});

View File

@@ -13,7 +13,7 @@ import type { AcpToolSnapshot } from './acp-tool-snapshot.js';
import type { AgentCommand } from './provider-types.js'; import type { AgentCommand } from './provider-types.js';
/** Backend transport kind. Mirrors `agent_sessions.backend` CHECK in schema.sql. */ /** Backend transport kind. Mirrors `agent_sessions.backend` CHECK in schema.sql. */
export type AgentBackendKind = 'opencode_server' | 'acp_warm'; export type AgentBackendKind = 'opencode_server' | 'acp_warm' | 'claude_sdk';
/** /**
* Normalized, transport-agnostic events a backend emits during a turn (§2). * Normalized, transport-agnostic events a backend emits during a turn (§2).

View File

@@ -0,0 +1,55 @@
/**
* agent-status-publish (#10) — builds + publishes the `agent_status_updated`
* WS frame on the per-session channel (the same channel CoderPane subscribes to).
*
* Kept separate from normalize-agent-status.ts so that module stays a pure,
* broker-free helper (trivially unit-testable; reused by the config-injection
* follow-on). The frame contract is pinned in apps/server/src/types/ws-frames.ts
* (`AgentStatusUpdatedFrame`) and mirrored byte-identical in apps/web.
*/
import type { Broker } from '@boocode/server/broker';
import type { WsFrame } from '@boocode/server/ws-frames';
import type { AgentStatus } from './normalize-agent-status.js';
// The exact slice of Broker we need — accepting just the bound method keeps call
// sites flexible (pass `broker.publishFrame.bind(broker)` or, since the broker's
// publishFrame doesn't read `this`, `broker.publishFrame` directly).
type PublishFrame = Broker['publishFrame'];
/**
* Best-effort publish of a normalized agent status. The broker's publishFrame
* already fail-closes (validates + logs + drops on bad input, never throws), but
* we additionally swallow any unexpected error so a publish can NEVER break the
* turn it's reporting on.
*
* @param publishFrame the session channel publisher (broker.publishFrame)
* @param sessionId WS subscription channel (CoderPane subscribes per-session)
* @param chatId the (chat) half of the (chat, agent) status key
* @param agent the (agent) half of the key
* @param status normalized lifecycle status
* @param reason free-form discriminator (turn_start / turn_complete / …)
* @param at ISO timestamp; defaults to now
*/
export function publishAgentStatus(
publishFrame: PublishFrame,
sessionId: string,
chatId: string,
agent: string,
status: AgentStatus,
reason?: string,
at: string = new Date().toISOString(),
): void {
try {
const frame: WsFrame = {
type: 'agent_status_updated',
chat_id: chatId,
agent,
status,
...(reason ? { reason } : {}),
at,
};
publishFrame(sessionId, frame);
} catch {
// never let a status publish break the turn — best-effort only.
}
}

View File

@@ -0,0 +1,181 @@
import { describe, it, expect } from 'vitest';
import type { SDKMessage } from '@anthropic-ai/claude-agent-sdk';
import { mapSdkMessage, createClaudeSdkMapState } from '../claude-sdk-map.js';
import type { AgentEvent } from '../../agent-backend.js';
/**
* Pure mapper for Claude-SDK messages → AgentEvents (claude-sdk-sessionstore #9 Part 2).
* Verifies the partial-stream → live-delta mapping, tool assembly across blocks, and
* the final-assistant dedup, with no live `claude` binary involved.
*
* Messages are cast through `unknown` to `SDKMessage`: the real SDK shapes carry many
* fields (uuid, parent_tool_use_id, …) irrelevant to the mapper, which reads only the
* `type`/`event`/`message.content` it discriminates on. The cast keeps the fixtures
* minimal while the production code path sees the full real types (the backend's
* typecheck against the real SDK is the type-safety proof).
*/
function msg(m: unknown): SDKMessage {
return m as SDKMessage;
}
/** A partial-stream message wrapping one BetaRawMessageStreamEvent. */
function streamEvent(event: unknown): SDKMessage {
return msg({ type: 'stream_event', event, parent_tool_use_id: null, uuid: 'u', session_id: 's' });
}
describe('mapSdkMessage — partial stream deltas', () => {
it('maps a text_delta to a text event', () => {
const state = createClaudeSdkMapState();
const out = mapSdkMessage(
streamEvent({ type: 'content_block_delta', index: 0, delta: { type: 'text_delta', text: 'Hello' } }),
state,
);
expect(out).toEqual<AgentEvent[]>([{ type: 'text', text: 'Hello' }]);
});
it('maps a thinking_delta to a reasoning event', () => {
const state = createClaudeSdkMapState();
const out = mapSdkMessage(
streamEvent({
type: 'content_block_delta',
index: 0,
delta: { type: 'thinking_delta', thinking: 'pondering', estimated_tokens: null },
}),
state,
);
expect(out).toEqual<AgentEvent[]>([{ type: 'reasoning', text: 'pondering' }]);
});
it('drops empty text/thinking deltas', () => {
const state = createClaudeSdkMapState();
expect(
mapSdkMessage(streamEvent({ type: 'content_block_delta', index: 0, delta: { type: 'text_delta', text: '' } }), state),
).toEqual([]);
expect(
mapSdkMessage(
streamEvent({ type: 'content_block_delta', index: 0, delta: { type: 'thinking_delta', thinking: '', estimated_tokens: null } }),
state,
),
).toEqual([]);
});
it('ignores message framing + signature/citation deltas', () => {
const state = createClaudeSdkMapState();
expect(mapSdkMessage(streamEvent({ type: 'message_start', message: {} }), state)).toEqual([]);
expect(mapSdkMessage(streamEvent({ type: 'message_stop' }), state)).toEqual([]);
expect(
mapSdkMessage(streamEvent({ type: 'content_block_delta', index: 0, delta: { type: 'signature_delta', signature: 'x' } }), state),
).toEqual([]);
});
});
describe('mapSdkMessage — tool assembly across blocks', () => {
it('opens a tool_call on content_block_start, buffers input_json_delta, emits tool_update with parsed input on stop', () => {
const state = createClaudeSdkMapState();
const started = mapSdkMessage(
streamEvent({
type: 'content_block_start',
index: 1,
content_block: { type: 'tool_use', id: 'tool-1', name: 'view_file', input: {} },
}),
state,
);
expect(started).toEqual<AgentEvent[]>([
{ type: 'tool_call', toolCall: { toolCallId: 'tool-1', title: 'view_file', kind: null, status: 'in_progress', rawInput: {}, rawOutput: undefined } },
]);
// args stream in fragments under the same block index
expect(
mapSdkMessage(streamEvent({ type: 'content_block_delta', index: 1, delta: { type: 'input_json_delta', partial_json: '{"path":' } }), state),
).toEqual([]);
expect(
mapSdkMessage(streamEvent({ type: 'content_block_delta', index: 1, delta: { type: 'input_json_delta', partial_json: '"a.ts"}' } }), state),
).toEqual([]);
const stopped = mapSdkMessage(streamEvent({ type: 'content_block_stop', index: 1 }), state);
expect(stopped).toHaveLength(1);
const ev = stopped[0]!;
expect(ev.type).toBe('tool_update');
if (ev.type === 'tool_update') {
expect(ev.toolCall.toolCallId).toBe('tool-1');
expect(ev.toolCall.title).toBe('view_file');
expect(ev.toolCall.rawInput).toEqual({ path: 'a.ts' });
}
});
it('content_block_stop for a non-tool block (no tracked index) emits nothing', () => {
const state = createClaudeSdkMapState();
// text block was streamed at index 0 but never tracked as a tool
mapSdkMessage(streamEvent({ type: 'content_block_delta', index: 0, delta: { type: 'text_delta', text: 'hi' } }), state);
expect(mapSdkMessage(streamEvent({ type: 'content_block_stop', index: 0 }), state)).toEqual([]);
});
it('falls back to the prior input when the buffered tool JSON is invalid', () => {
const state = createClaudeSdkMapState();
mapSdkMessage(
streamEvent({ type: 'content_block_start', index: 2, content_block: { type: 'tool_use', id: 't2', name: 'grep', input: { q: 'seed' } } }),
state,
);
mapSdkMessage(streamEvent({ type: 'content_block_delta', index: 2, delta: { type: 'input_json_delta', partial_json: '{not json' } }), state);
const stopped = mapSdkMessage(streamEvent({ type: 'content_block_stop', index: 2 }), state);
const ev = stopped[0]!;
if (ev.type === 'tool_update') {
expect(ev.toolCall.rawInput).toEqual({ q: 'seed' });
} else {
throw new Error('expected tool_update');
}
});
});
describe('mapSdkMessage — final assistant message', () => {
function assistant(content: unknown[]): SDKMessage {
return msg({ type: 'assistant', message: { content }, parent_tool_use_id: null, uuid: 'u', session_id: 's' });
}
it('dedups text/thinking (already streamed) and emits a completed tool_update per tool_use block', () => {
const state = createClaudeSdkMapState();
const out = mapSdkMessage(
assistant([
{ type: 'text', text: 'final answer', citations: null },
{ type: 'thinking', thinking: 'reasoned', signature: 'sig' },
{ type: 'tool_use', id: 'tool-9', name: 'find_files', input: { glob: '**/*.ts' } },
]),
state,
);
expect(out).toEqual<AgentEvent[]>([
{
type: 'tool_update',
toolCall: { toolCallId: 'tool-9', title: 'find_files', kind: null, status: 'completed', rawInput: { glob: '**/*.ts' }, rawOutput: undefined },
},
]);
});
it('preserves a title from a prior partial tool_call snapshot', () => {
const state = createClaudeSdkMapState();
mapSdkMessage(
streamEvent({ type: 'content_block_start', index: 0, content_block: { type: 'tool_use', id: 'tool-x', name: 'view_file', input: {} } }),
state,
);
const out = mapSdkMessage(assistant([{ type: 'tool_use', id: 'tool-x', name: 'view_file', input: { path: 'z' } }]), state);
const ev = out[0]!;
if (ev.type === 'tool_update') {
expect(ev.toolCall.status).toBe('completed');
expect(ev.toolCall.title).toBe('view_file');
expect(ev.toolCall.rawInput).toEqual({ path: 'z' });
} else {
throw new Error('expected tool_update');
}
});
});
describe('mapSdkMessage — non-content messages', () => {
it('returns [] for system/init, status, result, and other variants', () => {
const state = createClaudeSdkMapState();
expect(mapSdkMessage(msg({ type: 'system', subtype: 'init', session_id: 's', uuid: 'u' }), state)).toEqual([]);
expect(mapSdkMessage(msg({ type: 'system', subtype: 'status', status: null, session_id: 's', uuid: 'u' }), state)).toEqual([]);
expect(
mapSdkMessage(msg({ type: 'result', subtype: 'success', result: 'done', session_id: 's', uuid: 'u' }), state),
).toEqual([]);
});
});

View File

@@ -0,0 +1,49 @@
import { describe, it, expect } from 'vitest';
import { shouldUseClaudeSdk, claudeSdkBackendEnabled } from '../claude-sdk-routing.js';
/**
* Env-flagged routing for the warm Claude-SDK backend. With CLAUDE_SDK_BACKEND off
* (the production default) every claude task falls through to the unchanged PTY path;
* with it on, only chat-tab claude tasks (session_id + chat_id) route to the SDK.
*/
const ON = { CLAUDE_SDK_BACKEND: '1' } as NodeJS.ProcessEnv;
const OFF = {} as NodeJS.ProcessEnv;
describe('claudeSdkBackendEnabled', () => {
it('is false when unset or falsy', () => {
expect(claudeSdkBackendEnabled({} as NodeJS.ProcessEnv)).toBe(false);
expect(claudeSdkBackendEnabled({ CLAUDE_SDK_BACKEND: '' } as NodeJS.ProcessEnv)).toBe(false);
expect(claudeSdkBackendEnabled({ CLAUDE_SDK_BACKEND: '0' } as NodeJS.ProcessEnv)).toBe(false);
expect(claudeSdkBackendEnabled({ CLAUDE_SDK_BACKEND: 'false' } as NodeJS.ProcessEnv)).toBe(false);
expect(claudeSdkBackendEnabled({ CLAUDE_SDK_BACKEND: 'off' } as NodeJS.ProcessEnv)).toBe(false);
expect(claudeSdkBackendEnabled({ CLAUDE_SDK_BACKEND: 'no' } as NodeJS.ProcessEnv)).toBe(false);
});
it('is true for any other truthy value', () => {
expect(claudeSdkBackendEnabled({ CLAUDE_SDK_BACKEND: '1' } as NodeJS.ProcessEnv)).toBe(true);
expect(claudeSdkBackendEnabled({ CLAUDE_SDK_BACKEND: 'true' } as NodeJS.ProcessEnv)).toBe(true);
expect(claudeSdkBackendEnabled({ CLAUDE_SDK_BACKEND: 'on' } as NodeJS.ProcessEnv)).toBe(true);
});
});
describe('shouldUseClaudeSdk', () => {
it('is always false while the env flag is off — production claude stays on PTY', () => {
expect(shouldUseClaudeSdk({ agent: 'claude', session_id: 's1', chat_id: 'c1' }, OFF)).toBe(false);
});
it('routes a chat-tab claude task to the SDK when the flag is on', () => {
expect(shouldUseClaudeSdk({ agent: 'claude', session_id: 's1', chat_id: 'c1' }, ON)).toBe(true);
});
it('only applies to the claude agent', () => {
expect(shouldUseClaudeSdk({ agent: 'qwen', session_id: 's1', chat_id: 'c1' }, ON)).toBe(false);
expect(shouldUseClaudeSdk({ agent: 'opencode', session_id: 's1', chat_id: 'c1' }, ON)).toBe(false);
expect(shouldUseClaudeSdk({ agent: null, session_id: 's1', chat_id: 'c1' }, ON)).toBe(false);
});
it('requires both session_id and chat_id (session-less creators stay one-shot)', () => {
expect(shouldUseClaudeSdk({ agent: 'claude', session_id: null, chat_id: null }, ON)).toBe(false);
expect(shouldUseClaudeSdk({ agent: 'claude', session_id: 's1', chat_id: null }, ON)).toBe(false);
expect(shouldUseClaudeSdk({ agent: 'claude', session_id: null, chat_id: 'c1' }, ON)).toBe(false);
});
});

View File

@@ -0,0 +1,135 @@
import { describe, it, expect, beforeAll, afterAll } from 'vitest';
import { readFileSync } from 'node:fs';
import { resolve } from 'node:path';
import postgres from 'postgres';
import { PostgresSessionStore } from '../claude-session-store.js';
import type { SessionStoreEntry } from '@anthropic-ai/claude-agent-sdk';
/**
* claude-sdk-sessionstore #9 (Part 1) — PostgresSessionStore tests.
*
* DB-opt-in (DATABASE_URL), mirrors checkpoints.test.ts: skips cleanly when the
* var is unset; otherwise applies the server + coder schemas and exercises the
* real append/load/listSessions/delete/listSubkeys round trips against postgres.
* Rows are namespaced under a unique project_key so concurrent suites / leftover
* data can't collide, and afterAll deletes everything written.
*/
describe.runIf(!!process.env.DATABASE_URL)('PostgresSessionStore (DB)', () => {
let sql: ReturnType<typeof postgres>;
let store: PostgresSessionStore;
const projectKey = `claude-store-test-${Date.now()}`;
const entry = (type: string, extra: Record<string, unknown> = {}): SessionStoreEntry => ({
type,
...extra,
});
beforeAll(async () => {
sql = postgres(process.env.DATABASE_URL!, { max: 3 });
const serverSchema = resolve(__dirname, '../../../../../server/src/schema.sql');
const coderSchema = resolve(__dirname, '../../../schema.sql');
await sql.unsafe(readFileSync(serverSchema, 'utf8'));
await sql.unsafe(readFileSync(coderSchema, 'utf8'));
store = new PostgresSessionStore(sql);
});
afterAll(async () => {
if (sql) {
await sql`DELETE FROM claude_session_entries WHERE project_key = ${projectKey}`.catch(() => {});
await sql.end({ timeout: 5 });
}
});
it('append → load round-trips and preserves order across two appends', async () => {
const key = { projectKey, sessionId: 'sess-order' };
await store.append(key, [entry('user', { uuid: 'u1' }), entry('assistant', { uuid: 'a1' })]);
await store.append(key, [entry('result', { uuid: 'r1' })]);
const loaded = await store.load(key);
expect(loaded).not.toBeNull();
expect(loaded!.map((e) => e.uuid)).toEqual(['u1', 'a1', 'r1']);
expect(loaded!.map((e) => e.type)).toEqual(['user', 'assistant', 'result']);
});
it('append with an empty batch is a no-op (load still null for an otherwise-unseen key)', async () => {
const key = { projectKey, sessionId: 'sess-empty' };
await store.append(key, []);
expect(await store.load(key)).toBeNull();
});
it('load of a key that was never written returns null', async () => {
expect(await store.load({ projectKey, sessionId: 'never-seen' })).toBeNull();
});
it('isolates the main transcript from a subpath (load each independently)', async () => {
const sessionId = 'sess-subpath';
const mainKey = { projectKey, sessionId };
const subKey = { projectKey, sessionId, subpath: 'subagents/x' };
await store.append(mainKey, [entry('user', { uuid: 'main-1' })]);
await store.append(subKey, [entry('assistant', { uuid: 'sub-1' })]);
const main = await store.load(mainKey);
const sub = await store.load(subKey);
expect(main!.map((e) => e.uuid)).toEqual(['main-1']);
expect(sub!.map((e) => e.uuid)).toEqual(['sub-1']);
});
it('listSessions returns the session with a numeric mtime (main transcripts only)', async () => {
const sessionId = 'sess-list';
await store.append({ projectKey, sessionId }, [entry('user', { uuid: 'l1' })]);
// A subagent-only session must NOT surface as a main-transcript session.
await store.append(
{ projectKey, sessionId: 'sess-sub-only', subpath: 'subagents/y' },
[entry('user', { uuid: 's1' })],
);
const sessions = await store.listSessions(projectKey);
const ids = sessions.map((s) => s.sessionId);
expect(ids).toContain(sessionId);
expect(ids).not.toContain('sess-sub-only');
const row = sessions.find((s) => s.sessionId === sessionId)!;
expect(typeof row.mtime).toBe('number');
expect(Number.isFinite(row.mtime)).toBe(true);
expect(row.mtime).toBeGreaterThan(0);
});
it('delete with a subpath removes only that subpath', async () => {
const sessionId = 'sess-del-subpath';
const mainKey = { projectKey, sessionId };
const subKey = { projectKey, sessionId, subpath: 'subagents/z' };
await store.append(mainKey, [entry('user', { uuid: 'keep-1' })]);
await store.append(subKey, [entry('assistant', { uuid: 'drop-1' })]);
await store.delete(subKey);
expect(await store.load(subKey)).toBeNull();
expect((await store.load(mainKey))!.map((e) => e.uuid)).toEqual(['keep-1']);
});
it('delete without a subpath removes the whole session (all subpaths)', async () => {
const sessionId = 'sess-del-all';
const mainKey = { projectKey, sessionId };
const subKey = { projectKey, sessionId, subpath: 'subagents/w' };
await store.append(mainKey, [entry('user', { uuid: 'm' })]);
await store.append(subKey, [entry('assistant', { uuid: 's' })]);
await store.delete({ projectKey, sessionId });
expect(await store.load(mainKey)).toBeNull();
expect(await store.load(subKey)).toBeNull();
expect(await store.listSubkeys({ projectKey, sessionId })).toEqual([]);
});
it('listSubkeys returns the distinct non-main subpaths', async () => {
const sessionId = 'sess-subkeys';
await store.append({ projectKey, sessionId }, [entry('user', { uuid: 'main' })]);
await store.append({ projectKey, sessionId, subpath: 'subagents/a' }, [entry('user', { uuid: 'a1' })]);
await store.append({ projectKey, sessionId, subpath: 'subagents/a' }, [entry('user', { uuid: 'a2' })]);
await store.append({ projectKey, sessionId, subpath: 'subagents/b' }, [entry('user', { uuid: 'b1' })]);
const subkeys = await store.listSubkeys({ projectKey, sessionId });
expect(subkeys.sort()).toEqual(['subagents/a', 'subagents/b']);
});
});

View File

@@ -0,0 +1,96 @@
import { describe, it, expect } from 'vitest';
import { createPushable } from '../pushable-iterable.js';
/**
* The pushable async-iterable that feeds the Claude SDK's streaming-input query()
* one message per turn while staying open across turns. Tests cover the ordering
* contract (push/close/async-iterate) without any SDK shape.
*/
describe('createPushable — push/iterate ordering', () => {
it('yields buffered values in FIFO order then parks', async () => {
const p = createPushable<number>();
const it = p.iterable[Symbol.asyncIterator]();
p.push(1);
p.push(2);
expect(await it.next()).toEqual({ value: 1, done: false });
expect(await it.next()).toEqual({ value: 2, done: false });
// No more buffered → next() parks; resolve it by pushing.
const parked = it.next();
p.push(3);
expect(await parked).toEqual({ value: 3, done: false });
});
it('hands a value directly to a parked consumer (push after await)', async () => {
const p = createPushable<string>();
const it = p.iterable[Symbol.asyncIterator]();
const pending = it.next(); // parks immediately (empty buffer)
p.push('hello');
expect(await pending).toEqual({ value: 'hello', done: false });
});
it('close() resolves a parked consumer as done and reports done thereafter', async () => {
const p = createPushable<number>();
const it = p.iterable[Symbol.asyncIterator]();
const pending = it.next();
p.close();
expect(await pending).toEqual({ value: undefined, done: true });
expect(await it.next()).toEqual({ value: undefined, done: true });
expect(p.closed).toBe(true);
});
it('still drains values buffered BEFORE close', async () => {
const p = createPushable<number>();
const it = p.iterable[Symbol.asyncIterator]();
p.push(10);
p.push(20);
p.close();
expect(await it.next()).toEqual({ value: 10, done: false });
expect(await it.next()).toEqual({ value: 20, done: false });
expect(await it.next()).toEqual({ value: undefined, done: true });
});
it('drops values pushed after close', async () => {
const p = createPushable<number>();
const it = p.iterable[Symbol.asyncIterator]();
p.close();
p.push(99); // no-op
expect(await it.next()).toEqual({ value: undefined, done: true });
});
it('close() is idempotent', () => {
const p = createPushable<number>();
p.close();
expect(() => p.close()).not.toThrow();
expect(p.closed).toBe(true);
});
it('works with a for-await loop driven by interleaved pushes', async () => {
const p = createPushable<number>();
const seen: number[] = [];
const consumer = (async () => {
for await (const v of p.iterable) seen.push(v);
})();
p.push(1);
await Promise.resolve();
p.push(2);
await Promise.resolve();
p.close();
await consumer;
expect(seen).toEqual([1, 2]);
});
it('return() on the iterator closes the queue (for-await break)', async () => {
const p = createPushable<number>();
const it = p.iterable[Symbol.asyncIterator]();
p.push(1);
expect(await it.next()).toEqual({ value: 1, done: false });
// Simulate a `break` in for-await: the runtime calls return().
expect(await it.return!()).toEqual({ value: undefined, done: true });
expect(p.closed).toBe(true);
p.push(2); // dropped — queue is closed
expect(await it.next()).toEqual({ value: undefined, done: true });
});
});

View File

@@ -0,0 +1,192 @@
/**
* claude-sdk-sessionstore #9 (Part 2) — PURE Claude-SDK message → AgentEvent mapper.
*
* `ClaudeSdkBackend` drives one `query()` per (chat, agent) session and feeds each
* `SDKMessage` it yields through this function, forwarding the returned
* `AgentEvent[]` to the dispatcher's `onEvent` (which maps them to WS frames +
* persists). Kept PURE (one message + a caller-owned accumulator → events) so it's
* unit-testable without a live `claude` binary — the whole point of Part 2's
* typecheck-and-unit-test gate (the live pump needs a host smoke).
*
* SDK shapes (verified against @anthropic-ai/claude-agent-sdk@0.3.159 sdk.d.ts +
* @anthropic-ai/sdk beta messages d.ts):
* - `SDKPartialAssistantMessage` (`type:'stream_event'`) carries a
* `BetaRawMessageStreamEvent` — the LIVE delta stream (only emitted when
* `options.includePartialMessages` is set, which the backend sets). We map:
* · content_block_delta + text_delta → { text }
* · content_block_delta + thinking_delta → { reasoning }
* · content_block_start + tool_use block → { tool_call } (in_progress)
* · content_block_delta + input_json_delta → buffered into the tool's args
* (no event; the assembled input rides the terminal tool_update)
* - `SDKAssistantMessage` (`type:'assistant'`) carries the FINAL `message.content`
* blocks. Text/thinking there are post-hoc repeats of what the partials already
* streamed, so we DROP them (dedup) and only emit a terminal `tool_update`
* (status completed) per `tool_use` block, with its now-complete `input`.
* - All other `SDKMessage` variants (system/init, status, result, hooks, task
* notifications, …) carry no renderable turn content → return [].
*
* Tool assembly spans messages: a tool_use block opens in a partial
* `content_block_start`, its args stream as `input_json_delta` frames keyed by the
* block `index`, and the final assistant message restates the complete block. The
* caller owns a `ClaudeSdkMapState` (snapshot map + per-index tool tracking) that
* threads this across calls, mirroring the `Map<string, AcpToolSnapshot>` the other
* backends pass into `mapSessionUpdate`. The result frames carry the SAME
* `AcpToolSnapshot` shape, so `persistExternalAgentTurn` / `snapshotToWireToolCall`
* are reused unchanged.
*/
import type { SDKMessage } from '@anthropic-ai/claude-agent-sdk';
import type { AgentEvent } from '../agent-backend.js';
import type { AcpToolSnapshot } from '../acp-tool-snapshot.js';
/**
* The underlying `@anthropic-ai/sdk` Beta message types (`BetaRawMessageStreamEvent`,
* `BetaContentBlock`) are a TRANSITIVE dep of `@anthropic-ai/claude-agent-sdk` — not
* a direct dependency of apps/coder — so a `@anthropic-ai/sdk/...` import does NOT
* resolve here under pnpm's strict node_modules. We instead DERIVE both shapes from
* the SDK's own exported message types, which is also more correct (it tracks the
* exact `event` / `content` shapes the SDK yields, not a hand-picked import path).
*/
type StreamEvent = Extract<SDKMessage, { type: 'stream_event' }>['event'];
type AssistantContent = Extract<SDKMessage, { type: 'assistant' }>['message']['content'];
type ContentBlock = AssistantContent extends readonly (infer B)[] ? B : never;
/**
* Caller-owned accumulator threaded across `mapSdkMessage` calls within ONE turn.
* The backend creates a fresh one per turn and clears it at turn end.
*/
export interface ClaudeSdkMapState {
/** Stable tool-call snapshots by tool_use id, merged across start/delta/stop. */
snapshots: Map<string, AcpToolSnapshot>;
/**
* Partial-stream block index → in-flight tool assembly. Anthropic's stream keys
* blocks by a numeric `index`; tool_use args arrive as `input_json_delta`s under
* that index with no id, so we map index→id to route them and buffer the raw
* JSON fragments until the block closes (or the final assistant message lands).
*/
toolByIndex: Map<number, { id: string; name: string; jsonBuf: string }>;
}
/** Construct a fresh per-turn accumulator. */
export function createClaudeSdkMapState(): ClaudeSdkMapState {
return { snapshots: new Map(), toolByIndex: new Map() };
}
/**
* Map one `SDKMessage` → zero or more `AgentEvent`s, mutating `state` for
* cross-message tool assembly + dedup. Pure w.r.t. its inputs otherwise.
*/
export function mapSdkMessage(msg: SDKMessage, state: ClaudeSdkMapState): AgentEvent[] {
switch (msg.type) {
case 'stream_event':
return mapStreamEvent(msg.event, state);
case 'assistant':
return mapFinalAssistant(msg.message.content, state);
default:
// system/init, status, result, hooks, task_*, etc. — no turn content here.
// (The backend reads session_id off the init message and usage/cost off the
// result message directly; neither produces a renderable AgentEvent.)
return [];
}
}
/** Live partial-stream delta → AgentEvent(s). */
function mapStreamEvent(event: StreamEvent, state: ClaudeSdkMapState): AgentEvent[] {
switch (event.type) {
case 'content_block_start': {
const block = event.content_block;
if (block.type === 'tool_use') {
const snap: AcpToolSnapshot = {
toolCallId: block.id,
title: block.name,
kind: null,
status: 'in_progress',
rawInput: block.input ?? undefined,
rawOutput: undefined,
};
state.snapshots.set(block.id, snap);
state.toolByIndex.set(event.index, { id: block.id, name: block.name, jsonBuf: '' });
return [{ type: 'tool_call', toolCall: snap }];
}
return [];
}
case 'content_block_delta': {
const delta = event.delta;
if (delta.type === 'text_delta') {
return delta.text ? [{ type: 'text', text: delta.text }] : [];
}
if (delta.type === 'thinking_delta') {
return delta.thinking ? [{ type: 'reasoning', text: delta.thinking }] : [];
}
if (delta.type === 'input_json_delta') {
// Buffer the tool's streamed args under its block index; no event yet —
// the assembled input rides the terminal tool_update (or the final block).
const t = state.toolByIndex.get(event.index);
if (t) t.jsonBuf += delta.partial_json ?? '';
return [];
}
// signature_delta / citations_delta / compaction_delta — nothing to render.
return [];
}
case 'content_block_stop': {
// Close out a streamed tool block: parse its buffered JSON args and emit a
// tool_update carrying the assembled input. The final assistant message will
// restate the same block, but its snapshot is dedup-merged (same id) so this
// is harmless — we emit here so a tool's input renders even if the assistant
// message is delayed/dropped.
const t = state.toolByIndex.get(event.index);
if (!t) return [];
state.toolByIndex.delete(event.index);
const prev = state.snapshots.get(t.id);
const snap: AcpToolSnapshot = {
toolCallId: t.id,
title: prev?.title ?? t.name,
kind: null,
status: 'in_progress',
rawInput: parseJsonOr(t.jsonBuf, prev?.rawInput),
rawOutput: undefined,
};
state.snapshots.set(t.id, snap);
return [{ type: 'tool_update', toolCall: snap }];
}
default:
// message_start / message_delta / message_stop — turn framing, no content.
return [];
}
}
/**
* Final assistant message content blocks. Text/thinking are post-hoc repeats of
* the partial stream → dropped (dedup). Only tool_use blocks emit a terminal
* tool_update carrying the complete `input`.
*/
function mapFinalAssistant(content: ContentBlock[], state: ClaudeSdkMapState): AgentEvent[] {
const out: AgentEvent[] = [];
for (const block of content) {
if (block.type === 'tool_use') {
const prev = state.snapshots.get(block.id);
const snap: AcpToolSnapshot = {
toolCallId: block.id,
title: prev?.title ?? block.name,
kind: null,
status: 'completed',
rawInput: block.input ?? prev?.rawInput,
rawOutput: undefined,
};
state.snapshots.set(block.id, snap);
out.push({ type: 'tool_update', toolCall: snap });
}
// text / thinking / redacted_thinking blocks: already streamed via partials.
}
return out;
}
/** Parse a buffered JSON string; fall back to a prior value on empty/invalid. */
function parseJsonOr(buf: string, fallback: unknown): unknown {
const s = buf.trim();
if (!s) return fallback;
try {
return JSON.parse(s);
} catch {
return fallback;
}
}

View File

@@ -0,0 +1,38 @@
/**
* claude-sdk-sessionstore #9 (Part 2) — claude-SDK-vs-PTY routing predicate.
*
* Sibling to `shouldUseWarmBackend` (warm-acp-routing.ts). The warm Claude-SDK
* backend keys its persistent `query()` on (chat_id, agent) — exactly like the
* warm-ACP / opencode-server backends — so a task only routes to it when it carries
* BOTH a `session_id` and a `chat_id` (a real chat tab).
*
* CRUCIALLY this is ALSO gated behind the `CLAUDE_SDK_BACKEND` env flag (default
* OFF). While off — the production default — claude always falls through to the
* existing one-shot PTY `runExternalAgent` path, UNCHANGED. The live SDK streaming
* pump + cross-turn resume need a host smoke against the real `claude` binary, so
* we keep the working PTY path as the default until that lands. Flip the env var
* on a host (any truthy value) to opt a deployment into the SDK backend.
*
* Pure (env read injected) so it's unit-testable; the dispatcher consumes it.
*/
/** True iff the `CLAUDE_SDK_BACKEND` env flag is set to a truthy value. */
export function claudeSdkBackendEnabled(env: NodeJS.ProcessEnv = process.env): boolean {
const v = env.CLAUDE_SDK_BACKEND;
if (v == null) return false;
const s = v.trim().toLowerCase();
return s !== '' && s !== '0' && s !== 'false' && s !== 'off' && s !== 'no';
}
export function shouldUseClaudeSdk(
task: {
agent: string | null;
session_id: string | null;
chat_id: string | null;
},
env: NodeJS.ProcessEnv = process.env,
): boolean {
if (!claudeSdkBackendEnabled(env)) return false;
if (task.agent !== 'claude') return false;
return task.session_id != null && task.chat_id != null;
}

View File

@@ -0,0 +1,364 @@
/**
* claude-sdk-sessionstore #9 (Part 2) — ClaudeSdkBackend.
*
* A warm, resumable backend for the `claude` agent built on the Claude Agent SDK
* (`@anthropic-ai/claude-agent-sdk`), implementing the Phase-0 `AgentBackend`
* contract (same shape as `WarmAcpBackend` / `OpenCodeServerBackend`). One
* persistent `query()` per (chat, agent) session, driven in STREAMING-INPUT mode:
* the `prompt` is a pushable `AsyncIterable<SDKUserMessage>` that stays open across
* turns, so the SDK subprocess + conversation stay warm between `prompt()` calls
* until `closeSession`/`dispose`.
*
* ⚠ LIVE PUMP IS HOST-ONLY. The actual streaming turn needs the real `claude`
* binary + ANTHROPIC auth on a host — it CANNOT run in the dev container. This file
* is written against the REAL SDK types so it TYPECHECKS, and the PURE pieces (the
* `mapSdkMessage` mapper + the `createPushable` queue) are unit-tested. Routing to
* this backend is gated behind `CLAUDE_SDK_BACKEND` (default OFF) so production
* claude stays on the working PTY path until a host smoke validates the pump +
* cross-turn resume.
*
* Lifecycle (mirrors warm-acp.ts / opencode-server.ts):
* - `ensureSession`: resolve the resume id from `agent_sessions(chat_id,'claude')`
* and (re)build the single `query()` if not already live. The SDK's own
* `sessionStore` (Part 1 PostgresSessionStore) materializes the transcript on
* resume; `options.resume` carries the provider session id.
* - `prompt`: push ONE user message onto the open queue, iterate the generator,
* map each `SDKMessage` → `AgentEvent`s via `mapSdkMessage`, forward to
* `ctx.onEvent`, and resolve when the turn's `result` message lands. Capture the
* `session_id` from the `init` message and persist it to `agent_sessions`;
* accumulate `result.usage` / `total_cost_usd` onto the row (mirrors opencode U.6).
* - `closeSession` / `dispose`: close the queue + dispose the query generator.
* - A thrown error or `result.subtype==='error*'` marks `agent_sessions.status='crashed'`.
*
* Turn serialization: like warm-acp, exactly one turn is in flight at a time on a
* given backend (the dispatcher's per-session `inflight` map enforces this upstream;
* `isBusy()` reports it so the pool never evicts mid-turn).
*/
import { query, type Query, type SDKMessage, type SDKUserMessage, type Options } from '@anthropic-ai/claude-agent-sdk';
import type { FastifyBaseLogger } from 'fastify';
import type { Sql } from '../../db.js';
import { PostgresSessionStore } from './claude-session-store.js';
import { createPushable, type Pushable } from './pushable-iterable.js';
import { mapSdkMessage, createClaudeSdkMapState, type ClaudeSdkMapState } from './claude-sdk-map.js';
import type {
AgentBackend,
AgentSessionHandle,
EnsureSessionOpts,
PromptCtx,
TurnResult,
} from '../agent-backend.js';
export interface ClaudeSdkBackendDeps {
sql: Sql;
log: FastifyBaseLogger;
/** The (chat, agent) this backend serves — its pool identity + DB key. */
chatId: string;
/** Always 'claude' today; kept explicit so the pool key + DB writes stay honest. */
agent: string;
/** Resolved `claude` binary path (available_agents.install_path); null → SDK default. */
installPath: string | null;
}
export class ClaudeSdkBackend implements AgentBackend {
readonly backend = 'claude_sdk' as const;
private readonly sql: Sql;
private readonly log: FastifyBaseLogger;
private readonly chatId: string;
private readonly agent: string;
private readonly installPath: string | null;
private readonly sessionStore: PostgresSessionStore;
/** The single persistent query() generator; null until the first turn builds it. */
private query: Query | null = null;
/** The open input queue feeding the generator one SDKUserMessage per turn. */
private input: Pushable<SDKUserMessage> | null = null;
/** The provider's own session id (resume token), captured from the init message. */
private agentSessionId: string | null = null;
/** Resolved model the live query() was built with; a change forces a rebuild. */
private builtModel: string | null = null;
/** True between prompt() start and settle. */
private busy = false;
private up = false;
constructor(deps: ClaudeSdkBackendDeps) {
this.sql = deps.sql;
this.log = deps.log;
this.chatId = deps.chatId;
this.agent = deps.agent;
this.installPath = deps.installPath;
this.sessionStore = new PostgresSessionStore(deps.sql);
}
/** §2: liveness for the health endpoint + dispatcher fallback decision. */
health(): 'up' | 'down' {
return this.up ? 'up' : 'down';
}
/** Phase 3: busy iff a turn is in flight (pool never evicts a busy backend). */
isBusy(): boolean {
return this.busy;
}
// ─── ensureSession: resolve resume id + (re)build the warm query ──────────────
async ensureSession(sessionId: string, opts: EnsureSessionOpts): Promise<AgentSessionHandle> {
// Resolve the resume token from the (chat_id, agent) row. A crashed row is not
// resumed (the SDK would fail to load a dead session); we create fresh.
const [row] = await this.sql<{ agent_session_id: string | null; status: string }[]>`
SELECT agent_session_id, status FROM agent_sessions
WHERE chat_id = ${opts.chatId} AND agent = ${opts.agent}
`;
const resumeId = row && row.status !== 'crashed' ? row.agent_session_id : null;
// (Re)build the warm query if there is none, or the model changed (the SDK can
// change model mid-session via setModel, but a fresh build is simplest + matches
// opencode's config-drift → fresh-session rule). The query stays alive across
// turns; only closeSession/dispose tears it down.
if (!this.query || this.builtModel !== opts.model) {
await this.teardownQuery();
this.buildQuery(opts.worktreePath, opts.model, resumeId);
}
// Seed the in-memory resume id from the DB so a handle built before the first
// turn's init message still carries the last-known token. The init message
// overwrites it with the authoritative current id during the turn.
if (this.agentSessionId == null) this.agentSessionId = resumeId;
// Upsert the agent_sessions row (backend='claude_sdk'). agent_session_id may be
// null until the first turn captures it from the init message; prompt() updates it.
await this.sql`
INSERT INTO agent_sessions
(chat_id, session_id, worktree_id, agent, backend, agent_session_id, server_port, status, last_active_at)
VALUES
(${opts.chatId}, ${sessionId}, ${opts.worktreeId}, ${opts.agent}, 'claude_sdk', ${this.agentSessionId}, NULL, 'active', clock_timestamp())
ON CONFLICT (chat_id, agent) DO UPDATE SET
session_id = EXCLUDED.session_id,
worktree_id = EXCLUDED.worktree_id,
backend = 'claude_sdk',
agent_session_id = COALESCE(EXCLUDED.agent_session_id, agent_sessions.agent_session_id),
server_port = NULL,
status = 'active',
last_active_at = clock_timestamp()
`.catch((err) => {
this.log.warn({ err: errMsg(err), chatId: opts.chatId, agent: opts.agent }, 'claude-sdk: agent_sessions upsert failed (non-fatal)');
});
return {
sessionId,
agent: opts.agent,
backend: 'claude_sdk',
chatId: opts.chatId,
worktreeId: opts.worktreeId,
agentSessionId: this.agentSessionId,
serverPort: null,
};
}
/** Build the persistent query() in streaming-input mode. Lazy — no subprocess
* work happens until the generator is first iterated in prompt(). */
private buildQuery(worktreePath: string, model: string, resumeId: string | null): void {
const input = createPushable<SDKUserMessage>();
const options: Options = {
sessionStore: this.sessionStore,
cwd: worktreePath,
// Stream partial assistant messages so text/thinking/tool deltas arrive live
// (the mapper reads them; without this only terminal messages land).
includePartialMessages: true,
...(model ? { model } : {}),
...(resumeId ? { resume: resumeId } : {}),
...(this.installPath ? { pathToClaudeCodeExecutable: this.installPath } : {}),
// ANTHROPIC auth/env must reach the child; inherit the process env (host concern).
env: process.env as Record<string, string>,
};
this.input = input;
this.query = query({ prompt: input.iterable, options });
this.builtModel = model;
this.up = true;
this.log.info({ chatId: this.chatId, agent: this.agent, model, resume: resumeId ?? null }, 'claude-sdk: warm query built');
}
// ─── prompt: push one user message + drain the generator until result ─────────
async prompt(handle: AgentSessionHandle, input: string, ctx: PromptCtx): Promise<TurnResult> {
if (!this.query || !this.input) {
// ensureSession should have built it; rebuild defensively (e.g. evicted/raced).
this.buildQuery(ctx.worktreePath, ctx.model, handle.agentSessionId);
}
const gen = this.query!;
const queue = this.input!;
if (ctx.signal.aborted) return { ok: false, error: 'aborted' };
this.busy = true;
const state: ClaudeSdkMapState = createClaudeSdkMapState();
// Per-turn abort: interrupt the in-flight query on the SAME generator (never
// tear down the warm query — that's the pool's lifetime). The generator then
// emits its terminal result and the drain loop exits.
let aborted = false;
const onAbort = () => {
if (aborted) return;
aborted = true;
void gen.interrupt().catch(() => {});
};
ctx.signal.addEventListener('abort', onAbort, { once: true });
// Push the turn's user message onto the open queue. session_id is optional on
// the wire; the SDK manages it via resume + the init message.
const userMsg: SDKUserMessage = {
type: 'user',
message: { role: 'user', content: input },
parent_tool_use_id: null,
...(handle.agentSessionId ? { session_id: handle.agentSessionId } : {}),
};
queue.push(userMsg);
try {
for await (const msg of gen) {
// Capture the provider session id from the init message (authoritative).
if (msg.type === 'system' && msg.subtype === 'init' && msg.session_id) {
if (this.agentSessionId !== msg.session_id) {
this.agentSessionId = msg.session_id;
await this.persistAgentSessionId(msg.session_id);
}
}
// The result message ends THIS turn (it does not close the generator —
// streaming-input keeps it alive for the next pushed message).
if (msg.type === 'result') {
await this.accumulateUsage(msg);
const ok = msg.subtype === 'success' && !aborted;
if (!ok) {
// error_during_execution / error_max_turns / aborted → crashed row.
await this.markCrashed();
} else {
await this.markIdle();
}
if (aborted) return { ok: false, error: 'aborted' };
return ok
? { ok: true }
: { ok: false, error: resultErrorMessage(msg) };
}
// Map renderable content → AgentEvents for the dispatcher's onEvent.
for (const ev of mapSdkMessage(msg, state)) {
ctx.onEvent(ev);
}
}
// Generator ended without a result message (e.g. it was disposed) — treat as
// a non-fatal incomplete turn so the dispatcher still finalizes the row.
if (aborted) return { ok: false, error: 'aborted' };
return { ok: false, error: 'claude-sdk: query ended before result' };
} catch (err) {
if (aborted) return { ok: false, error: 'aborted' };
await this.markCrashed();
return { ok: false, error: errMsg(err) };
} finally {
ctx.signal.removeEventListener('abort', onAbort);
this.busy = false;
}
}
// ─── persistence helpers ──────────────────────────────────────────────────────
private async persistAgentSessionId(id: string): Promise<void> {
await this.sql`
UPDATE agent_sessions
SET agent_session_id = ${id}, last_active_at = clock_timestamp()
WHERE chat_id = ${this.chatId} AND agent = ${this.agent}
`.catch((err) => {
this.log.warn({ err: errMsg(err), chatId: this.chatId }, 'claude-sdk: failed to persist agent_session_id (non-fatal)');
});
}
/**
* Accumulate the turn's usage/cost onto the (chat_id, agent) row — mirrors the
* opencode U.6 running-total pattern. The SDK reports usage once per turn on the
* result message (not per step), so this fires once per prompt(). Cache read/write
* input tokens fold into `input_tokens`; usage telemetry never fails a turn.
*/
private async accumulateUsage(result: Extract<SDKMessage, { type: 'result' }>): Promise<void> {
const u = result.usage;
const input = num(u?.input_tokens) + num(u?.cache_read_input_tokens) + num(u?.cache_creation_input_tokens);
const output = num(u?.output_tokens);
const cost = numF(result.total_cost_usd);
if (input === 0 && output === 0 && cost === 0) return;
await this.sql`
UPDATE agent_sessions SET
input_tokens = input_tokens + ${input},
output_tokens = output_tokens + ${output},
cost = cost + ${cost}
WHERE chat_id = ${this.chatId} AND agent = ${this.agent}
`.catch((err) => {
this.log.warn({ err: errMsg(err), chatId: this.chatId }, 'claude-sdk: failed to persist usage (non-fatal)');
});
}
private async markIdle(): Promise<void> {
await this.sql`
UPDATE agent_sessions SET status = 'idle', last_active_at = clock_timestamp()
WHERE chat_id = ${this.chatId} AND agent = ${this.agent}
`.catch(() => {});
}
private async markCrashed(): Promise<void> {
await this.sql`
UPDATE agent_sessions SET status = 'crashed'
WHERE chat_id = ${this.chatId} AND agent = ${this.agent}
`.catch(() => {});
}
// ─── teardown ────────────────────────────────────────────────────────────────
async closeSession(handle: AgentSessionHandle): Promise<void> {
await this.teardownQuery();
await this.sql`
UPDATE agent_sessions SET status = 'closed'
WHERE chat_id = ${handle.chatId} AND agent = ${handle.agent}
`.catch(() => {});
}
async dispose(): Promise<void> {
await this.teardownQuery();
}
/** Close the input queue + dispose the generator. Idempotent. */
private async teardownQuery(): Promise<void> {
this.up = false;
this.busy = false;
const q = this.query;
const queue = this.input;
this.query = null;
this.input = null;
this.builtModel = null;
queue?.close();
if (q) {
// return() ends the AsyncGenerator and lets the SDK clean up its subprocess.
await q.return(undefined).catch(() => {});
}
}
}
// ─── helpers ──────────────────────────────────────────────────────────────────
/** Coerce to a non-negative finite integer (tokens). */
function num(v: unknown): number {
const x = typeof v === 'number' ? v : Number(v);
return Number.isFinite(x) && x > 0 ? Math.round(x) : 0;
}
/** Coerce to a non-negative finite float (cost USD). */
function numF(v: unknown): number {
const x = typeof v === 'number' ? v : Number(v);
return Number.isFinite(x) && x > 0 ? x : 0;
}
/** Build a human-readable error from an SDK error-result message. */
function resultErrorMessage(result: Extract<SDKMessage, { type: 'result' }>): string {
if (result.subtype === 'success') return 'ok';
const errs = (result as { errors?: string[] }).errors;
if (Array.isArray(errs) && errs.length > 0) return `${result.subtype}: ${errs.join('; ')}`;
return result.subtype;
}
function errMsg(e: unknown): string {
return e instanceof Error ? e.message : String(e);
}

View File

@@ -0,0 +1,117 @@
import type { SessionStore, SessionKey, SessionStoreEntry } from '@anthropic-ai/claude-agent-sdk';
import type { Sql } from '../../db.js';
/**
* claude-sdk-sessionstore #9 (Part 1) — clean-room PostgresSessionStore.
*
* A Postgres-backed implementation of the Claude Agent SDK's `SessionStore`
* adapter type. The SDK mirrors each transcript line (a JSON-safe POJO with a
* `type` discriminant) to this store via `append`; on resume it calls `load`
* to materialize the full transcript back. We treat entries as opaque blobs and
* preserve append order via a BIGSERIAL `id` — `load` replays `ORDER BY id`.
*
* Storage shape: one row per entry in `claude_session_entries`, keyed by the
* SDK's `SessionKey` (project_key, session_id, subpath). The SDK uses an
* *undefined* subpath for the main transcript and disallows the empty string;
* we collapse `undefined → ''` so the main transcript and subagent files share
* one table, distinguished by the `subpath` column (`'' = main`).
*
* Clean-room: written against the SDK's published `SessionStore` type contract
* and BooCode's existing SQL conventions (porsager tagged templates, `sql.json`
* for JSONB). No SDK example/reference code was consulted.
*/
export class PostgresSessionStore implements SessionStore {
constructor(private readonly sql: Sql) {}
/**
* Mirror a batch of transcript entries. No-op on an empty batch; otherwise a
* single multi-row INSERT writes them in array order. Because `id` is a
* monotonically-increasing BIGSERIAL, the insert order is the replay order
* `load` reconstructs — entries within one call land in the order given.
*/
async append(key: SessionKey, entries: SessionStoreEntry[]): Promise<void> {
if (entries.length === 0) return;
const subpath = key.subpath ?? '';
const rows = entries.map((entry) => ({
project_key: key.projectKey,
session_id: key.sessionId,
subpath,
entry: this.sql.json(entry as never),
}));
await this.sql`
INSERT INTO claude_session_entries ${this.sql(rows, 'project_key', 'session_id', 'subpath', 'entry')}
`;
}
/**
* Load a full transcript for resume. Returns the entries in append order, or
* `null` for a (project_key, session_id, subpath) key that was never written.
*/
async load(key: SessionKey): Promise<SessionStoreEntry[] | null> {
const subpath = key.subpath ?? '';
const rows = await this.sql<{ entry: SessionStoreEntry }[]>`
SELECT entry
FROM claude_session_entries
WHERE project_key = ${key.projectKey}
AND session_id = ${key.sessionId}
AND subpath = ${subpath}
ORDER BY id
`;
if (rows.length === 0) return null;
return rows.map((r) => r.entry);
}
/**
* List the main transcripts for a project. `mtime` is the storage write time
* (latest `created_at` for the session) in Unix epoch milliseconds; the SDK
* sorts the result by mtime descending.
*/
async listSessions(projectKey: string): Promise<Array<{ sessionId: string; mtime: number }>> {
const rows = await this.sql<{ session_id: string; mtime: string }[]>`
SELECT session_id, extract(epoch FROM max(created_at)) * 1000 AS mtime
FROM claude_session_entries
WHERE project_key = ${projectKey}
AND subpath = ''
GROUP BY session_id
`;
return rows.map((r) => ({ sessionId: r.session_id, mtime: Number(r.mtime) }));
}
/**
* Delete a session. With a `subpath` set, only that subpath's rows are
* removed; with `subpath` omitted, every row for the session is removed
* (all subpaths, including the main transcript).
*/
async delete(key: SessionKey): Promise<void> {
if (key.subpath !== undefined) {
await this.sql`
DELETE FROM claude_session_entries
WHERE project_key = ${key.projectKey}
AND session_id = ${key.sessionId}
AND subpath = ${key.subpath}
`;
return;
}
await this.sql`
DELETE FROM claude_session_entries
WHERE project_key = ${key.projectKey}
AND session_id = ${key.sessionId}
`;
}
/**
* List the distinct non-main subpaths under a session (e.g. subagent files).
* Used during resume to discover and materialize subagent transcripts; the
* main transcript (`subpath = ''`) is excluded.
*/
async listSubkeys(key: { projectKey: string; sessionId: string }): Promise<string[]> {
const rows = await this.sql<{ subpath: string }[]>`
SELECT DISTINCT subpath
FROM claude_session_entries
WHERE project_key = ${key.projectKey}
AND session_id = ${key.sessionId}
AND subpath <> ''
`;
return rows.map((r) => r.subpath);
}
}

View File

@@ -0,0 +1,96 @@
/**
* claude-sdk-sessionstore #9 (Part 2) — a tiny PURE pushable async-iterable.
*
* The Claude Agent SDK's streaming-input mode wants `query({ prompt })` where
* `prompt` is an `AsyncIterable<SDKUserMessage>`. To keep ONE `query()` generator
* alive across many turns (the "warm" property), the backend feeds it ONE user
* message per `prompt()` turn through a queue that stays open between turns and is
* only closed at `closeSession`/`dispose`. This is that queue.
*
* Semantics (the bit worth unit-testing — push/close/iterate ordering):
* - `push(v)` enqueues a value. If a consumer is parked in `await next()`, it's
* handed the value immediately; otherwise the value buffers in FIFO order.
* - The async iterator yields buffered/pushed values in push order, and PARKS
* (never busy-loops) when the buffer is empty — so the SDK generator waits for
* the next turn's message instead of seeing end-of-input.
* - `close()` ends the iterable: any parked consumer resolves `{done:true}` and
* all future `next()`s return done. Values pushed after close are dropped.
* - It's single-consumer (one `query()` reads it); concurrent consumers are not a
* supported shape and not needed here.
*
* No SDK import — generic over the pushed value `T` — so the pure push/close/iterate
* ordering is testable without the `SDKUserMessage` shape or a live binary.
*/
export interface Pushable<T> {
/** Enqueue a value (or hand it to a parked consumer). No-op after close. */
push(value: T): void;
/** End the iterable. Idempotent; a parked consumer resolves done. */
close(): void;
/** True once `close()` has been called. */
readonly closed: boolean;
/** The async-iterable the consumer (the SDK `query`) drives. */
readonly iterable: AsyncIterable<T>;
}
export function createPushable<T>(): Pushable<T> {
const buffer: T[] = [];
// A waiting consumer's resolver (null when none is parked). Single-consumer.
let pendingResolve: ((res: IteratorResult<T>) => void) | null = null;
let closed = false;
function push(value: T): void {
if (closed) return;
if (pendingResolve) {
const resolve = pendingResolve;
pendingResolve = null;
resolve({ value, done: false });
return;
}
buffer.push(value);
}
function close(): void {
if (closed) return;
closed = true;
if (pendingResolve) {
const resolve = pendingResolve;
pendingResolve = null;
resolve({ value: undefined, done: true });
}
}
const iterator: AsyncIterator<T> = {
next(): Promise<IteratorResult<T>> {
// Drain the buffer first (FIFO), regardless of close — buffered values
// pushed before close are still delivered.
if (buffer.length > 0) {
return Promise.resolve({ value: buffer.shift() as T, done: false });
}
if (closed) {
return Promise.resolve({ value: undefined, done: true });
}
// Park until the next push/close. Single-consumer: only one waiter at a time.
return new Promise<IteratorResult<T>>((resolve) => {
pendingResolve = resolve;
});
},
return(): Promise<IteratorResult<T>> {
// Consumer abandoned the loop (e.g. `break`) → close so a later push no-ops.
close();
return Promise.resolve({ value: undefined, done: true });
},
};
return {
push,
close,
get closed() {
return closed;
},
iterable: {
[Symbol.asyncIterator]() {
return iterator;
},
},
};
}

View File

@@ -0,0 +1,306 @@
/**
* write-edit-robustness #4 — worktree checkpoints.
*
* External agents (opencode / goose / qwen / claude) write DIRECTLY into the
* shared session worktree (`/tmp/booworktrees/sess-<id>`); BooCode's own `rewind`
* only reverses `pending_changes` against the project root, so it has zero coverage
* there. A checkpoint is a pre-turn shadow-commit of the worktree tree (tracked +
* untracked) captured WITHOUT touching the real index/working tree, stored in a
* private GC-safe ref. `restoreCheckpoint` rewinds the worktree to that commit,
* trims the transcript from the anchor message forward, and resets the agent
* backend so the next turn re-establishes a fresh context consistent with the
* restored files.
*
* All git goes through hostExec + shellEscape (BooCoder runs on the host; the
* worktrees live on the host fs). Checkpoint CREATION is best-effort: a failure
* logs and returns null — it must NEVER throw into the dispatch turn.
*/
import { randomUUID } from 'node:crypto';
import type { FastifyBaseLogger } from 'fastify';
import type { Sql } from '../db.js';
import { hostExec } from './host-exec.js';
import { agentPool, OPENCODE_POOL_KEY } from './agent-pool.js';
import type { AgentSessionHandle } from './agent-backend.js';
/** Minimal shell escape for paths/refs (single-quote wrapping). Mirrors worktrees.ts. */
function shellEscape(s: string): string {
return "'" + s.replace(/'/g, "'\\''") + "'";
}
/**
* Pure builder for the shadow-commit command. Captures tracked + untracked files
* in the worktree into a temp index (so the real index/working tree is untouched),
* writes a tree, commits it parented on HEAD, and parks the commit under a private
* ref `refs/boocode/checkpoints/<id>` so git's GC never reclaims it. Prints ONLY
* the resulting SHA on stdout (the trailing `printf '%s'`), so the caller parses
* stdout.trim() directly.
*
* `id` is the row UUID (minted before the ref so the ref name matches the row).
* Both the worktree path and the id are shell-escaped.
*/
export function buildShadowCommitCommand(worktreePath: string, id: string): string {
const wt = shellEscape(worktreePath);
const ref = shellEscape(`refs/boocode/checkpoints/${id}`);
return (
`cd ${wt} && TMP=$(mktemp) && GIT_INDEX_FILE="$TMP" git read-tree HEAD ` +
`&& GIT_INDEX_FILE="$TMP" git add -A ` +
`&& TREE=$(GIT_INDEX_FILE="$TMP" git write-tree) ` +
`&& SHA=$(git commit-tree "$TREE" -p HEAD -m "boocode checkpoint") ` +
`&& git update-ref ${ref} "$SHA" && rm -f "$TMP" && printf '%s' "$SHA"`
);
}
export interface CreateCheckpointArgs {
chatId: string;
sessionId: string | null;
worktreeId: string | null;
worktreePath: string;
messageId: string | null;
label?: string | null;
}
/**
* Capture a pre-turn checkpoint of the session worktree. Best-effort: returns the
* inserted row's { id, commit_sha } on success, or null on any failure (the turn
* proceeds either way — a missing checkpoint just means no restore point for that
* turn). NEVER throws.
*
* The id is minted up front so the git ref name (`refs/boocode/checkpoints/<id>`)
* matches the DB row id, keeping ref and row in lockstep.
*/
export async function createCheckpoint(
sql: Sql,
args: CreateCheckpointArgs,
opts?: { signal?: AbortSignal; log?: FastifyBaseLogger },
): Promise<{ id: string; commit_sha: string } | null> {
const id = randomUUID();
try {
const cmd = buildShadowCommitCommand(args.worktreePath, id);
const res = await hostExec(cmd, { signal: opts?.signal, timeoutMs: 30_000 });
if (res.exitCode !== 0) {
opts?.log?.warn(
{ chatId: args.chatId, worktreePath: args.worktreePath, stderr: res.stderr.trim().slice(0, 500) },
'checkpoint: shadow-commit failed (turn proceeds without a checkpoint)',
);
return null;
}
const commitSha = res.stdout.trim();
if (!commitSha) {
opts?.log?.warn(
{ chatId: args.chatId, worktreePath: args.worktreePath },
'checkpoint: shadow-commit produced no SHA (turn proceeds)',
);
return null;
}
await sql`
INSERT INTO checkpoints (id, chat_id, session_id, worktree_id, message_id, commit_sha, label)
VALUES (${id}, ${args.chatId}, ${args.sessionId}, ${args.worktreeId}, ${args.messageId}, ${commitSha}, ${args.label ?? null})
`;
opts?.log?.info({ checkpointId: id, chatId: args.chatId, commitSha }, 'checkpoint: created');
return { id, commit_sha: commitSha };
} catch (err) {
opts?.log?.warn(
{ chatId: args.chatId, err: err instanceof Error ? err.message : String(err) },
'checkpoint: create threw (turn proceeds without a checkpoint)',
);
return null;
}
}
/** Error the route maps to a 404 when the checkpoint can't be resolved / scoped. */
export class CheckpointNotFoundError extends Error {
constructor(message: string) {
super(message);
this.name = 'CheckpointNotFoundError';
}
}
export interface RestoreCheckpointResult {
checkpoint_id: string;
messages_deleted: number;
worktree_reset: boolean;
backend_reset: boolean;
}
export interface RestoreCheckpointOpts {
signal?: AbortSignal;
log?: FastifyBaseLogger;
/** If set, the checkpoint MUST belong to this session (route scope guard). */
sessionId?: string;
}
interface CheckpointRow {
id: string;
chat_id: string;
session_id: string | null;
worktree_id: string | null;
message_id: string | null;
commit_sha: string;
created_at: Date;
}
/**
* Restore a checkpoint: rewind its worktree to the shadow commit, trim the
* transcript from the anchor message forward, reset the backend session, and drop
* now-orphaned later checkpoints. Throws CheckpointNotFoundError when the
* checkpoint is missing or not in the requested session (route → 404).
*/
export async function restoreCheckpoint(
sql: Sql,
checkpointId: string,
opts?: RestoreCheckpointOpts,
): Promise<RestoreCheckpointResult> {
// 1. Resolve the checkpoint.
const [cp] = await sql<CheckpointRow[]>`
SELECT id, chat_id, session_id, worktree_id, message_id, commit_sha, created_at
FROM checkpoints WHERE id = ${checkpointId}
`;
if (!cp) {
throw new CheckpointNotFoundError('checkpoint not found');
}
// Authorization scope (fail-safe): the checkpoint's chat must belong to the
// requested session. cp.session_id is a denormalized hint that may be null, so
// gating on it directly fails open — resolve the owning session via chats
// (authoritative; chat_id is NOT NULL) and deny on any mismatch or missing row.
if (opts?.sessionId) {
const [owner] = await sql<{ session_id: string | null }[]>`
SELECT session_id FROM chats WHERE id = ${cp.chat_id}
`;
if (!owner || owner.session_id !== opts.sessionId) {
throw new CheckpointNotFoundError('checkpoint not in session');
}
}
// 2. Resolve the worktree path (by worktree_id, else the session's active one).
let worktreePath: string | null = null;
if (cp.worktree_id) {
const [wt] = await sql<{ path: string }[]>`
SELECT path FROM worktrees WHERE id = ${cp.worktree_id}
`;
worktreePath = wt?.path ?? null;
}
if (!worktreePath) {
const sid = cp.session_id ?? opts?.sessionId ?? null;
if (sid) {
const [wt] = await sql<{ path: string }[]>`
SELECT path FROM worktrees WHERE session_id = ${sid} AND status = 'active' LIMIT 1
`;
worktreePath = wt?.path ?? null;
}
}
// 3. Worktree reset — hard-reset to the shadow commit, then clean untracked.
let worktreeReset = false;
if (worktreePath) {
const resetRes = await hostExec(
`git -C ${shellEscape(worktreePath)} reset --hard ${shellEscape(cp.commit_sha)}`,
{ signal: opts?.signal, timeoutMs: 30_000 },
).catch((err) => {
opts?.log?.warn(
{ checkpointId, err: err instanceof Error ? err.message : String(err) },
'checkpoint restore: reset --hard threw',
);
return null;
});
if (resetRes && resetRes.exitCode === 0) {
const cleanRes = await hostExec(
`git -C ${shellEscape(worktreePath)} clean -fd`,
{ signal: opts?.signal, timeoutMs: 30_000 },
).catch(() => null);
worktreeReset = cleanRes != null && cleanRes.exitCode === 0;
if (!worktreeReset) {
opts?.log?.warn({ checkpointId, worktreePath }, 'checkpoint restore: clean -fd did not succeed');
}
} else {
opts?.log?.warn(
{ checkpointId, worktreePath, stderr: resetRes?.stderr?.trim()?.slice(0, 500) },
'checkpoint restore: reset --hard did not succeed',
);
}
} else {
opts?.log?.warn({ checkpointId }, 'checkpoint restore: no worktree path resolved (files not reset)');
}
// 4. Trim the transcript from the anchor message forward. message_parts FK to
// messages is ON DELETE CASCADE (apps/server schema.sql:49), so parts are
// removed with their messages — no explicit parts delete needed.
let messagesDeleted = 0;
if (cp.message_id) {
const deleted = await sql<{ id: string }[]>`
DELETE FROM messages
WHERE chat_id = ${cp.chat_id}
AND created_at >= (SELECT created_at FROM messages WHERE id = ${cp.message_id})
RETURNING id
`;
messagesDeleted = deleted.length;
}
// 5. Backend reset — mark the chat's agent sessions crashed so the next turn
// re-establishes a fresh backend, and evict the live pool session(s) for this
// (chat, agent). Warm backends hold context server-side with no partial
// rewind, so a full reset is the only consistent option (proposal §4).
const agentRows = await sql<{ agent: string; backend: string; agent_session_id: string | null; session_id: string | null; worktree_id: string | null }[]>`
SELECT agent, backend, agent_session_id, session_id, worktree_id
FROM agent_sessions WHERE chat_id = ${cp.chat_id}
`;
await sql`
UPDATE agent_sessions SET status = 'crashed' WHERE chat_id = ${cp.chat_id}
`.catch(() => {});
let backendReset = false;
try {
// opencode runs on the SHARED server (keyed on a sentinel, not the chat) — close
// just this chat's session(s) on it, mirroring the lifecycle close-hook.
const ocBackend = agentPool.peek(OPENCODE_POOL_KEY, 'opencode');
if (ocBackend) {
for (const row of agentRows) {
if (row.backend !== 'opencode_server' || !row.agent_session_id) continue;
const handle: AgentSessionHandle = {
sessionId: row.session_id ?? '',
agent: row.agent,
backend: 'opencode_server',
chatId: cp.chat_id,
worktreeId: row.worktree_id ?? '',
agentSessionId: row.agent_session_id,
serverPort: null,
};
await ocBackend.closeSession(handle).catch((err) => {
opts?.log?.warn(
{ checkpointId, err: err instanceof Error ? err.message : String(err) },
'checkpoint restore: opencode closeSession threw',
);
});
}
}
// Warm-ACP backends are pooled under the chat id — dispose them (kills the
// goose/qwen child). closeChat skips busy backends (a live turn isn't torn down).
const disposed = await agentPool.closeChat(cp.chat_id);
backendReset = true;
opts?.log?.info({ checkpointId, chatId: cp.chat_id, disposed }, 'checkpoint restore: backend reset');
} catch (err) {
opts?.log?.warn(
{ checkpointId, err: err instanceof Error ? err.message : String(err) },
'checkpoint restore: backend reset threw',
);
}
// 6. Drop now-orphaned later checkpoints for this chat (their anchor messages were
// just trimmed). Compare `created_at` SERVER-SIDE via a subquery (NOT the JS
// Date round-trip, which truncates the stored microsecond precision to ms and
// would make this checkpoint delete ITSELF), and exclude this checkpoint's own
// id so it always survives — letting the user re-restore to it.
await sql`
DELETE FROM checkpoints
WHERE chat_id = ${cp.chat_id}
AND id <> ${cp.id}
AND created_at > (SELECT created_at FROM checkpoints WHERE id = ${cp.id})
`.catch(() => {});
return {
checkpoint_id: checkpointId,
messages_deleted: messagesDeleted,
worktree_reset: worktreeReset,
backend_reset: backendReset,
};
}

View File

@@ -4,6 +4,7 @@ import type { Broker } from '@boocode/server/broker';
import type { WsFrame } from '@boocode/server/ws-frames'; import type { WsFrame } from '@boocode/server/ws-frames';
import type { Config } from '../config.js'; import type { Config } from '../config.js';
import { createWorktree, diffWorktree, cleanupWorktree, ensureSessionWorktree } from './worktrees.js'; import { createWorktree, diffWorktree, cleanupWorktree, ensureSessionWorktree } from './worktrees.js';
import { createCheckpoint } from './checkpoints.js';
import { makeDcpStreamStripper } from './dcp-strip.js'; import { makeDcpStreamStripper } from './dcp-strip.js';
import { dispatchViaAcp } from './acp-dispatch.js'; import { dispatchViaAcp } from './acp-dispatch.js';
import { getResolvedRegistry } from './provider-config-registry.js'; import { getResolvedRegistry } from './provider-config-registry.js';
@@ -15,8 +16,12 @@ import { snapshotToWireToolCall, type AcpToolSnapshot } from './acp-tool-snapsho
import { agentPool, OPENCODE_POOL_KEY } from './agent-pool.js'; import { agentPool, OPENCODE_POOL_KEY } from './agent-pool.js';
import { OpenCodeServerBackend } from './backends/opencode-server.js'; import { OpenCodeServerBackend } from './backends/opencode-server.js';
import { WarmAcpBackend } from './backends/warm-acp.js'; import { WarmAcpBackend } from './backends/warm-acp.js';
import { ClaudeSdkBackend } from './backends/claude-sdk.js';
import { shouldUseWarmBackend } from './backends/warm-acp-routing.js'; import { shouldUseWarmBackend } from './backends/warm-acp-routing.js';
import { shouldUseClaudeSdk } from './backends/claude-sdk-routing.js';
import type { AgentBackend, AgentEvent } from './agent-backend.js'; import type { AgentBackend, AgentEvent } from './agent-backend.js';
import { publishAgentStatus } from './agent-status-publish.js';
import type { AgentStatus } from './normalize-agent-status.js';
interface InferenceRunner { interface InferenceRunner {
enqueue: (sessionId: string, chatId: string, assistantId: string, user: string) => void; enqueue: (sessionId: string, chatId: string, assistantId: string, user: string) => void;
@@ -63,6 +68,21 @@ export function createDispatcher(deps: Deps): { start(): void; stop(): Promise<v
return task.session_id ?? `task:${task.id}`; return task.session_id ?? `task:${task.id}`;
} }
// agent-status-normalize (#10): publish a normalized per-(chat,agent) status on
// the session channel. Every external-agent path (warm-acp / opencode / claude-sdk /
// pty one-shot) reports `working` at turn start, `idle` on clean completion, and
// `error` on the failure path through this single helper so the four paths stay
// DRY and consistent. Best-effort — publishAgentStatus never throws.
function emitAgentStatus(
sessionId: string,
chatId: string,
agent: string,
status: AgentStatus,
reason: string,
): void {
publishAgentStatus(broker.publishFrame, sessionId, chatId, agent, status, reason);
}
async function poll(): Promise<void> { async function poll(): Promise<void> {
// `polling` serializes poll() execution itself (timer + NOTIFY can fire // `polling` serializes poll() execution itself (timer + NOTIFY can fire
// concurrently) so we never double-select a task. It does NOT serialize task // concurrently) so we never double-select a task. It does NOT serialize task
@@ -130,6 +150,12 @@ export function createDispatcher(deps: Deps): { start(): void; stop(): Promise<v
// existing one-shot worktree-per-task ACP/PTY path untouched. // existing one-shot worktree-per-task ACP/PTY path untouched.
if (task.agent === 'opencode') { if (task.agent === 'opencode') {
await runOpenCodeServerTask(task, agentRow.install_path); await runOpenCodeServerTask(task, agentRow.install_path);
} else if (shouldUseClaudeSdk(task)) {
// claude-sdk-sessionstore #9 (Part 2): env-flagged (CLAUDE_SDK_BACKEND, default
// OFF) warm Claude-SDK backend for chat-tab claude tasks. When the flag is off
// (production default) this predicate returns false and claude falls through to
// the UNCHANGED one-shot PTY runExternalAgent path below.
await runClaudeSdkTask(task, agentRow.install_path);
} else if (shouldUseWarmBackend(task)) { } else if (shouldUseWarmBackend(task)) {
await runWarmAcpTask(task, agentRow.install_path); await runWarmAcpTask(task, agentRow.install_path);
} else { } else {
@@ -289,6 +315,11 @@ export function createDispatcher(deps: Deps): { start(): void; stop(): Promise<v
// Create an abort controller for this task // Create an abort controller for this task
const ac = new AbortController(); const ac = new AbortController();
// #10: hoisted above the try so the catch block can report `error` status with
// the (chat, agent) key. Empty until resolved below; guarded before use.
let sessionId = '';
let chatId = '';
try { try {
// Mark running // Mark running
await sql` await sql`
@@ -297,9 +328,6 @@ export function createDispatcher(deps: Deps): { start(): void; stop(): Promise<v
WHERE id = ${taskId} WHERE id = ${taskId}
`; `;
let sessionId: string;
let chatId: string;
if (task.session_id) { if (task.session_id) {
sessionId = task.session_id; sessionId = task.session_id;
const chats = await sql<{ id: string }[]>` const chats = await sql<{ id: string }[]>`
@@ -358,6 +386,16 @@ export function createDispatcher(deps: Deps): { start(): void; stop(): Promise<v
`; `;
const assistantId = assistantMsg!.id; const assistantId = assistantMsg!.id;
// write-edit-robustness #4: pre-turn worktree checkpoint (best-effort; a
// failure logs and never breaks dispatch). This path uses a per-task worktree
// (createWorktree, not the session worktree), so there's no worktrees-table id
// — pass null for worktreeId, the path is enough for restore's reset.
await createCheckpoint(
sql,
{ chatId, sessionId, worktreeId: null, worktreePath, messageId: assistantId },
{ signal: ac.signal, log },
).catch(() => null);
broker.publishFrame(sessionId, { broker.publishFrame(sessionId, {
type: 'message_started', type: 'message_started',
message_id: assistantId, message_id: assistantId,
@@ -365,6 +403,9 @@ export function createDispatcher(deps: Deps): { start(): void; stop(): Promise<v
role: 'assistant', role: 'assistant',
} as WsFrame); } as WsFrame);
// #10: external-agent turn begins.
emitAgentStatus(sessionId, chatId, agent, 'working', 'turn_start');
const manifestCommands = getManifestCommands(agent); const manifestCommands = getManifestCommands(agent);
if (manifestCommands.length > 0) { if (manifestCommands.length > 0) {
setTaskCommands(taskId, manifestCommands); setTaskCommands(taskId, manifestCommands);
@@ -399,6 +440,52 @@ export function createDispatcher(deps: Deps): { start(): void; stop(): Promise<v
outputSummary = result.output.slice(0, 500); outputSummary = result.output.slice(0, 500);
await persistExternalAgentTurn(sql, assistantId, result.toolSnapshots, acpReasoning); await persistExternalAgentTurn(sql, assistantId, result.toolSnapshots, acpReasoning);
} else { } else {
// v#7 (stream-json): claude + qwen run with --output-format stream-json.
// Parse the NDJSON live in pty-dispatch and forward AgentEvents here so we
// publish the SAME live frames the warm-ACP / opencode paths emit (text,
// reasoning, tool) and persist structured parts. Accumulate for the final
// message content + persistence; fall back to the opaque stdout slice when
// nothing parsed (agent ran without the flag, or crashed before emitting).
const ptyTextChunks: string[] = [];
const ptyReasoningChunks: string[] = [];
const ptyToolSnaps = new Map<string, AcpToolSnapshot>();
const onPtyEvent = (e: AgentEvent): void => {
switch (e.type) {
case 'text':
ptyTextChunks.push(e.text);
broker.publishFrame(sessionId, {
type: 'delta',
message_id: assistantId,
chat_id: chatId,
content: e.text,
} as WsFrame);
break;
case 'reasoning':
ptyReasoningChunks.push(e.text);
broker.publishFrame(sessionId, {
type: 'reasoning_delta',
message_id: assistantId,
chat_id: chatId,
content: e.text,
} as WsFrame);
break;
case 'tool_call':
case 'tool_update':
ptyToolSnaps.set(e.toolCall.toolCallId, e.toolCall);
broker.publishFrame(sessionId, {
type: 'tool_call',
message_id: assistantId,
chat_id: chatId,
tool_call: snapshotToWireToolCall(e.toolCall),
} as WsFrame);
break;
case 'commands':
// stream-json carries no commands today; ignore if it ever does.
break;
}
};
const result = await dispatchViaPty({ const result = await dispatchViaPty({
agent, agent,
task: task.input, task: task.input,
@@ -409,7 +496,22 @@ export function createDispatcher(deps: Deps): { start(): void; stop(): Promise<v
thinkingOptionId: task.thinking_option_id ?? undefined, thinkingOptionId: task.thinking_option_id ?? undefined,
signal: ac.signal, signal: ac.signal,
log, log,
onEvent: onPtyEvent,
}); });
if (result.streamed) {
assistantContent = ptyTextChunks.join('').slice(0, 50_000);
// stream-json text can be empty for a tool-only turn — surface stderr or a
// placeholder so the message row isn't blank.
if (!assistantContent) {
assistantContent = (result.stderr || '(no text output)').slice(0, 50_000);
}
outputSummary = (ptyTextChunks.join('') || result.stderr).slice(0, 500);
acpReasoning = ptyReasoningChunks.join('').slice(0, 200_000);
await persistExternalAgentTurn(sql, assistantId, [...ptyToolSnaps.values()], acpReasoning);
} else {
// Fallback: agent produced no parseable NDJSON (ran without the flag, or
// crashed). Preserve today's opaque stdout-slice + single delta behavior.
assistantContent = (result.stdout || result.stderr || '(no output)').slice(0, 50_000); assistantContent = (result.stdout || result.stderr || '(no output)').slice(0, 50_000);
outputSummary = (result.stdout || result.stderr).slice(0, 500); outputSummary = (result.stdout || result.stderr).slice(0, 500);
@@ -422,6 +524,7 @@ export function createDispatcher(deps: Deps): { start(): void; stop(): Promise<v
} as WsFrame); } as WsFrame);
} }
} }
}
await sql` await sql`
UPDATE messages UPDATE messages
@@ -477,6 +580,8 @@ export function createDispatcher(deps: Deps): { start(): void; stop(): Promise<v
WHERE id = ${taskId} WHERE id = ${taskId}
`; `;
log.info({ taskId, agent, costTokens: extCostTokens }, 'dispatcher: task completed (external)'); log.info({ taskId, agent, costTokens: extCostTokens }, 'dispatcher: task completed (external)');
// #10: external-agent turn completed cleanly.
emitAgentStatus(sessionId, chatId, agent, 'idle', 'turn_complete');
clearTaskCommands(taskId); clearTaskCommands(taskId);
} catch (err) { } catch (err) {
@@ -489,6 +594,11 @@ export function createDispatcher(deps: Deps): { start(): void; stop(): Promise<v
WHERE id = ${taskId} WHERE id = ${taskId}
`.catch(() => {}); `.catch(() => {});
// #10: external-agent turn failed/crashed. chatId may be unbound if the throw
// preceded its assignment — guard so the status publish never masks the real
// error.
if (chatId) emitAgentStatus(sessionId, chatId, agent, 'error', 'failed');
// Best-effort cleanup // Best-effort cleanup
await cleanupWorktree(projectPath, taskId); await cleanupWorktree(projectPath, taskId);
clearTaskCommands(taskId); clearTaskCommands(taskId);
@@ -543,6 +653,10 @@ export function createDispatcher(deps: Deps): { start(): void; stop(): Promise<v
const ac = new AbortController(); const ac = new AbortController();
// #10: hoisted so the catch can report `error` with the (chat, agent) key.
let sessionId = '';
let chatId = '';
try { try {
// execution_path = 'acp' — the schema CHECK has no 'opencode_server' value // execution_path = 'acp' — the schema CHECK has no 'opencode_server' value
// (schema is frozen at Phase 0); the warm-vs-one-shot distinction lives in // (schema is frozen at Phase 0); the warm-vs-one-shot distinction lives in
@@ -559,8 +673,6 @@ export function createDispatcher(deps: Deps): { start(): void; stop(): Promise<v
// it directly. Session-less creators (arena, MCP, new_task, generic // it directly. Session-less creators (arena, MCP, new_task, generic
// /api/tasks) leave it null; fall back to resolving/creating a real chat so // /api/tasks) leave it null; fall back to resolving/creating a real chat so
// ensureSession never receives a degenerate (null, agent) key. // ensureSession never receives a degenerate (null, agent) key.
let sessionId: string;
let chatId: string;
if (task.chat_id && task.session_id) { if (task.chat_id && task.session_id) {
sessionId = task.session_id; sessionId = task.session_id;
chatId = task.chat_id; chatId = task.chat_id;
@@ -617,6 +729,15 @@ export function createDispatcher(deps: Deps): { start(): void; stop(): Promise<v
`; `;
const assistantId = assistantMsg!.id; const assistantId = assistantMsg!.id;
// write-edit-robustness #4: pre-turn checkpoint of the persistent session
// worktree (best-effort; never breaks dispatch). worktreeId comes from the
// worktrees table (ensureSessionWorktree above).
await createCheckpoint(
sql,
{ chatId, sessionId, worktreeId, worktreePath, messageId: assistantId },
{ signal: ac.signal, log },
).catch(() => null);
broker.publishFrame(sessionId, { broker.publishFrame(sessionId, {
type: 'message_started', type: 'message_started',
message_id: assistantId, message_id: assistantId,
@@ -624,6 +745,9 @@ export function createDispatcher(deps: Deps): { start(): void; stop(): Promise<v
role: 'assistant', role: 'assistant',
} as WsFrame); } as WsFrame);
// #10: opencode-server turn begins.
emitAgentStatus(sessionId, chatId, agent, 'working', 'turn_start');
const manifestCommands = getManifestCommands(agent); const manifestCommands = getManifestCommands(agent);
if (manifestCommands.length > 0) { if (manifestCommands.length > 0) {
setTaskCommands(taskId, manifestCommands); setTaskCommands(taskId, manifestCommands);
@@ -783,6 +907,14 @@ export function createDispatcher(deps: Deps): { start(): void; stop(): Promise<v
WHERE id = ${taskId} WHERE id = ${taskId}
`; `;
log.info({ taskId, agent, finalState, costTokens: extCostTokens }, 'dispatcher: task finished (opencode server)'); log.info({ taskId, agent, finalState, costTokens: extCostTokens }, 'dispatcher: task finished (opencode server)');
// #10: clean completion → idle; backend-reported failure → error.
emitAgentStatus(
sessionId,
chatId,
agent,
result.ok ? 'idle' : 'error',
result.ok ? 'turn_complete' : 'failed',
);
clearTaskCommands(taskId); clearTaskCommands(taskId);
} catch (err) { } catch (err) {
const errMsg = err instanceof Error ? err.message : String(err); const errMsg = err instanceof Error ? err.message : String(err);
@@ -792,6 +924,8 @@ export function createDispatcher(deps: Deps): { start(): void; stop(): Promise<v
SET state = 'failed', ended_at = clock_timestamp(), output_summary = ${errMsg.slice(0, 500)} SET state = 'failed', ended_at = clock_timestamp(), output_summary = ${errMsg.slice(0, 500)}
WHERE id = ${taskId} WHERE id = ${taskId}
`.catch(() => {}); `.catch(() => {});
// #10: turn crashed.
if (chatId) emitAgentStatus(sessionId, chatId, agent, 'error', 'crashed');
clearTaskCommands(taskId); clearTaskCommands(taskId);
// No worktree cleanup (persistent); backend stays warm for the next turn. // No worktree cleanup (persistent); backend stays warm for the next turn.
} }
@@ -876,6 +1010,15 @@ export function createDispatcher(deps: Deps): { start(): void; stop(): Promise<v
`; `;
const assistantId = assistantMsg!.id; const assistantId = assistantMsg!.id;
// write-edit-robustness #4: pre-turn checkpoint of the persistent session
// worktree (best-effort; never breaks dispatch). Same worktree the opencode
// path uses — a chat that switches opencode↔goose↔qwen shares one worktree.
await createCheckpoint(
sql,
{ chatId, sessionId, worktreeId, worktreePath, messageId: assistantId },
{ signal: ac.signal, log },
).catch(() => null);
broker.publishFrame(sessionId, { broker.publishFrame(sessionId, {
type: 'message_started', type: 'message_started',
message_id: assistantId, message_id: assistantId,
@@ -883,6 +1026,9 @@ export function createDispatcher(deps: Deps): { start(): void; stop(): Promise<v
role: 'assistant', role: 'assistant',
} as WsFrame); } as WsFrame);
// #10: warm-ACP turn begins.
emitAgentStatus(sessionId, chatId, agent, 'working', 'turn_start');
const manifestCommands = getManifestCommands(agent); const manifestCommands = getManifestCommands(agent);
if (manifestCommands.length > 0) { if (manifestCommands.length > 0) {
setTaskCommands(taskId, manifestCommands); setTaskCommands(taskId, manifestCommands);
@@ -1024,6 +1170,14 @@ export function createDispatcher(deps: Deps): { start(): void; stop(): Promise<v
WHERE id = ${taskId} WHERE id = ${taskId}
`; `;
log.info({ taskId, agent, finalState }, 'dispatcher: task finished (warm ACP)'); log.info({ taskId, agent, finalState }, 'dispatcher: task finished (warm ACP)');
// #10: clean completion → idle; backend-reported failure → error.
emitAgentStatus(
sessionId,
chatId,
agent,
result.ok ? 'idle' : 'error',
result.ok ? 'turn_complete' : 'failed',
);
clearTaskCommands(taskId); clearTaskCommands(taskId);
} catch (err) { } catch (err) {
const errMsg = err instanceof Error ? err.message : String(err); const errMsg = err instanceof Error ? err.message : String(err);
@@ -1033,6 +1187,262 @@ export function createDispatcher(deps: Deps): { start(): void; stop(): Promise<v
SET state = 'failed', ended_at = clock_timestamp(), output_summary = ${errMsg.slice(0, 500)} SET state = 'failed', ended_at = clock_timestamp(), output_summary = ${errMsg.slice(0, 500)}
WHERE id = ${taskId} WHERE id = ${taskId}
`.catch(() => {}); `.catch(() => {});
// #10: turn crashed.
emitAgentStatus(sessionId, chatId, agent, 'error', 'crashed');
clearTaskCommands(taskId);
// No worktree cleanup (persistent); backend stays warm for the next turn.
}
}
// ─── Path B (claude SDK): warm Claude-SDK backend (v2.6 #9 Part 2) ───────────
// Claude-SDK backends are per (chat, agent) — each owns ONE persistent query()
// generator driven in streaming-input mode. Pool key = chatId (secondary = agent),
// mirroring agent_sessions' (chat_id, agent) PK + the warm-ACP pooling.
function getClaudeSdkBackend(chatId: string, agent: string, installPath: string | null): ClaudeSdkBackend {
let backend = agentPool.get(chatId, agent);
if (!backend) {
backend = new ClaudeSdkBackend({ sql, log, chatId, agent, installPath });
agentPool.register(chatId, agent, backend);
}
return backend as ClaudeSdkBackend;
}
async function runClaudeSdkTask(
task: {
id: string;
project_id: string;
input: string;
agent: string | null;
model: string | null;
mode_id: string | null;
thinking_option_id: string | null;
session_id: string | null;
chat_id: string | null;
},
installPath: string | null,
): Promise<void> {
const taskId = task.id;
const agent = task.agent!;
// shouldUseClaudeSdk guarantees both non-null before we get here.
const sessionId = task.session_id!;
const chatId = task.chat_id!;
log.info({ taskId, agent, chatId }, 'dispatcher: starting task (path B — claude SDK)');
const [project] = await sql<{ path: string | null }[]>`
SELECT path FROM projects WHERE id = ${task.project_id}
`;
const projectPath = project?.path;
if (!projectPath) {
await sql`
UPDATE tasks
SET state = 'failed', ended_at = clock_timestamp(), output_summary = 'Project has no path — cannot create worktree'
WHERE id = ${taskId}
`;
return;
}
const ac = new AbortController();
try {
await sql`
UPDATE tasks
SET state = 'running', started_at = clock_timestamp(), execution_path = 'acp'
WHERE id = ${taskId}
`;
// Persistent, session-keyed worktree (shared across turns + agents; NOT torn
// down per turn — Phase 3 reaps it). Same as the opencode/warm-ACP paths so a
// chat that switches agents shares one worktree.
const { worktreeId, worktreePath, baseCommit } = await ensureSessionWorktree(sql, projectPath, sessionId, {
signal: ac.signal,
});
log.info({ taskId, worktreePath }, 'dispatcher: session worktree ready (claude SDK)');
const [assistantMsg] = await sql<{ id: string }[]>`
INSERT INTO messages (session_id, chat_id, role, content, status, created_at)
VALUES (${sessionId}, ${chatId}, 'assistant', '', 'streaming', clock_timestamp())
RETURNING id
`;
const assistantId = assistantMsg!.id;
// write-edit-robustness #4: pre-turn checkpoint of the persistent session
// worktree (best-effort; never breaks dispatch).
await createCheckpoint(
sql,
{ chatId, sessionId, worktreeId, worktreePath, messageId: assistantId },
{ signal: ac.signal, log },
).catch(() => null);
broker.publishFrame(sessionId, {
type: 'message_started',
message_id: assistantId,
chat_id: chatId,
role: 'assistant',
} as WsFrame);
// #10: claude-SDK turn begins.
emitAgentStatus(sessionId, chatId, agent, 'working', 'turn_start');
const manifestCommands = getManifestCommands(agent);
if (manifestCommands.length > 0) {
setTaskCommands(taskId, manifestCommands);
broker.publishFrame(sessionId, {
type: 'agent_commands',
task_id: taskId,
session_id: sessionId,
commands: manifestCommands,
} as WsFrame);
}
// Accumulate the turn's stream for persistence + the final message content.
const textChunks: string[] = [];
const reasoningChunks: string[] = [];
const toolSnaps = new Map<string, AcpToolSnapshot>();
// Map transport-agnostic AgentEvents → the SAME WS frames the warm-ACP /
// opencode paths emit. This boundary attaches message_id/chat_id.
const onEvent = (e: AgentEvent): void => {
switch (e.type) {
case 'text':
textChunks.push(e.text);
broker.publishFrame(sessionId, {
type: 'delta',
message_id: assistantId,
chat_id: chatId,
content: e.text,
} as WsFrame);
break;
case 'reasoning':
reasoningChunks.push(e.text);
broker.publishFrame(sessionId, {
type: 'reasoning_delta',
message_id: assistantId,
chat_id: chatId,
content: e.text,
} as WsFrame);
break;
case 'tool_call':
case 'tool_update':
toolSnaps.set(e.toolCall.toolCallId, e.toolCall);
broker.publishFrame(sessionId, {
type: 'tool_call',
message_id: assistantId,
chat_id: chatId,
tool_call: snapshotToWireToolCall(e.toolCall),
} as WsFrame);
break;
case 'commands':
if (e.commands.length > 0) {
setTaskCommands(taskId, e.commands);
broker.publishFrame(sessionId, {
type: 'agent_commands',
task_id: taskId,
session_id: sessionId,
commands: e.commands,
} as WsFrame);
}
break;
}
};
const model = task.model ?? undefined;
const backend = getClaudeSdkBackend(chatId, agent, installPath);
const handle = await backend.ensureSession(sessionId, {
agent,
model: model ?? '',
chatId,
worktreePath,
worktreeId,
projectId: task.project_id,
});
const result = await backend.prompt(handle, task.input, {
worktreePath,
model: model ?? '',
signal: ac.signal,
onEvent,
taskId,
modeId: task.mode_id ?? undefined,
});
// Phase 3: keep the pooled (chat,agent) backend warm across the turn.
agentPool.touch(chatId, agent);
const assistantContent = textChunks.join('').slice(0, 50_000);
const reasoningText = reasoningChunks.join('').slice(0, 200_000);
const outputSummary = (result.ok ? textChunks.join('') : result.error ?? 'claude SDK turn failed').slice(0, 500);
await persistExternalAgentTurn(sql, assistantId, [...toolSnaps.values()], reasoningText);
await sql`
UPDATE messages
SET content = ${assistantContent}, status = 'complete', finished_at = clock_timestamp()
WHERE id = ${assistantId}
`;
broker.publishFrame(sessionId, {
type: 'message_complete',
message_id: assistantId,
chat_id: chatId,
} as WsFrame);
if (stopping) {
await sql`UPDATE tasks SET state = 'cancelled', ended_at = clock_timestamp() WHERE id = ${taskId}`;
return; // worktree persists (no cleanup); backend stays warm
}
// Diff the persistent worktree against its captured baseline and SUPERSEDE
// the session's prior pending row (latest-wins) — identical to opencode/ACP.
const diff = await diffWorktree(worktreePath, projectPath, {
signal: ac.signal,
baseRef: baseCommit ?? 'HEAD',
});
if (diff) {
await sql`
DELETE FROM pending_changes WHERE session_id = ${sessionId} AND status = 'pending'
`;
await sql`
INSERT INTO pending_changes (session_id, task_id, file_path, operation, diff, agent)
VALUES (${sessionId}, ${taskId}, ${projectPath}, 'edit', ${diff}, ${agent})
`;
log.info({ taskId, diffLength: diff.length }, 'dispatcher: diff superseded prior pending change (claude SDK)');
} else {
log.info({ taskId }, 'dispatcher: no changes detected in session worktree (claude SDK)');
}
// NO worktree cleanup — persistent (Phase 3 reaps it). Backend stays warm.
const [extCostRow] = await sql<{ total: number | null }[]>`
SELECT SUM(tokens_used)::int AS total
FROM messages
WHERE session_id = ${sessionId} AND tokens_used IS NOT NULL
`;
const extCostTokens = extCostRow?.total ?? null;
const finalState = result.ok ? 'completed' : 'failed';
await sql`
UPDATE tasks
SET state = ${finalState}, ended_at = clock_timestamp(), output_summary = ${outputSummary}, cost_tokens = ${extCostTokens}
WHERE id = ${taskId}
`;
log.info({ taskId, agent, finalState }, 'dispatcher: task finished (claude SDK)');
// #10: clean completion → idle; backend-reported failure → error.
emitAgentStatus(
sessionId,
chatId,
agent,
result.ok ? 'idle' : 'error',
result.ok ? 'turn_complete' : 'failed',
);
clearTaskCommands(taskId);
} catch (err) {
const errMsg = err instanceof Error ? err.message : String(err);
log.error({ taskId, agent, err: errMsg }, 'dispatcher: claude SDK error');
await sql`
UPDATE tasks
SET state = 'failed', ended_at = clock_timestamp(), output_summary = ${errMsg.slice(0, 500)}
WHERE id = ${taskId}
`.catch(() => {});
// #10: turn crashed.
emitAgentStatus(sessionId, chatId, agent, 'error', 'crashed');
clearTaskCommands(taskId); clearTaskCommands(taskId);
// No worktree cleanup (persistent); backend stays warm for the next turn. // No worktree cleanup (persistent); backend stays warm for the next turn.
} }

View File

@@ -0,0 +1,271 @@
// Fuzzy patch locator for staged edits.
//
// Local quantized models (qwen3.6 and friends) frequently reproduce an
// `old_string` with small, semantically-irrelevant drift: trailing whitespace,
// a different indent width, or "smart" unicode punctuation (curly quotes, an
// en/em-dash, a non-breaking space) where the source has the plain ASCII form.
// An exact `String.includes` then fails and the queued edit is lost even though
// a human would say it obviously matches.
//
// `locateMatch` walks a ladder of progressively looser strategies and returns
// the real `[start, end)` byte-offset span in the ORIGINAL content so the caller
// can splice in `new_string` over the true file text (preserving the file's own
// whitespace/unicode, not the model's drifted copy). The ladder stops at the
// first strategy that resolves to a single span:
//
// 1. exact — indexOf; >1 hit is reported `ambiguous` (we refuse to
// guess which occurrence the model meant).
// 2. per-line ws — line-window compare ignoring per-line trailing
// whitespace and leading/trailing blank needle lines.
// 3. unicode canon — same line-window compare after folding smart
// punctuation to ASCII on both sides; the match is
// mapped back to original offsets.
// 4. levenshtein — best line-window by normalized edit-distance
// similarity; accepted only at >= SIMILARITY_THRESHOLD.
//
// Pure and dependency-free (Levenshtein is the standard iterative two-row DP),
// reimplemented from the general technique — no vendored source.
export type MatchResult =
| { kind: 'exact' | 'fuzzy'; start: number; end: number } // [start,end) offsets into content
| { kind: 'ambiguous'; count: number }
| { kind: 'not_found' };
/** Levenshtein similarity floor for the final fuzzy fallback (strategy 4). */
export const SIMILARITY_THRESHOLD = 0.66;
export function locateMatch(content: string, needle: string): MatchResult {
// Empty needle has no meaningful match.
if (needle.length === 0) return { kind: 'not_found' };
// --- 1. Exact ----------------------------------------------------------------
const exact = locateExact(content, needle);
if (exact) return exact;
// --- 2. Per-line whitespace-insensitive -------------------------------------
const ws = locateByLineWindow(content, needle);
if (ws) return ws;
// --- 3. Unicode-canonicalized whitespace pass -------------------------------
const canon = locateCanonical(content, needle);
if (canon) return canon;
// --- 4. Levenshtein similarity ----------------------------------------------
const lev = locateByLevenshtein(content, needle);
if (lev) return lev;
return { kind: 'not_found' };
}
// --- Strategy 1: exact -------------------------------------------------------
function locateExact(content: string, needle: string): MatchResult | null {
const first = content.indexOf(needle);
if (first === -1) return null;
const second = content.indexOf(needle, first + 1);
if (second === -1) {
return { kind: 'exact', start: first, end: first + needle.length };
}
// Count all occurrences so the caller can report a useful number.
let count = 2;
let idx = content.indexOf(needle, second + 1);
while (idx !== -1) {
count++;
idx = content.indexOf(needle, idx + 1);
}
return { kind: 'ambiguous', count };
}
// --- Line-window machinery ---------------------------------------------------
interface Line {
/** Raw line text (no trailing newline). */
text: string;
/** Offset of the first char of this line in the original content. */
start: number;
/** Offset one past the last char of this line (before its newline, if any). */
end: number;
}
/**
* Split content into lines, tracking each line's real offset span. The span
* EXCLUDES the trailing newline so consecutive line spans plus their newlines
* exactly reconstruct the content; the match span we hand back covers from the
* first matched line's start through the last matched line's end (i.e. without a
* trailing newline), which is what an in-place splice wants.
*/
function splitLines(content: string): Line[] {
const lines: Line[] = [];
let start = 0;
for (let i = 0; i <= content.length; i++) {
if (i === content.length || content[i] === '\n') {
lines.push({ text: content.slice(start, i), start, end: i });
start = i + 1;
}
}
return lines;
}
/** Strip leading/trailing all-blank lines; returns the trimmed slice. */
function trimBlankLines(lines: string[]): string[] {
let lo = 0;
let hi = lines.length;
while (lo < hi && lines[lo]!.trim() === '') lo++;
while (hi > lo && lines[hi - 1]!.trim() === '') hi--;
return lines.slice(lo, hi);
}
/**
* Find a contiguous window of content lines whose trailing-whitespace-trimmed
* text equals the needle's (blank-trimmed) lines. Returns the real offset span
* over the matched content lines, or null if zero match. Multiple matches →
* ambiguous. `normalize` lets the caller fold unicode before comparing.
*/
function locateByLineWindow(
content: string,
needle: string,
normalize: (s: string) => string = (s) => s,
): MatchResult | null {
const contentLines = splitLines(content);
const needleLines = trimBlankLines(needle.split('\n'));
const n = needleLines.length;
if (n === 0) return null;
// A single needle line that is itself blank can't be located meaningfully.
if (n === 1 && needleLines[0]!.trim() === '') return null;
const needleKey = needleLines.map((l) => normalize(l.trimEnd())).join('\n');
const hits: Array<{ start: number; end: number }> = [];
for (let i = 0; i + n <= contentLines.length; i++) {
const windowKey = contentLines
.slice(i, i + n)
.map((l) => normalize(l.text.trimEnd()))
.join('\n');
if (windowKey === needleKey) {
hits.push({ start: contentLines[i]!.start, end: contentLines[i + n - 1]!.end });
}
}
if (hits.length === 0) return null;
if (hits.length > 1) return { kind: 'ambiguous', count: hits.length };
return { kind: 'fuzzy', start: hits[0]!.start, end: hits[0]!.end };
}
// --- Strategy 3: unicode canonicalization ------------------------------------
/**
* Fold smart punctuation to its ASCII equivalent. Crucially this is a
* length-PRESERVING, per-character map (every replacement is one char → one
* char), so an offset into the canonical string is also a valid offset into the
* original — letting strategy 3 reuse the line-window matcher and still hand
* back true original-content offsets.
*/
function canonicalizeChar(ch: string): string {
switch (ch) {
// single quotes / apostrophes
case '': // '
case '': // '
case '': //
case '': //
return "'";
// double quotes
case '“': // "
case '”': // "
case '„': // „
case '‟': // ‟
return '"';
// dashes
case '': // en dash
case '—': // — em dash
case '': // figure dash
case '―': // ― horizontal bar
case '': // minus sign
return '-';
// spaces
case ' ': // nbsp
case '': // figure space
case '': // narrow nbsp
return ' ';
default:
return ch;
}
}
function canonicalize(s: string): string {
let out = '';
for (const ch of s) out += canonicalizeChar(ch);
return out;
}
function locateCanonical(content: string, needle: string): MatchResult | null {
// Only worth running if canonicalization actually changes something on either
// side — otherwise it's identical to strategy 2 which already failed.
const canonContent = canonicalize(content);
const canonNeedle = canonicalize(needle);
if (canonContent === content && canonNeedle === needle) return null;
// Offsets are preserved (length-preserving fold), so a match on the canonical
// content maps directly back to the original.
return locateByLineWindow(canonContent, canonNeedle);
}
// --- Strategy 4: Levenshtein similarity --------------------------------------
/** Standard iterative two-row Levenshtein edit distance. */
function levenshtein(a: string, b: string): number {
if (a === b) return 0;
if (a.length === 0) return b.length;
if (b.length === 0) return a.length;
let prev = new Array<number>(b.length + 1);
let curr = new Array<number>(b.length + 1);
for (let j = 0; j <= b.length; j++) prev[j] = j;
for (let i = 1; i <= a.length; i++) {
curr[0] = i;
const ac = a.charCodeAt(i - 1);
for (let j = 1; j <= b.length; j++) {
const cost = ac === b.charCodeAt(j - 1) ? 0 : 1;
curr[j] = Math.min(
prev[j]! + 1, // deletion
curr[j - 1]! + 1, // insertion
prev[j - 1]! + cost, // substitution
);
}
[prev, curr] = [curr, prev];
}
return prev[b.length]!;
}
/** Normalized similarity in [0,1]: 1 - dist / max(len). */
function similarity(a: string, b: string): number {
const maxLen = Math.max(a.length, b.length);
if (maxLen === 0) return 1;
return 1 - levenshtein(a, b) / maxLen;
}
function locateByLevenshtein(content: string, needle: string): MatchResult | null {
const contentLines = splitLines(content);
const needleLines = trimBlankLines(needle.split('\n'));
const n = needleLines.length;
if (n === 0) return null;
if (contentLines.length < n) return null;
const needleJoined = needleLines.map((l) => l.trim()).join('\n');
let best = -1;
let bestSpan: { start: number; end: number } | null = null;
for (let i = 0; i + n <= contentLines.length; i++) {
const window = contentLines.slice(i, i + n);
const windowJoined = window.map((l) => l.text.trim()).join('\n');
const score = similarity(windowJoined, needleJoined);
if (score > best) {
best = score;
bestSpan = { start: window[0]!.start, end: window[n - 1]!.end };
}
}
if (bestSpan && best >= SIMILARITY_THRESHOLD) {
return { kind: 'fuzzy', start: bestSpan.start, end: bestSpan.end };
}
return null;
}

View File

@@ -0,0 +1,92 @@
/**
* normalize-agent-status (#10) — clean-room vendor-event → bucket mapping.
*
* Different coding agents (claude, opencode, codex/gemini, goose, qwen) emit
* lifecycle hook events under inconsistent names: PascalCase (`SessionStart`),
* snake_case (`session_start`), camelCase (`sessionStart`), and a handful of
* provider-specific approval events (`exec_approval_request`). This module
* collapses every known event name into one of three coarse signals:
*
* working — the agent is actively progressing a turn
* blocked — the agent is waiting on a human (permission / approval / question)
* done — the turn / session ended cleanly
*
* `null` is returned for anything unrecognized so callers can ignore noise.
*
* Built now for the scoped status-publish, but specifically shaped for reuse by
* the documented config-injection follow-on: a future notify-hook injected into
* each agent's native config will POST the RAW vendor event name to a BooCoder
* endpoint, which runs this helper to derive the normalized status. The names
* below are facts about each agent's hook surface — not copied vendor code.
*/
export type AgentStatus = 'working' | 'blocked' | 'idle' | 'error';
/** The coarse signal a raw vendor event collapses to. */
export type AgentEventBucket = 'working' | 'blocked' | 'done';
// Each bucket lists the canonical vendor event names. Lookup is
// case-insensitive AND separator-insensitive (snake_case / camelCase /
// PascalCase all fold to the same key), so we normalize the raw input the same
// way before matching rather than enumerating every spelling here.
const WORKING_EVENTS = [
'SessionStart',
'UserPromptSubmit',
'UserPromptSubmitted',
'PostToolUse',
'PostToolUseFailure',
'BeforeAgent',
'AfterTool',
'task_started',
] as const;
const BLOCKED_EVENTS = [
'PreToolUse',
'Notification',
'PermissionRequest',
'exec_approval_request',
'apply_patch_approval_request',
'request_user_input',
] as const;
const DONE_EVENTS = [
'Stop',
'AfterAgent',
'SessionEnd',
'task_complete',
'agent-turn-complete',
] as const;
/**
* Fold a raw event name to a separator/case-insensitive key:
* strip every non-alphanumeric character and lowercase. So `post_tool_use`,
* `postToolUse`, `PostToolUse`, and `POST-TOOL-USE` all map to `posttooluse`.
*/
function foldKey(raw: string): string {
return raw.replace(/[^a-z0-9]/gi, '').toLowerCase();
}
function buildLookup(
groups: ReadonlyArray<readonly [AgentEventBucket, readonly string[]]>,
): Map<string, AgentEventBucket> {
const map = new Map<string, AgentEventBucket>();
for (const [bucket, names] of groups) {
for (const name of names) map.set(foldKey(name), bucket);
}
return map;
}
const EVENT_LOOKUP = buildLookup([
['working', WORKING_EVENTS],
['blocked', BLOCKED_EVENTS],
['done', DONE_EVENTS],
]);
/**
* Map a raw vendor hook-event name to its normalized bucket, or `null` when the
* name is unknown / undefined. Case- and separator-insensitive.
*/
export function normalizeAgentEvent(raw: string | undefined): AgentEventBucket | null {
if (!raw) return null;
return EVENT_LOOKUP.get(foldKey(raw)) ?? null;
}

View File

@@ -2,6 +2,7 @@ import { readFile, writeFile, unlink, mkdir } from 'node:fs/promises';
import { dirname } from 'node:path'; import { dirname } from 'node:path';
import type { Sql } from '../db.js'; import type { Sql } from '../db.js';
import { resolveWritePath } from './write_guard.js'; import { resolveWritePath } from './write_guard.js';
import { locateMatch } from './fuzzy-match.js';
// --- Types ------------------------------------------------------------------- // --- Types -------------------------------------------------------------------
@@ -121,10 +122,18 @@ export async function applyOne(
case 'edit': { case 'edit': {
const { old: oldStr, new: newStr } = JSON.parse(change.diff) as { old: string; new: string }; const { old: oldStr, new: newStr } = JSON.parse(change.diff) as { old: string; new: string };
const content = await readFile(change.file_path, 'utf8'); const content = await readFile(change.file_path, 'utf8');
if (!content.includes(oldStr)) { const match = locateMatch(content, oldStr);
throw new Error('old_string not found in file — file may have changed since the edit was queued'); if (match.kind === 'ambiguous') {
throw new Error(
`old_string matches ${match.count} locations — add surrounding context to disambiguate`,
);
} }
const updated = content.replace(oldStr, newStr); if (match.kind === 'not_found') {
throw new Error(
'old_string not found in file (even fuzzily) — file may have changed since the edit was queued',
);
}
const updated = content.slice(0, match.start) + newStr + content.slice(match.end);
await writeFile(change.file_path, updated, 'utf8'); await writeFile(change.file_path, updated, 'utf8');
break; break;
} }
@@ -203,10 +212,18 @@ export async function rewindOne(
// Reverse an edit: swap old and new // Reverse an edit: swap old and new
const { old: oldStr, new: newStr } = JSON.parse(change.diff) as { old: string; new: string }; const { old: oldStr, new: newStr } = JSON.parse(change.diff) as { old: string; new: string };
const content = await readFile(change.file_path, 'utf8'); const content = await readFile(change.file_path, 'utf8');
if (!content.includes(newStr)) { const match = locateMatch(content, newStr);
throw new Error('new_string not found in file — cannot rewind; file may have been modified since apply'); if (match.kind === 'ambiguous') {
throw new Error(
`new_string matches ${match.count} locations — cannot rewind; add surrounding context to disambiguate`,
);
} }
const reverted = content.replace(newStr, oldStr); if (match.kind === 'not_found') {
throw new Error(
'new_string not found in file (even fuzzily) — cannot rewind; file may have been modified since apply',
);
}
const reverted = content.slice(0, match.start) + oldStr + content.slice(match.end);
await writeFile(change.file_path, reverted, 'utf8'); await writeFile(change.file_path, reverted, 'utf8');
break; break;
} }

View File

@@ -38,6 +38,12 @@ export const PROVIDERS: ProviderDef[] = [
}, },
{ {
name: 'claude', name: 'claude',
// transport stays 'pty' — the DEFAULT dispatch path (one-shot `claude
// --output-format stream-json`). claude-sdk-sessionstore #9 (Part 2) adds a warm
// Claude-Agent-SDK backend (services/backends/claude-sdk.ts) routed ONLY when the
// `CLAUDE_SDK_BACKEND` env flag is truthy AND the task is a chat tab; with the flag
// off (production default) claude always uses this PTY path, so the transport label
// is left unchanged. Flip the env var on a host (after a live smoke) to opt in.
label: 'Claude Code', label: 'Claude Code',
transport: 'pty', transport: 'pty',
modelSource: 'static', modelSource: 'static',

View File

@@ -1,13 +1,29 @@
/** /**
* PTY dispatch — runs external agents directly on the host. * PTY dispatch — runs external agents directly on the host.
*
* claude + qwen run with `--output-format stream-json` and emit Claude-Code's
* stream-json NDJSON on stdout. When an `onEvent` callback is supplied we
* line-buffer that stdout (split on `\n`, hold the partial tail) and feed complete
* lines to `makeStreamJsonParser` so deltas surface live as AgentEvents. The raw
* stdout is still accumulated + returned for back-compat (and the dispatcher's
* fallback when nothing parsed). See `stream-json-parser.ts`.
*/ */
import type { FastifyBaseLogger } from 'fastify'; import type { FastifyBaseLogger } from 'fastify';
import { spawn } from 'node:child_process'; import { spawn } from 'node:child_process';
import type { AgentEvent } from './agent-backend.js';
import { makeStreamJsonParser, type StreamJsonUsage } from './stream-json-parser.js';
export interface DispatchResult { export interface DispatchResult {
exitCode: number; exitCode: number;
stdout: string; stdout: string;
stderr: string; stderr: string;
/** True iff at least one NDJSON AgentEvent was parsed from stdout (v#7). When
* false the dispatcher falls back to slicing stdout as the assistant content. */
streamed: boolean;
/** Final usage parsed from the stream-json `result` / `message_delta`, if any. */
usage?: StreamJsonUsage;
/** Provider session id from the stream-json `system` init line, if any. */
agentSessionId?: string | null;
} }
export interface PtyDispatchOpts { export interface PtyDispatchOpts {
@@ -20,6 +36,10 @@ export interface PtyDispatchOpts {
installPath?: string; installPath?: string;
signal?: AbortSignal; signal?: AbortSignal;
log: FastifyBaseLogger; log: FastifyBaseLogger;
/** Optional live event sink. When set, stdout is line-buffered + NDJSON-parsed
* and each AgentEvent is forwarded here as it arrives. Absent → opaque (old)
* behavior: stdout is accumulated and returned, no parsing. */
onEvent?: (e: AgentEvent) => void;
} }
interface PtySpawnSpec { interface PtySpawnSpec {
@@ -40,7 +60,9 @@ function buildPtySpawnSpec(
switch (agent) { switch (agent) {
case 'claude': { case 'claude': {
const args = ['-p']; // stream-json on -p requires --verbose (Claude Code rejects stream-json
// print mode without it). qwen needs no such flag.
const args = ['-p', '--output-format', 'stream-json', '--verbose'];
if (model) args.push('--model', model); if (model) args.push('--model', model);
if (modeId) args.push('--permission-mode', modeId); if (modeId) args.push('--permission-mode', modeId);
if (thinkingOptionId) args.push('--effort', thinkingOptionId); if (thinkingOptionId) args.push('--effort', thinkingOptionId);
@@ -73,7 +95,7 @@ function buildPtySpawnSpec(
} }
export async function dispatchViaPty(opts: PtyDispatchOpts): Promise<DispatchResult> { export async function dispatchViaPty(opts: PtyDispatchOpts): Promise<DispatchResult> {
const { agent, task, worktreePath, model, modeId, thinkingOptionId, installPath, signal, log } = opts; const { agent, task, worktreePath, model, modeId, thinkingOptionId, installPath, signal, log, onEvent } = opts;
const cmd = buildPtySpawnSpec(agent, task, model, modeId, thinkingOptionId, installPath); const cmd = buildPtySpawnSpec(agent, task, model, modeId, thinkingOptionId, installPath);
if (!cmd) { if (!cmd) {
@@ -81,6 +103,7 @@ export async function dispatchViaPty(opts: PtyDispatchOpts): Promise<DispatchRes
exitCode: 1, exitCode: 1,
stdout: '', stdout: '',
stderr: `Agent '${agent}' is not yet supported for PTY dispatch.`, stderr: `Agent '${agent}' is not yet supported for PTY dispatch.`,
streamed: false,
}; };
} }
@@ -102,7 +125,32 @@ export async function dispatchViaPty(opts: PtyDispatchOpts): Promise<DispatchRes
let stderr = ''; let stderr = '';
let killed = false; let killed = false;
child.stdout!.on('data', (chunk: Buffer) => { stdout += chunk.toString(); }); // Live NDJSON parsing (only when a sink is supplied). Line-buffer: split on
// '\n', dispatch complete lines, hold the partial tail until the next chunk.
const parser = onEvent ? makeStreamJsonParser() : null;
let lineBuf = '';
let streamed = false;
const feedLine = (line: string): void => {
if (!parser || !onEvent) return;
for (const e of parser.push(line)) {
streamed = true;
onEvent(e);
}
};
child.stdout!.on('data', (chunk: Buffer) => {
const text = chunk.toString();
stdout += text;
if (!parser) return;
lineBuf += text;
let nl = lineBuf.indexOf('\n');
while (nl !== -1) {
const line = lineBuf.slice(0, nl);
lineBuf = lineBuf.slice(nl + 1);
feedLine(line);
nl = lineBuf.indexOf('\n');
}
});
child.stderr!.on('data', (chunk: Buffer) => { stderr += chunk.toString(); }); child.stderr!.on('data', (chunk: Buffer) => { stderr += chunk.toString(); });
const cleanup = () => { const cleanup = () => {
@@ -116,7 +164,7 @@ export async function dispatchViaPty(opts: PtyDispatchOpts): Promise<DispatchRes
if (signal) { if (signal) {
if (signal.aborted) { if (signal.aborted) {
cleanup(); cleanup();
resolve({ exitCode: 130, stdout: '', stderr: 'Aborted before start' }); resolve({ exitCode: 130, stdout: '', stderr: 'Aborted before start', streamed: false });
return; return;
} }
signal.addEventListener('abort', cleanup, { once: true }); signal.addEventListener('abort', cleanup, { once: true });
@@ -124,8 +172,18 @@ export async function dispatchViaPty(opts: PtyDispatchOpts): Promise<DispatchRes
child.on('close', (code) => { child.on('close', (code) => {
if (signal) signal.removeEventListener('abort', cleanup); if (signal) signal.removeEventListener('abort', cleanup);
log.info({ agent, exitCode: code }, 'pty-dispatch: completed'); // Flush any final line with no trailing newline.
resolve({ exitCode: code ?? 1, stdout, stderr }); if (lineBuf.trim()) feedLine(lineBuf);
lineBuf = '';
log.info({ agent, exitCode: code, streamed }, 'pty-dispatch: completed');
resolve({
exitCode: code ?? 1,
stdout,
stderr,
streamed,
usage: parser?.usage(),
agentSessionId: parser?.sessionId() ?? null,
});
}); });
child.on('error', (err) => { child.on('error', (err) => {

View File

@@ -0,0 +1,296 @@
/**
* Claude-Code-compatible stream-json NDJSON parser (feature #7,
* openspec `sampling-streamjson-tokens`).
*
* qwen (`--output-format stream-json`) and claude (`--output-format stream-json`)
* both emit Claude-Code's stream-json NDJSON on stdout: one JSON object per line.
* This module turns that stream into the same transport-agnostic `AgentEvent`s the
* ACP / opencode-server backends emit, so the PTY dispatch path can publish live
* broker frames + persist structured parts instead of slicing stdout opaque.
*
* Two surfaces:
* - `parseStreamJsonLine(line, state)` — PURE per-line mapping (unit-testable).
* `state` is the caller-owned accumulator (open tool blocks + usage/session_id).
* - `makeStreamJsonParser()` — a thin stateful wrapper holding the state, with a
* `push(line)` that returns the events for that line and getters for the final
* `usage` / `sessionId`.
*
* Defensive by contract: a non-JSON / partial / garbage line yields `[]` and never
* throws. Tool args (`input_json_delta`) arrive fragmented across many lines; we
* accumulate the partial JSON string per content-block index and only surface the
* parsed `rawInput` once the block stops (or, as a fallback, off the terminal
* `assistant` message which carries the fully-assembled `tool_use` blocks).
*
* Schema (keyed on top-level `type`):
* - `system` — init: { session_id, tools, ... }
* - `assistant` — { message: { content: [ {type:'text'|'thinking'|'tool_use', ...} ], usage? } }
* - `user` — tool results (ignored — diffing the worktree captures effects)
* - `result` — final: { usage: { input_tokens, output_tokens }, session_id? }
* - `stream_event` — { event: { type, index?, content_block?, delta?, usage? } }
* event.type:
* content_block_start — { index, content_block: {type, id?, name?} }
* content_block_delta — { index, delta: {type, text?|thinking?|partial_json?} }
* content_block_stop — { index }
* message_delta — { usage: { output_tokens } }
* message_start — { message: { usage } }
*/
import type { AgentEvent } from './agent-backend.js';
import type { AcpToolSnapshot } from './acp-tool-snapshot.js';
/** Convenience alias for the per-line return value. */
export type AgentEventList = AgentEvent[];
export interface StreamJsonUsage {
inputTokens?: number;
outputTokens?: number;
}
/** Per-open-content-block accumulation for tool args assembled across deltas. */
interface OpenToolBlock {
toolCallId: string;
name: string;
/** Concatenated `input_json_delta.partial_json` fragments. */
partialJson: string;
}
export interface StreamJsonState {
/** content-block index → open tool block (only `tool_use` blocks are tracked). */
toolBlocks: Map<number, OpenToolBlock>;
sessionId: string | null;
usage: StreamJsonUsage;
}
export function makeStreamJsonState(): StreamJsonState {
return { toolBlocks: new Map(), sessionId: null, usage: {} };
}
function asRecord(value: unknown): Record<string, unknown> | null {
if (value && typeof value === 'object' && !Array.isArray(value)) {
return value as Record<string, unknown>;
}
return null;
}
function asString(value: unknown): string | undefined {
return typeof value === 'string' ? value : undefined;
}
function asNumber(value: unknown): number | undefined {
return typeof value === 'number' && Number.isFinite(value) ? value : undefined;
}
/** Pull token counts out of an Anthropic-shape `usage` object, mutating state. */
function captureUsage(usage: Record<string, unknown> | null, state: StreamJsonState): void {
if (!usage) return;
const input = asNumber(usage.input_tokens);
const output = asNumber(usage.output_tokens);
if (input !== undefined) state.usage.inputTokens = input;
// output_tokens is reported incrementally on message_delta; keep the latest.
if (output !== undefined) state.usage.outputTokens = output;
}
/** Parse the accumulated tool-arg JSON; tolerate an unparseable/partial body. */
function parseToolInput(partialJson: string): unknown {
const trimmed = partialJson.trim();
if (!trimmed) return {};
try {
return JSON.parse(trimmed);
} catch {
return { _raw: partialJson };
}
}
function toolSnapshot(block: OpenToolBlock, rawInput: unknown, status: AcpToolSnapshot['status']): AcpToolSnapshot {
return {
toolCallId: block.toolCallId,
title: block.name,
kind: null,
status,
rawInput,
};
}
/**
* Map one stream-event sub-object (the `event` field of a `stream_event` line) to
* AgentEvents, mutating `state` for open tool blocks + usage.
*/
function handleStreamEvent(event: Record<string, unknown>, state: StreamJsonState): AgentEvent[] {
const eventType = asString(event.type);
if (!eventType) return [];
switch (eventType) {
case 'content_block_start': {
const index = asNumber(event.index);
const block = asRecord(event.content_block);
if (index === undefined || !block) return [];
if (asString(block.type) !== 'tool_use') return [];
const toolCallId = asString(block.id) ?? `tool_${index}`;
const name = asString(block.name) ?? 'tool';
const open: OpenToolBlock = { toolCallId, name, partialJson: '' };
state.toolBlocks.set(index, open);
// Surface the tool start immediately (running, no args yet) so the UI shows
// the call before the args finish streaming.
return [{ type: 'tool_call', toolCall: toolSnapshot(open, {}, 'in_progress') }];
}
case 'content_block_delta': {
const index = asNumber(event.index);
const delta = asRecord(event.delta);
if (delta === null) return [];
const deltaType = asString(delta.type);
if (deltaType === 'text_delta') {
const text = asString(delta.text);
return text ? [{ type: 'text', text }] : [];
}
if (deltaType === 'thinking_delta') {
const text = asString(delta.thinking);
return text ? [{ type: 'reasoning', text }] : [];
}
if (deltaType === 'input_json_delta') {
// Accumulate tool args; no event until the block stops.
const fragment = asString(delta.partial_json);
if (index !== undefined && fragment) {
const open = state.toolBlocks.get(index);
if (open) open.partialJson += fragment;
}
return [];
}
return [];
}
case 'content_block_stop': {
const index = asNumber(event.index);
if (index === undefined) return [];
const open = state.toolBlocks.get(index);
if (!open) return [];
state.toolBlocks.delete(index);
const rawInput = parseToolInput(open.partialJson);
return [{ type: 'tool_update', toolCall: toolSnapshot(open, rawInput, 'completed') }];
}
case 'message_start': {
const message = asRecord(event.message);
captureUsage(asRecord(message?.usage), state);
return [];
}
case 'message_delta': {
captureUsage(asRecord(event.usage), state);
return [];
}
default:
return [];
}
}
/**
* Map the terminal `assistant` message (post-hoc full message) to AgentEvents. Used
* as a fallback for transports that emit only the assembled `assistant` line and no
* incremental `stream_event`s. When stream_events already streamed a block, the
* caller dedups by toolCallId, so re-emitting the assembled tool_use is harmless.
*/
function handleAssistantMessage(message: Record<string, unknown>, state: StreamJsonState): AgentEvent[] {
captureUsage(asRecord(message.usage), state);
const content = message.content;
if (!Array.isArray(content)) return [];
const out: AgentEvent[] = [];
let toolIdx = 0;
for (const rawBlock of content) {
const block = asRecord(rawBlock);
if (!block) continue;
const blockType = asString(block.type);
if (blockType === 'text') {
const text = asString(block.text);
if (text) out.push({ type: 'text', text });
} else if (blockType === 'thinking') {
const text = asString(block.thinking);
if (text) out.push({ type: 'reasoning', text });
} else if (blockType === 'tool_use') {
const toolCallId = asString(block.id) ?? `tool_${toolIdx}`;
const name = asString(block.name) ?? 'tool';
const rawInput = 'input' in block ? block.input : {};
out.push({
type: 'tool_update',
toolCall: { toolCallId, title: name, kind: null, status: 'completed', rawInput },
});
}
toolIdx++;
}
return out;
}
/**
* Pure per-line mapping. `line` is a single complete NDJSON line (no trailing
* newline required; surrounding whitespace tolerated). Returns the AgentEvents the
* line produces and mutates `state` (open tool blocks, usage, session_id). A blank,
* non-JSON, or unrecognized line yields `[]` and never throws.
*/
export function parseStreamJsonLine(line: string, state: StreamJsonState): AgentEvent[] {
const trimmed = line.trim();
if (!trimmed) return [];
let obj: Record<string, unknown> | null;
try {
const parsed: unknown = JSON.parse(trimmed);
obj = asRecord(parsed);
} catch {
return [];
}
if (!obj) return [];
const type = asString(obj.type);
switch (type) {
case 'system': {
const sid = asString(obj.session_id);
if (sid) state.sessionId = sid;
return [];
}
case 'stream_event': {
const event = asRecord(obj.event);
return event ? handleStreamEvent(event, state) : [];
}
case 'assistant': {
const sid = asString(obj.session_id);
if (sid) state.sessionId = sid;
const message = asRecord(obj.message);
return message ? handleAssistantMessage(message, state) : [];
}
case 'result': {
const sid = asString(obj.session_id);
if (sid) state.sessionId = sid;
captureUsage(asRecord(obj.usage), state);
return [];
}
default:
// `user` (tool results) and any unknown line type — ignore.
return [];
}
}
export interface StreamJsonParser {
/** Feed one complete NDJSON line; returns its AgentEvents (never throws). */
push(line: string): AgentEvent[];
/** Final usage (input/output tokens) accumulated so far. */
usage(): StreamJsonUsage;
/** Provider session id from the init `system` line / `result`, if seen. */
sessionId(): string | null;
}
/**
* Stateful wrapper around `parseStreamJsonLine`. Holds per-tool-block accumulation
* + usage/session_id across the turn. Line-buffering (splitting stdout on `\n` and
* holding the partial tail) is the caller's job — see `pty-dispatch.ts`.
*/
export function makeStreamJsonParser(): StreamJsonParser {
const state = makeStreamJsonState();
return {
push: (line: string) => parseStreamJsonLine(line, state),
usage: () => ({ ...state.usage }),
sessionId: () => state.sessionId,
};
}

View File

@@ -5,5 +5,11 @@ export default defineConfig({
environment: 'node', environment: 'node',
globals: false, globals: false,
include: ['src/**/__tests__/**/*.test.ts'], include: ['src/**/__tests__/**/*.test.ts'],
// DB-integration suites (checkpoints, claude-session-store, reconnect, etc.)
// each apply the full schema in beforeAll against the one shared dev DB; running
// test files in parallel makes those concurrent DDL applies deadlock under
// DATABASE_URL. Serialize file execution — the suites are fast, so the cost is
// negligible and the default (no-DATABASE_URL) run is unaffected.
fileParallelism: false,
}, },
}); });

View File

@@ -1,4 +1,4 @@
import { describe, it, expect } from 'vitest'; import { describe, it, expect, vi, afterEach } from 'vitest';
import { isAgentRegistryMarkdown, parseAgentsMd } from '../agents.js'; import { isAgentRegistryMarkdown, parseAgentsMd } from '../agents.js';
describe('isAgentRegistryMarkdown', () => { describe('isAgentRegistryMarkdown', () => {
@@ -31,3 +31,87 @@ Start here
expect(r.errors.length).toBeGreaterThan(0); expect(r.errors.length).toBeGreaterThan(0);
}); });
}); });
// v2.6 sampling-streamjson-tokens (#11): per-agent llama.cpp sampler extensions.
describe('parseAgentsMd: v2.6 sampling knobs', () => {
afterEach(() => {
vi.restoreAllMocks();
});
const withFrontmatter = (lines: string) => `# Agents
## Sampler
---
temperature: 0.6
${lines}
tools: [view_file]
description: test
---
You sample.
`;
it('parses top_n_sigma and the dry_* family from frontmatter', () => {
const md = withFrontmatter(
[
'top_n_sigma: 1.5',
'dry_multiplier: 0.8',
'dry_base: 1.75',
'dry_allowed_length: 2',
'dry_penalty_last_n: -1',
].join('\n'),
);
const { agents, errors } = parseAgentsMd(md);
expect(errors).toHaveLength(0);
expect(agents).toHaveLength(1);
const a = agents[0]!;
expect(a.top_n_sigma).toBe(1.5);
expect(a.dry_multiplier).toBe(0.8);
expect(a.dry_base).toBe(1.75);
expect(a.dry_allowed_length).toBe(2);
expect(a.dry_penalty_last_n).toBe(-1);
});
it('defaults the new sampler fields to null when omitted', () => {
const { agents } = parseAgentsMd(withFrontmatter('top_p: 0.95'));
const a = agents[0]!;
expect(a.top_n_sigma).toBeNull();
expect(a.dry_multiplier).toBeNull();
expect(a.dry_base).toBeNull();
expect(a.dry_allowed_length).toBeNull();
expect(a.dry_penalty_last_n).toBeNull();
});
it('warns (does not error) on out-of-range top_n_sigma / dry_* values', () => {
const warn = vi.spyOn(console, 'warn').mockImplementation(() => {});
const md = withFrontmatter(
[
'top_n_sigma: -1',
'dry_multiplier: -0.5',
'dry_base: -2',
'dry_allowed_length: -3',
'dry_penalty_last_n: -5',
].join('\n'),
);
const { agents, errors } = parseAgentsMd(md);
expect(errors).toHaveLength(0);
expect(agents).toHaveLength(1);
// Mirrors top_k/min_p: out-of-range still stored, with a warning.
expect(warn).toHaveBeenCalled();
const warnings = warn.mock.calls.map((c) => String(c[0])).join('\n');
expect(warnings).toContain('top_n_sigma');
expect(warnings).toContain('dry_multiplier');
expect(warnings).toContain('dry_base');
expect(warnings).toContain('dry_allowed_length');
expect(warnings).toContain('dry_penalty_last_n');
});
it('errors on non-numeric / non-integer sampler values', () => {
const md = withFrontmatter(
['top_n_sigma: high', 'dry_allowed_length: 2.5'].join('\n'),
);
const { errors } = parseAgentsMd(md);
const joined = errors.map((e) => e.reason).join('\n');
expect(joined).toContain('top_n_sigma must be a number');
expect(joined).toContain('dry_allowed_length must be an integer');
});
});

View File

@@ -7,6 +7,8 @@ import {
select, select,
buildPrompt, buildPrompt,
buildHeadPayload, buildHeadPayload,
deriveFilesRead,
buildFilesReadContext,
type CompactionMessage, type CompactionMessage,
} from '../compaction.js'; } from '../compaction.js';
import { SUMMARY_TEMPLATE } from '../compaction-prompt.js'; import { SUMMARY_TEMPLATE } from '../compaction-prompt.js';
@@ -321,3 +323,105 @@ describe('buildHeadPayload reasoning render', () => {
expect(out[1]!.content).not.toContain('<reasoning>'); expect(out[1]!.content).not.toContain('<reasoning>');
}); });
}); });
// ---- buildHeadPayload sentinel stripping (#12) -------------------------------
describe('buildHeadPayload strips all UI sentinels', () => {
it('drops cap_hit, doom_loop, and mistake_recovery system rows', () => {
const out = buildHeadPayload([
mkMsg('user', 'do the thing'),
mkMsg('system', 'budget reached', { metadata: { kind: 'cap_hit' } }),
mkMsg('system', 'looping', { metadata: { kind: 'doom_loop' } }),
mkMsg('system', 'repeated errors', { metadata: { kind: 'mistake_recovery' } }),
mkMsg('assistant', 'answer'),
]);
// Only the user + assistant rows survive; all three sentinels stripped.
expect(out).toHaveLength(2);
expect(out[0]!.role).toBe('user');
expect(out[1]!.role).toBe('assistant');
});
it('keeps a non-sentinel system row (e.g. compact bridge) untouched', () => {
const out = buildHeadPayload([
mkMsg('system', 'legacy compact', { kind: 'compact', metadata: null }),
mkMsg('user', 'q'),
]);
expect(out[0]!.role).toBe('system');
expect(out[0]!.content).toBe('legacy compact');
});
});
// ---- file-provenance ledger (#12, Part B) -----------------------------------
describe('deriveFilesRead', () => {
it('returns [] when the head has no read-tool calls', () => {
expect(deriveFilesRead([mkMsg('user', 'hi'), mkMsg('assistant', 'hello')])).toEqual([]);
});
it('extracts the path arg from view_file / list_dir / grep / find_files', () => {
const head = [
mkMsg('assistant', '', {
tool_calls: [
{ id: 'c1', name: 'view_file', args: { path: 'src/index.ts' } },
{ id: 'c2', name: 'list_dir', args: { path: 'src' } },
{ id: 'c3', name: 'grep', args: { pattern: 'TODO', path: 'apps' } },
{ id: 'c4', name: 'find_files', args: { pattern: '**/*.ts', path: 'lib' } },
],
}),
];
expect(deriveFilesRead(head)).toEqual(['apps', 'lib', 'src', 'src/index.ts']);
});
it('dedupes and sorts paths across multiple assistant turns', () => {
const head = [
mkMsg('assistant', '', { tool_calls: [{ id: 'c1', name: 'view_file', args: { path: 'b.ts' } }] }),
mkMsg('assistant', '', { tool_calls: [{ id: 'c2', name: 'view_file', args: { path: 'a.ts' } }] }),
mkMsg('assistant', '', { tool_calls: [{ id: 'c3', name: 'view_file', args: { path: 'b.ts' } }] }),
];
expect(deriveFilesRead(head)).toEqual(['a.ts', 'b.ts']);
});
it('ignores non-read tools and grep calls without a path arg', () => {
const head = [
mkMsg('assistant', '', {
tool_calls: [
{ id: 'c1', name: 'web_search', args: { query: 'x' } },
{ id: 'c2', name: 'grep', args: { pattern: 'foo' } }, // no path → root, skipped
{ id: 'c3', name: 'view_file', args: { path: 'kept.ts' } },
],
}),
];
expect(deriveFilesRead(head)).toEqual(['kept.ts']);
});
it('ignores read-tool calls on non-assistant rows', () => {
const head = [
mkMsg('user', '', { tool_calls: [{ id: 'c1', name: 'view_file', args: { path: 'nope.ts' } }] }),
];
expect(deriveFilesRead(head)).toEqual([]);
});
});
describe('buildFilesReadContext', () => {
it('returns null when nothing was read (no empty section injected)', () => {
expect(buildFilesReadContext([mkMsg('user', 'hi')])).toBeNull();
});
it('formats a ## Files Read block with sorted bullet paths', () => {
const head = [
mkMsg('assistant', '', {
tool_calls: [
{ id: 'c1', name: 'view_file', args: { path: 'z.ts' } },
{ id: 'c2', name: 'view_file', args: { path: 'a.ts' } },
],
}),
];
expect(buildFilesReadContext(head)).toBe('## Files Read\n- a.ts\n- z.ts');
});
});
describe('SUMMARY_TEMPLATE includes the Files Read section (#12)', () => {
it('declares a ## Files Read section the model must maintain', () => {
expect(SUMMARY_TEMPLATE).toContain('## Files Read');
});
});

View File

@@ -0,0 +1,164 @@
import { describe, it, expect } from 'vitest';
import {
MISTAKE_THRESHOLD,
freshMistakeState,
recordStep,
detectMistakePattern,
MISTAKE_RECOVERY_NOTE,
type FailureKind,
} from '../inference/mistake-tracker.js';
// ---- helpers ----------------------------------------------------------------
// Replays a sequence of outcomes against a fresh state, returning the final
// state so assertions can read .run / .nudges. The caller mimics turn.ts: after
// each recordStep we consult detectMistakePattern and, if it returns 'nudge',
// bump nudges + reset run (the loop's nudge-handling side effect).
function replay(
outcomes: (FailureKind | 'success')[],
{ applyNudge = false }: { applyNudge?: boolean } = {},
) {
const state = freshMistakeState();
const decisions: (ReturnType<typeof detectMistakePattern>)[] = [];
for (const o of outcomes) {
recordStep(state, o);
const decision = detectMistakePattern(state);
decisions.push(decision);
if (applyNudge && decision === 'nudge') {
// Mirror turn.ts's nudge side effect: bump the counter, reset the streak.
state.nudges += 1;
state.run = [];
}
}
return { state, decisions };
}
// ---- fresh state ------------------------------------------------------------
describe('freshMistakeState', () => {
it('starts with an empty run and zero nudges', () => {
const s = freshMistakeState();
expect(s.run).toEqual([]);
expect(s.nudges).toBe(0);
});
});
// ---- below threshold --------------------------------------------------------
describe('detectMistakePattern — below threshold', () => {
it('returns null on a fresh state', () => {
expect(detectMistakePattern(freshMistakeState())).toBeNull();
});
it('returns null after fewer than MISTAKE_THRESHOLD failures', () => {
const { decisions } = replay(['zod_reject', 'exec_error']);
expect(decisions).toEqual([null, null]);
});
});
// ---- success reset ----------------------------------------------------------
describe('recordStep — success resets', () => {
it("'success' clears both the run streak and the nudge counter", () => {
const state = freshMistakeState();
recordStep(state, 'zod_reject');
recordStep(state, 'exec_error');
state.nudges = 2; // simulate prior nudges
recordStep(state, 'success');
expect(state.run).toEqual([]);
expect(state.nudges).toBe(0);
});
it('a success mid-streak prevents the threshold from tripping', () => {
// fail, fail, success, fail, fail → streak never reaches 3.
const { decisions } = replay([
'zod_reject',
'exec_error',
'success',
'tool_not_found',
'permission_denied',
]);
expect(decisions.every((d) => d === null)).toBe(true);
});
});
// ---- 3-streak nudge ---------------------------------------------------------
describe('detectMistakePattern — nudge on 3-streak', () => {
it("returns 'nudge' the first time the streak reaches MISTAKE_THRESHOLD", () => {
const { decisions } = replay(['zod_reject', 'exec_error', 'tool_not_found']);
expect(decisions).toEqual([null, null, 'nudge']);
});
it("fires 'nudge' for a streak of identical kinds too (kind-agnostic)", () => {
const { decisions } = replay(['exec_error', 'exec_error', 'exec_error']);
expect(decisions[2]).toBe('nudge');
});
});
// ---- re-trip escalate -------------------------------------------------------
describe('detectMistakePattern — escalate on re-trip', () => {
it("escalates when the streak re-trips after a nudge with no intervening success", () => {
// 3 fails → nudge (run reset, nudges=1), then 3 more fails → escalate.
const { decisions } = replay(
[
'zod_reject',
'exec_error',
'tool_not_found',
'permission_denied',
'exec_error',
'zod_reject',
],
{ applyNudge: true },
);
expect(decisions[2]).toBe('nudge');
expect(decisions[5]).toBe('escalate');
});
it("does NOT escalate if a success lands between the nudge and the next streak", () => {
const { decisions } = replay(
[
'zod_reject',
'exec_error',
'tool_not_found', // nudge here
'success', // clears nudges back to 0
'exec_error',
'zod_reject',
'tool_not_found', // 3-streak again → nudge, NOT escalate
],
{ applyNudge: true },
);
expect(decisions[2]).toBe('nudge');
expect(decisions[6]).toBe('nudge');
expect(decisions).not.toContain('escalate');
});
});
// ---- mixed kinds ------------------------------------------------------------
describe('detectMistakePattern — mixed failure kinds', () => {
it('counts a streak of all five distinct kinds toward the threshold', () => {
const { state, decisions } = replay([
'zod_reject',
'tool_not_found',
'exec_error',
]);
expect(decisions[2]).toBe('nudge');
expect(state.run).toEqual(['zod_reject', 'tool_not_found', 'exec_error']);
});
});
// ---- contract ---------------------------------------------------------------
describe('MISTAKE_THRESHOLD + MISTAKE_RECOVERY_NOTE', () => {
it('threshold is a positive integer (tests assume 3)', () => {
expect(MISTAKE_THRESHOLD).toBeGreaterThan(0);
expect(Number.isInteger(MISTAKE_THRESHOLD)).toBe(true);
});
it('recovery note is a non-empty model-facing string', () => {
expect(typeof MISTAKE_RECOVERY_NOTE).toBe('string');
expect(MISTAKE_RECOVERY_NOTE.length).toBeGreaterThan(0);
});
});

View File

@@ -88,6 +88,12 @@ interface ParsedFrontmatter {
top_k?: number; top_k?: number;
min_p?: number; min_p?: number;
presence_penalty?: number; presence_penalty?: number;
// v2.6 sampling-streamjson-tokens (#11): llama.cpp sampler extensions.
top_n_sigma?: number;
dry_multiplier?: number;
dry_base?: number;
dry_allowed_length?: number;
dry_penalty_last_n?: number;
tools?: string[]; tools?: string[];
description?: string; description?: string;
model?: string; model?: string;
@@ -178,6 +184,63 @@ function parseFrontmatter(yaml: string): { data: ParsedFrontmatter; errors: stri
} else { } else {
errors.push(`presence_penalty must be a number (got "${valueRaw}")`); errors.push(`presence_penalty must be a number (got "${valueRaw}")`);
} }
} else if (key === 'top_n_sigma') {
// v2.6 #11: llama.cpp top-n-sigma sampler. Float ≥ 0 (typical 0-3).
// Mirrors top_p/min_p: store then warn on out-of-range (non-numeric
// hard-fails the block).
const n = Number(valueRaw);
if (Number.isFinite(n)) {
data.top_n_sigma = n;
if (n < 0) {
console.warn(`agents: top_n_sigma ${n} out of range (≥0), ignoring (falling back to default)`);
}
} else {
errors.push(`top_n_sigma must be a number (got "${valueRaw}")`);
}
} else if (key === 'dry_multiplier') {
// v2.6 #11: DRY repetition-penalty multiplier. Float ≥ 0 (0 disables DRY).
const n = Number(valueRaw);
if (Number.isFinite(n)) {
data.dry_multiplier = n;
if (n < 0) {
console.warn(`agents: dry_multiplier ${n} out of range (≥0), ignoring (falling back to default)`);
}
} else {
errors.push(`dry_multiplier must be a number (got "${valueRaw}")`);
}
} else if (key === 'dry_base') {
// v2.6 #11: DRY penalty growth base. Float ≥ 0.
const n = Number(valueRaw);
if (Number.isFinite(n)) {
data.dry_base = n;
if (n < 0) {
console.warn(`agents: dry_base ${n} out of range (≥0), ignoring (falling back to default)`);
}
} else {
errors.push(`dry_base must be a number (got "${valueRaw}")`);
}
} else if (key === 'dry_allowed_length') {
// v2.6 #11: DRY max sequence length not penalized. Integer ≥ 0.
const n = Number(valueRaw);
if (Number.isInteger(n)) {
data.dry_allowed_length = n;
if (n < 0) {
console.warn(`agents: dry_allowed_length ${n} out of range (≥0), ignoring (falling back to default)`);
}
} else {
errors.push(`dry_allowed_length must be an integer (got "${valueRaw}")`);
}
} else if (key === 'dry_penalty_last_n') {
// v2.6 #11: DRY lookback window. Integer ≥ -1 (-1 = whole context, 0 = off).
const n = Number(valueRaw);
if (Number.isInteger(n)) {
data.dry_penalty_last_n = n;
if (n < -1) {
console.warn(`agents: dry_penalty_last_n ${n} out of range (≥-1), ignoring (falling back to default)`);
}
} else {
errors.push(`dry_penalty_last_n must be an integer (got "${valueRaw}")`);
}
} else if (key === 'tools') { } else if (key === 'tools') {
if (valueRaw === '') { if (valueRaw === '') {
data.tools = []; data.tools = [];
@@ -354,6 +417,11 @@ function parseAgentSection(section: RawSection): Omit<Agent, 'source'> {
top_k: typeof fm.top_k === 'number' ? fm.top_k : null, top_k: typeof fm.top_k === 'number' ? fm.top_k : null,
min_p: typeof fm.min_p === 'number' ? fm.min_p : null, min_p: typeof fm.min_p === 'number' ? fm.min_p : null,
presence_penalty: typeof fm.presence_penalty === 'number' ? fm.presence_penalty : null, presence_penalty: typeof fm.presence_penalty === 'number' ? fm.presence_penalty : null,
top_n_sigma: typeof fm.top_n_sigma === 'number' ? fm.top_n_sigma : null,
dry_multiplier: typeof fm.dry_multiplier === 'number' ? fm.dry_multiplier : null,
dry_base: typeof fm.dry_base === 'number' ? fm.dry_base : null,
dry_allowed_length: typeof fm.dry_allowed_length === 'number' ? fm.dry_allowed_length : null,
dry_penalty_last_n: typeof fm.dry_penalty_last_n === 'number' ? fm.dry_penalty_last_n : null,
tools: filteredTools, tools: filteredTools,
model: typeof fm.model === 'string' && fm.model.length > 0 ? fm.model : null, model: typeof fm.model === 'string' && fm.model.length > 0 ? fm.model : null,
max_tool_calls: typeof fm.max_tool_calls === 'number' ? fm.max_tool_calls : null, max_tool_calls: typeof fm.max_tool_calls === 'number' ? fm.max_tool_calls : null,

View File

@@ -31,10 +31,16 @@ export const SUMMARY_TEMPLATE = `Output exactly the Markdown structure shown ins
## Relevant Files ## Relevant Files
- [file or directory path: why it matters, or "(none)"] - [file or directory path: why it matters, or "(none)"]
## Files Read
- [file or directory path that has been read/searched this session, or "(none)"]
</template> </template>
Rules: Rules:
- Keep every section, even when empty. - Keep every section, even when empty.
- Use terse bullets, not prose paragraphs. - Use terse bullets, not prose paragraphs.
- Preserve exact file paths, commands, error strings, and identifiers when known. - Preserve exact file paths, commands, error strings, and identifiers when known.
- For ## Files Read: this is a cumulative provenance ledger. MERGE the paths
listed in any "## Files Read" block provided below with those already in the
previous summary — never drop a previously-recorded path. Sort and dedupe.
- Do not mention the summary process or that context was compacted.`; - Do not mention the summary process or that context was compacted.`;

View File

@@ -181,6 +181,54 @@ export function select(
}; };
} }
// === file-provenance ledger (#12, Part B) ===
// Read tools whose path/target arg names a file or directory that was read.
// BooChat (apps/server) is read-only — there are no write tools, so the ledger
// only ever has a "Files Read" side (apps/coder can add "Modified" later).
const READ_TOOL_ARG: Record<string, string> = {
view_file: 'path',
list_dir: 'path',
grep: 'path',
find_files: 'path',
};
// Derive a deterministic, deduped, sorted list of file/dir paths read by the
// HEAD messages being summarized. Pure — scans assistant tool_calls only; the
// boundary (which messages are "head") is decided by select() at the call site.
// We derive at compaction time rather than via a live accumulator because
// TurnArgs resets per turn and would miss reads on non-compacting turns; the
// head messages are the authoritative record of what was read in the window
// being summarized. The result propagates forward as summary text across
// compactions (the LLM merges it into ## Files Read), so a path read long ago
// survives even after its originating messages are compacted out.
export function deriveFilesRead(head: CompactionMessage[]): string[] {
const paths = new Set<string>();
for (const m of head) {
if (m.role !== 'assistant') continue;
if (!m.tool_calls) continue;
for (const tc of m.tool_calls) {
const argName = READ_TOOL_ARG[tc.name];
if (!argName) continue;
const raw = (tc.args as Record<string, unknown> | null)?.[argName];
if (typeof raw === 'string' && raw.trim().length > 0) {
paths.add(raw.trim());
}
}
}
return [...paths].sort();
}
// Format the derived paths as a deterministic ## Files Read block for injection
// into buildPrompt's context array. Returns null when nothing was read (so we
// don't inject an empty section). The summarizer merges this into the rolling
// summary's ## Files Read section per the SUMMARY_TEMPLATE instructions.
export function buildFilesReadContext(head: CompactionMessage[]): string | null {
const paths = deriveFilesRead(head);
if (paths.length === 0) return null;
return ['## Files Read', ...paths.map((p) => `- ${p}`)].join('\n');
}
// === prompt assembly === // === prompt assembly ===
// Build the final user message that asks the model to (re)produce the // Build the final user message that asks the model to (re)produce the
@@ -220,15 +268,26 @@ export interface OpenAiMessage {
tool_call_id?: string; tool_call_id?: string;
} }
function isCapHitSentinel(m: CompactionMessage): boolean { // #12: mirror inference/sentinels.ts:isAnySentinel over the CompactionMessage
return m.role === 'system' && m.metadata != null && m.metadata.kind === 'cap_hit'; // shape (which carries metadata as { kind?: string } | null, not the full
// Message type isAnySentinel expects). All UI-only sentinels are stripped from
// the head payload — they never go to the summarizer LLM. Keep the kind list in
// sync with isAnySentinel in sentinels.ts.
const SENTINEL_KINDS = new Set(['cap_hit', 'doom_loop', 'mistake_recovery']);
function isAnySentinel(m: CompactionMessage): boolean {
return (
m.role === 'system' &&
m.metadata != null &&
typeof m.metadata.kind === 'string' &&
SENTINEL_KINDS.has(m.metadata.kind)
);
} }
// v1.13.6: exported for unit-test access (reasoning render coverage). // v1.13.6: exported for unit-test access (reasoning render coverage).
export function buildHeadPayload(head: CompactionMessage[]): OpenAiMessage[] { export function buildHeadPayload(head: CompactionMessage[]): OpenAiMessage[] {
const out: OpenAiMessage[] = []; const out: OpenAiMessage[] = [];
for (const m of head) { for (const m of head) {
if (isCapHitSentinel(m)) continue; if (isAnySentinel(m)) continue;
if (m.role === 'assistant' && (m.status === 'streaming' || m.status === 'cancelled')) continue; if (m.role === 'assistant' && (m.status === 'streaming' || m.status === 'cancelled')) continue;
if (m.kind === 'compact') { if (m.kind === 'compact') {
// Legacy compact row — pass through as system context. The new // Legacy compact row — pass through as system context. The new
@@ -417,7 +476,14 @@ export async function process(input: ProcessInput): Promise<void> {
// user message carrying buildPrompt(previousSummary, []). No system prompt // user message carrying buildPrompt(previousSummary, []). No system prompt
// — matches opencode (`system: []`); the template + anchor are sufficient. // — matches opencode (`system: []`); the template + anchor are sufficient.
const headPayload = buildHeadPayload(sel.head); const headPayload = buildHeadPayload(sel.head);
const finalUser: OpenAiMessage = { role: 'user', content: buildPrompt(previousSummary, []) }; // #12 Part B: derive the file-provenance ledger from the head's read-tool
// calls and inject it as a deterministic ## Files Read context block so the
// summarizer merges it into the rolling summary. Empty → no injection.
const filesReadCtx = buildFilesReadContext(sel.head);
const finalUser: OpenAiMessage = {
role: 'user',
content: buildPrompt(previousSummary, filesReadCtx ? [filesReadCtx] : []),
};
const payload = [...headPayload, finalUser]; const payload = [...headPayload, finalUser];
log.info( log.info(

View File

@@ -19,6 +19,14 @@ export type {
} from './turn.js'; } from './turn.js';
export type { ToolPhaseResult } from './tool-phase.js'; export type { ToolPhaseResult } from './tool-phase.js';
export { detectDoomLoop, DOOM_LOOP_THRESHOLD } from './sentinels.js'; export { detectDoomLoop, DOOM_LOOP_THRESHOLD } from './sentinels.js';
export {
detectMistakePattern,
freshMistakeState,
recordStep,
MISTAKE_THRESHOLD,
MISTAKE_RECOVERY_NOTE,
} from './mistake-tracker.js';
export type { FailureKind, MistakeState } from './mistake-tracker.js';
export { buildMessagesPayload } from './payload.js'; export { buildMessagesPayload } from './payload.js';
export { generateToolUseSummary } from './tool-summaries.js'; export { generateToolUseSummary } from './tool-summaries.js';
export type { ToolInfo } from './tool-summaries.js'; export type { ToolInfo } from './tool-summaries.js';

View File

@@ -0,0 +1,69 @@
// v#12 MistakeTracker: heterogeneous-failure recovery. Complements the
// doom-loop guard (sentinels.ts:detectDoomLoop, which only catches *identical*
// repeats) by catching a run of consecutive tool FAILURES the model isn't
// recovering from — even when each failure is a *different* error. Algorithm
// reimplemented from cline's mistake-counting pattern (NOT vendored).
//
// Pure module — mirrors sentinels.ts:detectDoomLoop. No DB, no I/O. The state
// lives loop-local in TurnArgs (reset per runInference, like recentToolCalls).
// The failure taxonomy already distinguished in tool-phase.ts:executeToolCall.
// 'api_error' is reserved for upstream-model failures surfaced as tool outcomes
// (no current emit site on apps/server, but the union mirrors the design doc
// so a future caller can record it without a type change).
export type FailureKind =
| 'zod_reject'
| 'tool_not_found'
| 'exec_error'
| 'api_error'
| 'permission_denied';
// Smallest streak that doesn't false-positive on a model that retries once
// after a transient error. Matches DOOM_LOOP_THRESHOLD's rationale.
export const MISTAKE_THRESHOLD = 3;
export interface MistakeState {
// The current consecutive-failure streak (any successful tool step clears it).
run: FailureKind[];
// How many recovery nudges have fired without an intervening success. Used to
// escalate (stop the turn) on the second trip rather than nudging forever.
nudges: number;
}
export function freshMistakeState(): MistakeState {
return { run: [], nudges: 0 };
}
// Record one tool step's outcome. A 'success' clears BOTH the streak and the
// nudge counter (the model recovered). A FailureKind pushes onto the streak.
export function recordStep(
state: MistakeState,
outcome: FailureKind | 'success',
): void {
if (outcome === 'success') {
state.run = [];
state.nudges = 0;
return;
}
state.run.push(outcome);
}
// Decide whether to intervene given the current streak. When the streak has
// reached MISTAKE_THRESHOLD: 'nudge' the first time (no nudge fired yet),
// 'escalate' if it trips again while a nudge is already outstanding (no
// intervening success cleared `nudges`). Below threshold → null.
//
// Pure — the caller is responsible for mutating `nudges`/`run` after acting on
// the decision (mirrors how turn.ts consumes detectDoomLoop's result).
export function detectMistakePattern(
state: MistakeState,
): 'nudge' | 'escalate' | null {
if (state.run.length < MISTAKE_THRESHOLD) return null;
return state.nudges === 0 ? 'nudge' : 'escalate';
}
// Model-facing guidance injected (transiently, for the next step only) when a
// nudge fires. Short + declarative for the same reliability reason as the
// cap-hit / doom-loop notes.
export const MISTAKE_RECOVERY_NOTE =
"You've hit several different errors in a row. Stop retrying variations — re-read the tool schemas, verify file paths and arguments exist before calling, and try a fundamentally different approach.";

View File

@@ -86,7 +86,7 @@ export async function runCapHitSummary(
ctx, ctx,
session.model, session.model,
messages, messages,
{ tools: null, temperature: agent?.temperature, top_p: agent?.top_p ?? undefined, top_k: agent?.top_k ?? undefined, min_p: agent?.min_p ?? undefined, presence_penalty: agent?.presence_penalty ?? undefined }, { tools: null, temperature: agent?.temperature, top_p: agent?.top_p ?? undefined, top_k: agent?.top_k ?? undefined, min_p: agent?.min_p ?? undefined, presence_penalty: agent?.presence_penalty ?? undefined, top_n_sigma: agent?.top_n_sigma ?? undefined, dry_multiplier: agent?.dry_multiplier ?? undefined, dry_base: agent?.dry_base ?? undefined, dry_allowed_length: agent?.dry_allowed_length ?? undefined, dry_penalty_last_n: agent?.dry_penalty_last_n ?? undefined },
(delta) => { (delta) => {
accumulated += delta; accumulated += delta;
ctx.publish(sessionId, { ctx.publish(sessionId, {
@@ -346,7 +346,7 @@ export async function runDoomLoopSummary(
ctx, ctx,
session.model, session.model,
messages, messages,
{ tools: null, temperature: agent?.temperature, top_p: agent?.top_p ?? undefined, top_k: agent?.top_k ?? undefined, min_p: agent?.min_p ?? undefined, presence_penalty: agent?.presence_penalty ?? undefined }, { tools: null, temperature: agent?.temperature, top_p: agent?.top_p ?? undefined, top_k: agent?.top_k ?? undefined, min_p: agent?.min_p ?? undefined, presence_penalty: agent?.presence_penalty ?? undefined, top_n_sigma: agent?.top_n_sigma ?? undefined, dry_multiplier: agent?.dry_multiplier ?? undefined, dry_base: agent?.dry_base ?? undefined, dry_allowed_length: agent?.dry_allowed_length ?? undefined, dry_penalty_last_n: agent?.dry_penalty_last_n ?? undefined },
(delta) => { (delta) => {
accumulated += delta; accumulated += delta;
ctx.publish(sessionId, { ctx.publish(sessionId, {
@@ -545,7 +545,7 @@ export async function runStepCapSummary(
ctx, ctx,
session.model, session.model,
messages, messages,
{ tools: null, temperature: agent?.temperature, top_p: agent?.top_p ?? undefined, top_k: agent?.top_k ?? undefined, min_p: agent?.min_p ?? undefined, presence_penalty: agent?.presence_penalty ?? undefined }, { tools: null, temperature: agent?.temperature, top_p: agent?.top_p ?? undefined, top_k: agent?.top_k ?? undefined, min_p: agent?.min_p ?? undefined, presence_penalty: agent?.presence_penalty ?? undefined, top_n_sigma: agent?.top_n_sigma ?? undefined, dry_multiplier: agent?.dry_multiplier ?? undefined, dry_base: agent?.dry_base ?? undefined, dry_allowed_length: agent?.dry_allowed_length ?? undefined, dry_penalty_last_n: agent?.dry_penalty_last_n ?? undefined },
(delta) => { (delta) => {
accumulated += delta; accumulated += delta;
ctx.publish(sessionId, { ctx.publish(sessionId, {
@@ -717,3 +717,57 @@ async function insertDoomLoopSentinel(
metadata, metadata,
}); });
} }
// #12 MistakeTracker: heterogeneous-failure recovery sentinel. Mirrors
// insertDoomLoopSentinel structurally — a role='system', status='complete' row
// firing the standard message_started → delta → message_complete frame
// sequence. Two variants distinguished by `escalated`:
// - escalated:false → a nudge fired; recovery guidance was injected into the
// model's next step and the loop continued. can_continue is true (the turn
// is still live).
// - escalated:true → the nudge didn't break the failure run; the turn was
// stopped (cap-hit-style). can_continue is true so the UI can still offer a
// Continue affordance — a fresh user turn resets the tracker.
export async function insertMistakeRecoverySentinel(
ctx: InferenceContext,
sessionId: string,
chatId: string,
opts: { failureKinds: string[]; count: number; escalated: boolean; canContinue: boolean },
): Promise<void> {
const metadata: MessageMetadata = {
kind: 'mistake_recovery',
failure_kinds: opts.failureKinds,
count: opts.count,
escalated: opts.escalated,
can_continue: opts.canContinue,
};
const content = opts.escalated
? `Repeated different errors persisted after a recovery nudge (${opts.count} in a row). Stopping the tool-call loop.`
: `Hit ${opts.count} different errors in a row. Injected recovery guidance and continuing.`;
const [row] = await ctx.sql<{ id: string }[]>`
INSERT INTO messages (session_id, chat_id, role, content, status, created_at, metadata)
VALUES (${sessionId}, ${chatId}, 'system', ${content}, 'complete', clock_timestamp(), ${ctx.sql.json(metadata as never)})
RETURNING id
`;
// Standard frame sequence — same as cap-hit / doom-loop sentinels.
ctx.publish(sessionId, {
type: 'message_started',
message_id: row!.id,
chat_id: chatId,
role: 'system',
});
ctx.publish(sessionId, {
type: 'delta',
message_id: row!.id,
chat_id: chatId,
content,
});
ctx.publish(sessionId, {
type: 'message_complete',
message_id: row!.id,
chat_id: chatId,
metadata,
});
}

View File

@@ -48,6 +48,18 @@ export function isDoomLoopSentinel(m: Message): boolean {
); );
} }
export function isAnySentinel(m: Message): boolean { // #12: mistake-recovery sentinel. Same UI-only semantics as cap-hit /
return isCapHitSentinel(m) || isDoomLoopSentinel(m); // doom-loop — never sent to the LLM (filtered via the isAnySentinel check
// below, which buildMessagesPayload + buildHeadPayload both consult).
export function isMistakeRecoverySentinel(m: Message): boolean {
return (
m.role === 'system' &&
m.metadata !== null &&
typeof m.metadata === 'object' &&
(m.metadata as { kind?: unknown }).kind === 'mistake_recovery'
);
}
export function isAnySentinel(m: Message): boolean {
return isCapHitSentinel(m) || isDoomLoopSentinel(m) || isMistakeRecoverySentinel(m);
} }

View File

@@ -33,6 +33,39 @@ interface StreamOptions {
top_k?: number | null; top_k?: number | null;
min_p?: number | null; min_p?: number | null;
presence_penalty?: number | null; presence_penalty?: number | null;
// v2.6 sampling-streamjson-tokens (#11): llama.cpp sampler extensions. These
// are NOT standard AI-SDK streamText options and are NOT serialized by the
// openai-compatible provider's standardized-settings path (topK is even
// explicitly dropped with an "unsupported feature: topK" warning). They reach
// llama-server only via providerOptions.openaiCompatible (see buildSamplerProviderOptions).
top_n_sigma?: number | null;
dry_multiplier?: number | null;
dry_base?: number | null;
dry_allowed_length?: number | null;
dry_penalty_last_n?: number | null;
}
// v2.6 #11: build the providerOptions.openaiCompatible extraBody object for the
// llama.cpp sampler extensions. @ai-sdk/openai-compatible (2.0.47) merges every
// non-reserved key under providerOptions.openaiCompatible straight into the
// chat-completion request body (see its getArgs: the Object.fromEntries spread
// filtered against openaiCompatibleLanguageModelChatOptions.shape). This is the
// ONLY working passthrough for these params:
// - top_k / min_p were latently dropped before this: top_k was passed as the
// AI-SDK `topK` setting which the openai-compatible provider rejects as
// unsupported; min_p was never passed to streamText at all.
// - top_n_sigma + the dry_* family have no AI-SDK equivalent.
// Keys use llama-server's snake_case body names so they land verbatim.
function buildSamplerProviderOptions(opts: StreamOptions): Record<string, number> | undefined {
const body: Record<string, number> = {};
if (typeof opts.top_k === 'number') body.top_k = opts.top_k;
if (typeof opts.min_p === 'number') body.min_p = opts.min_p;
if (typeof opts.top_n_sigma === 'number') body.top_n_sigma = opts.top_n_sigma;
if (typeof opts.dry_multiplier === 'number') body.dry_multiplier = opts.dry_multiplier;
if (typeof opts.dry_base === 'number') body.dry_base = opts.dry_base;
if (typeof opts.dry_allowed_length === 'number') body.dry_allowed_length = opts.dry_allowed_length;
if (typeof opts.dry_penalty_last_n === 'number') body.dry_penalty_last_n = opts.dry_penalty_last_n;
return Object.keys(body).length > 0 ? body : undefined;
} }
// v1.13.1-A: convert BooCode's OpenAI-shaped history into AI SDK // v1.13.1-A: convert BooCode's OpenAI-shaped history into AI SDK
@@ -195,6 +228,14 @@ export async function streamCompletion(
return toolCall; return toolCall;
}; };
// v2.6 #11: llama.cpp sampler extensions (top_k, min_p, top_n_sigma, dry_*)
// ride providerOptions.openaiCompatible — they are NOT standardized streamText
// settings. NB: top_k used to be passed below as the AI-SDK `topK` setting;
// the openai-compatible provider dropped it with an "unsupported feature: topK"
// warning and min_p was never wired at all, so both were dead on the wire
// before this. They now go through the same extraBody path as the new params.
const samplerBody = buildSamplerProviderOptions(opts);
const result = streamText({ const result = streamText({
model: upstreamModel(ctx.config, model, agent ?? null), model: upstreamModel(ctx.config, model, agent ?? null),
messages: aiMessages, messages: aiMessages,
@@ -203,8 +244,8 @@ export async function streamCompletion(
: {}), : {}),
...(typeof opts.temperature === 'number' ? { temperature: opts.temperature } : {}), ...(typeof opts.temperature === 'number' ? { temperature: opts.temperature } : {}),
...(typeof opts.top_p === 'number' ? { topP: opts.top_p } : {}), ...(typeof opts.top_p === 'number' ? { topP: opts.top_p } : {}),
...(typeof opts.top_k === 'number' ? { topK: opts.top_k } : {}),
...(typeof opts.presence_penalty === 'number' ? { presencePenalty: opts.presence_penalty } : {}), ...(typeof opts.presence_penalty === 'number' ? { presencePenalty: opts.presence_penalty } : {}),
...(samplerBody ? { providerOptions: { openaiCompatible: samplerBody } } : {}),
abortSignal: signal, abortSignal: signal,
}); });
@@ -398,6 +439,12 @@ export async function executeStreamPhase(
const effectiveTopK = agent?.top_k ?? undefined; const effectiveTopK = agent?.top_k ?? undefined;
const effectiveMinP = agent?.min_p ?? undefined; const effectiveMinP = agent?.min_p ?? undefined;
const effectivePresencePenalty = agent?.presence_penalty ?? undefined; const effectivePresencePenalty = agent?.presence_penalty ?? undefined;
// v2.6 #11: llama.cpp sampler extensions, threaded the same way as top_k/min_p.
const effectiveTopNSigma = agent?.top_n_sigma ?? undefined;
const effectiveDryMultiplier = agent?.dry_multiplier ?? undefined;
const effectiveDryBase = agent?.dry_base ?? undefined;
const effectiveDryAllowedLength = agent?.dry_allowed_length ?? undefined;
const effectiveDryPenaltyLastN = agent?.dry_penalty_last_n ?? undefined;
// v1.12.2: ctx_max lookup is cached after the first hit per model, so this // v1.12.2: ctx_max lookup is cached after the first hit per model, so this
// is a Map probe in steady state. We capture nCtx once at the top of the // is a Map probe in steady state. We capture nCtx once at the top of the
@@ -435,7 +482,19 @@ export async function executeStreamPhase(
ctx, ctx,
session.model, session.model,
messages, messages,
{ tools: effectiveTools, temperature: effectiveTemperature, top_p: effectiveTopP, top_k: effectiveTopK, min_p: effectiveMinP, presence_penalty: effectivePresencePenalty }, {
tools: effectiveTools,
temperature: effectiveTemperature,
top_p: effectiveTopP,
top_k: effectiveTopK,
min_p: effectiveMinP,
presence_penalty: effectivePresencePenalty,
top_n_sigma: effectiveTopNSigma,
dry_multiplier: effectiveDryMultiplier,
dry_base: effectiveDryBase,
dry_allowed_length: effectiveDryAllowedLength,
dry_penalty_last_n: effectiveDryPenaltyLastN,
},
(delta) => { (delta) => {
state.accumulated += delta; state.accumulated += delta;
ctx.publish(sessionId, { ctx.publish(sessionId, {

View File

@@ -17,6 +17,7 @@ import { formatUnknownToolError } from './tool-suggestions.js';
// prompted about paths we couldn't grant anyway (e.g. /etc/passwd). // prompted about paths we couldn't grant anyway (e.g. /etc/passwd).
import { resolveGrantRoot } from '../grant_resolver.js'; import { resolveGrantRoot } from '../grant_resolver.js';
import { stripToolMarkup } from './tool-call-parser.js'; import { stripToolMarkup } from './tool-call-parser.js';
import type { FailureKind } from './mistake-tracker.js';
import type { import type {
InferenceContext, InferenceContext,
StreamResult, StreamResult,
@@ -33,13 +34,18 @@ async function executeToolCall(
toolCall: ToolCall, toolCall: ToolCall,
extraRoots: readonly string[], extraRoots: readonly string[],
toolCtx?: ToolExecCtx, toolCtx?: ToolExecCtx,
): Promise<{ output: unknown; truncated: boolean; error?: string }> { ): Promise<{ output: unknown; truncated: boolean; error?: string; outcome: FailureKind | 'success' }> {
// v#12 MistakeTracker: every return path carries an `outcome` so the turn
// loop can detect a run of heterogeneous failures. The failure taxonomy
// mirrors mistake-tracker.ts:FailureKind. Does NOT alter the existing
// output/truncated/error shape — outcome is purely additive.
const tool = TOOLS_BY_NAME[toolCall.name]; const tool = TOOLS_BY_NAME[toolCall.name];
if (!tool) { if (!tool) {
return { return {
output: null, output: null,
truncated: false, truncated: false,
error: formatUnknownToolError(toolCall.name, Object.keys(TOOLS_BY_NAME)), error: formatUnknownToolError(toolCall.name, Object.keys(TOOLS_BY_NAME)),
outcome: 'tool_not_found',
}; };
} }
const parsed = tool.inputSchema.safeParse(toolCall.args); const parsed = tool.inputSchema.safeParse(toolCall.args);
@@ -64,6 +70,7 @@ async function executeToolCall(
output: null, output: null,
truncated: false, truncated: false,
error: `tool '${toolCall.name}' rejected — ${hint}`, error: `tool '${toolCall.name}' rejected — ${hint}`,
outcome: 'zod_reject',
}; };
} }
try { try {
@@ -72,15 +79,16 @@ async function executeToolCall(
typeof output === 'object' && output !== null && 'truncated' in output typeof output === 'object' && output !== null && 'truncated' in output
? Boolean((output as { truncated: unknown }).truncated) ? Boolean((output as { truncated: unknown }).truncated)
: false; : false;
return { output, truncated }; return { output, truncated, outcome: 'success' };
} catch (err) { } catch (err) {
if (err instanceof PathScopeError) { if (err instanceof PathScopeError) {
return { output: null, truncated: false, error: err.message }; return { output: null, truncated: false, error: err.message, outcome: 'permission_denied' };
} }
return { return {
output: null, output: null,
truncated: false, truncated: false,
error: err instanceof Error ? err.message : String(err), error: err instanceof Error ? err.message : String(err),
outcome: 'exec_error',
}; };
} }
} }
@@ -93,6 +101,12 @@ export interface ToolPhaseResult {
toolCallCount: number; toolCallCount: number;
toolCalls: ToolCall[]; toolCalls: ToolCall[];
nextAssistantId: string | null; nextAssistantId: string | null;
// v#12 MistakeTracker: one outcome per executed tool call, in no particular
// order (filled inside the Promise.all callbacks). The turn loop folds these
// into TurnArgs.mistakeTracker via recordStep. Pause/auto-grant control-flow
// tools record 'success' (they aren't model mistakes); the genuine error
// paths record their FailureKind.
outcomes: (FailureKind | 'success')[];
} }
export async function executeToolPhase( export async function executeToolPhase(
@@ -187,6 +201,10 @@ export async function executeToolPhase(
// for the synthesis input. Race-free under Promise.all because each // for the synthesis input. Race-free under Promise.all because each
// callback pushes its own captured value. // callback pushes its own captured value.
const synthEntries: Array<{ tc: ToolCall; output: unknown; error?: string }> = []; const synthEntries: Array<{ tc: ToolCall; output: unknown; error?: string }> = [];
// v#12 MistakeTracker: collect each tool's outcome. Concurrent pushes under
// Promise.all are safe (each callback appends its own value; order is not
// significant to recordStep which folds them sequentially).
const outcomes: (FailureKind | 'success')[] = [];
await Promise.all( await Promise.all(
toolCalls.map(async (tc) => { toolCalls.map(async (tc) => {
const [toolRow] = await ctx.sql<{ id: string }[]>` const [toolRow] = await ctx.sql<{ id: string }[]>`
@@ -197,6 +215,7 @@ export async function executeToolPhase(
const toolMessageId = toolRow!.id; const toolMessageId = toolRow!.id;
if (tc.name === 'ask_user_input') { if (tc.name === 'ask_user_input') {
pausingForUserInput = true; pausingForUserInput = true;
outcomes.push('success');
const sentinel = { tool_call_id: tc.id, output: null, truncated: false }; const sentinel = { tool_call_id: tc.id, output: null, truncated: false };
// v1.13.20: parts-only. The answer-endpoint UPDATE later // v1.13.20: parts-only. The answer-endpoint UPDATE later
// (messages.ts) will delete and re-insert this part when the user // (messages.ts) will delete and re-insert this part when the user
@@ -227,7 +246,10 @@ export async function executeToolPhase(
); );
if (!resolution.ok) { if (!resolution.ok) {
// Auto-deny without pausing. The model sees the reason on its // Auto-deny without pausing. The model sees the reason on its
// next turn and decides what to do. // next turn and decides what to do. Counts as a permission_denied
// failure for the mistake tracker (the model asked for a path it
// can't have — a recoverable mistake it should learn from).
outcomes.push('permission_denied');
const stored = { const stored = {
tool_call_id: tc.id, tool_call_id: tc.id,
output: `denied: ${resolution.reason}`, output: `denied: ${resolution.reason}`,
@@ -255,6 +277,7 @@ export async function executeToolPhase(
// pause. The grant endpoint re-derives the root at decision time // pause. The grant endpoint re-derives the root at decision time
// (state may have changed in the meantime) so we don't stash it here. // (state may have changed in the meantime) so we don't stash it here.
pausingForUserInput = true; pausingForUserInput = true;
outcomes.push('success');
const sentinel = { tool_call_id: tc.id, output: null, truncated: false }; const sentinel = { tool_call_id: tc.id, output: null, truncated: false };
// v1.13.20: parts-only write. // v1.13.20: parts-only write.
await insertParts( await insertParts(
@@ -267,6 +290,10 @@ export async function executeToolPhase(
return; return;
} }
if (agent && !matchToolGlob(tc.name, agent.tools)) { if (agent && !matchToolGlob(tc.name, agent.tools)) {
// Agent-scope denial — the model called a tool outside its whitelist.
// permission_denied for the mistake tracker (the model should pick a
// tool it's actually allowed to use).
outcomes.push('permission_denied');
const stored = { const stored = {
tool_call_id: tc.id, tool_call_id: tc.id,
output: null, output: null,
@@ -295,6 +322,10 @@ export async function executeToolPhase(
sql: ctx.sql, sql: ctx.sql,
sessionId, sessionId,
}); });
// v#12 MistakeTracker: record the real execution outcome (success or a
// FailureKind). This is the primary signal for heterogeneous-failure
// detection.
outcomes.push(tres.outcome);
if (SYNTHESIS_TOOLS.has(tc.name)) { if (SYNTHESIS_TOOLS.has(tc.name)) {
synthEntries.push({ tc, output: tres.output, ...(tres.error ? { error: tres.error } : {}) }); synthEntries.push({ tc, output: tres.output, ...(tres.error ? { error: tres.error } : {}) });
} }
@@ -340,6 +371,7 @@ export async function executeToolPhase(
toolCallCount: toolCalls.length, toolCallCount: toolCalls.length,
toolCalls, toolCalls,
nextAssistantId: null, nextAssistantId: null,
outcomes,
}; };
} }
@@ -378,6 +410,7 @@ export async function executeToolPhase(
toolCallCount: toolCalls.length, toolCallCount: toolCalls.length,
toolCalls, toolCalls,
nextAssistantId: null, nextAssistantId: null,
outcomes,
}; };
} }
// ran === false → synthesis failed (timeout / model error) → fall through // ran === false → synthesis failed (timeout / model error) → fall through
@@ -397,5 +430,6 @@ export async function executeToolPhase(
toolCallCount: toolCalls.length, toolCallCount: toolCalls.length,
toolCalls, toolCalls,
nextAssistantId: nextAssistant!.id, nextAssistantId: nextAssistant!.id,
outcomes,
}; };
} }

View File

@@ -22,6 +22,13 @@ import { resolveToolBudget } from './budget.js';
import { import {
detectDoomLoop, detectDoomLoop,
} from './sentinels.js'; } from './sentinels.js';
import {
detectMistakePattern,
freshMistakeState,
recordStep,
MISTAKE_RECOVERY_NOTE,
type MistakeState,
} from './mistake-tracker.js';
import { import {
buildMessagesPayload, buildMessagesPayload,
loadContext, loadContext,
@@ -39,6 +46,7 @@ import {
runCapHitSummary, runCapHitSummary,
runDoomLoopSummary, runDoomLoopSummary,
runStepCapSummary, runStepCapSummary,
insertMistakeRecoverySentinel,
} from './sentinel-summaries.js'; } from './sentinel-summaries.js';
// v1.14.0: hard ceiling on the number of stream-and-tool iterations per // v1.14.0: hard ceiling on the number of stream-and-tool iterations per
@@ -144,6 +152,16 @@ export interface TurnArgs {
// boundaries by runInference, same as toolsUsed. Doom-loop check at the // boundaries by runInference, same as toolsUsed. Doom-loop check at the
// top of runAssistantTurn slices the last DOOM_LOOP_THRESHOLD entries. // top of runAssistantTurn slices the last DOOM_LOOP_THRESHOLD entries.
recentToolCalls: ToolCall[]; recentToolCalls: ToolCall[];
// v#12 MistakeTracker: heterogeneous-failure recovery state. Loop-local,
// reset per runInference (user-message boundary) like recentToolCalls. Folds
// tool-phase outcomes via recordStep each iteration; detectMistakePattern
// gates the nudge/escalate decision.
mistakeTracker: MistakeState;
// v#12: transient model-facing recovery note set when a nudge fires. Consumed
// (appended as a role:'system' message + cleared) on the NEXT payload build.
// Never persisted — mirrors how the cap-hit/doom-loop notes live only inside
// the summary call's messages array.
pendingRecoveryNote?: string;
signal: AbortSignal | undefined; signal: AbortSignal | undefined;
} }
@@ -188,6 +206,12 @@ export async function runAssistantTurn(
let toolsUsed = args.toolsUsed; let toolsUsed = args.toolsUsed;
let recentToolCalls = args.recentToolCalls; let recentToolCalls = args.recentToolCalls;
let assistantMessageId = args.assistantMessageId; let assistantMessageId = args.assistantMessageId;
// v#12 MistakeTracker: the tracker state is carried on `args` (mutated in
// place by recordStep). pendingRecoveryNote is a loop-local because it is a
// single-step transient — set when a nudge fires, consumed (injected into the
// next payload) and cleared on the following iteration.
const mistakeTracker = args.mistakeTracker;
let pendingRecoveryNote: string | undefined = args.pendingRecoveryNote;
while (stepNumber < effectiveCap) { while (stepNumber < effectiveCap) {
// ---- doom-loop check (moved from top-of-function) ---- // ---- doom-loop check (moved from top-of-function) ----
@@ -196,7 +220,7 @@ export async function runAssistantTurn(
// Need fresh history for the summary. // Need fresh history for the summary.
const loaded = await loadContext(ctx.sql, sessionId, chatId); const loaded = await loadContext(ctx.sql, sessionId, chatId);
if (loaded) { if (loaded) {
const iterArgs: TurnArgs = { sessionId, chatId, assistantMessageId, toolsUsed, recentToolCalls, signal }; const iterArgs: TurnArgs = { sessionId, chatId, assistantMessageId, toolsUsed, recentToolCalls, mistakeTracker, signal };
await runDoomLoopSummary(ctx, iterArgs, loaded.session, loaded.project, loaded.history, agent, loop); await runDoomLoopSummary(ctx, iterArgs, loaded.session, loaded.project, loaded.history, agent, loop);
} }
break; break;
@@ -206,7 +230,7 @@ export async function runAssistantTurn(
if (toolsUsed >= budget) { if (toolsUsed >= budget) {
const loaded = await loadContext(ctx.sql, sessionId, chatId); const loaded = await loadContext(ctx.sql, sessionId, chatId);
if (loaded) { if (loaded) {
const iterArgs: TurnArgs = { sessionId, chatId, assistantMessageId, toolsUsed, recentToolCalls, signal }; const iterArgs: TurnArgs = { sessionId, chatId, assistantMessageId, toolsUsed, recentToolCalls, mistakeTracker, signal };
await runCapHitSummary(ctx, iterArgs, loaded.session, loaded.project, loaded.history, agent, budget); await runCapHitSummary(ctx, iterArgs, loaded.session, loaded.project, loaded.history, agent, budget);
} }
break; break;
@@ -265,7 +289,16 @@ export async function runAssistantTurn(
} }
} }
const iterArgs: TurnArgs = { sessionId, chatId, assistantMessageId, toolsUsed, recentToolCalls, signal }; // v#12 MistakeTracker: if the prior iteration's nudge fired, append the
// transient recovery note to THIS payload (consumed exactly once, then
// cleared). Never persisted — same lifecycle as the cap-hit/doom-loop
// summary notes, which live only inside the in-memory messages array.
if (pendingRecoveryNote) {
messages.push({ role: 'system', content: pendingRecoveryNote });
pendingRecoveryNote = undefined;
}
const iterArgs: TurnArgs = { sessionId, chatId, assistantMessageId, toolsUsed, recentToolCalls, mistakeTracker, signal };
const state: StreamPhaseState = { accumulated: '', startedAt: null }; const state: StreamPhaseState = { accumulated: '', startedAt: null };
let result: StreamResult; let result: StreamResult;
try { try {
@@ -305,10 +338,78 @@ export async function runAssistantTurn(
recentToolCalls = [...recentToolCalls, ...toolPhaseResult.toolCalls]; recentToolCalls = [...recentToolCalls, ...toolPhaseResult.toolCalls];
stepNumber++; stepNumber++;
// v#12 MistakeTracker: fold this iteration's tool outcomes into the
// tracker, in order. recordStep mutates `mistakeTracker` in place (it is
// the same object referenced by args). A 'success' clears the streak.
for (const o of toolPhaseResult.outcomes) {
recordStep(mistakeTracker, o);
}
if (toolPhaseResult.action !== 'continue') { if (toolPhaseResult.action !== 'continue') {
// 'paused' (user input) or 'synthesis_done' — stop the loop. // 'paused' (user input) or 'synthesis_done' — stop the loop. The turn is
// already ending, so neither a nudge nor an escalate would change the
// control flow; we skip the mistake decision here.
break; break;
} }
// v#12 MistakeTracker: heterogeneous-failure decision. Only evaluated on
// the 'continue' path (the only case where the loop would otherwise
// proceed to another step). Complements the doom-loop check above, which
// only catches *identical* repeats.
const mistake = detectMistakePattern(mistakeTracker);
if (mistake === 'nudge') {
// Soft intervention: inject model-facing recovery guidance into the NEXT
// step's payload, drop a UI sentinel, bump nudges, reset the streak, and
// continue. The note is consumed (and cleared) at the top of the next
// iteration's payload build.
pendingRecoveryNote = MISTAKE_RECOVERY_NOTE;
const failureKinds = [...mistakeTracker.run];
await insertMistakeRecoverySentinel(ctx, sessionId, chatId, {
failureKinds,
count: failureKinds.length,
escalated: false,
canContinue: true,
});
mistakeTracker.nudges += 1;
mistakeTracker.run = [];
ctx.log.info(
{ sessionId, chatId, step: stepNumber, nudges: mistakeTracker.nudges, failureKinds },
'mistake_recovery nudge',
);
assistantMessageId = toolPhaseResult.nextAssistantId!;
continue;
}
if (mistake === 'escalate') {
// The nudge didn't break the failure run — stop the turn (cap-hit-style)
// to avoid burning the whole step budget on heterogeneous failures. The
// next assistant row is still 'streaming'; finalize it as a short note so
// the slot doesn't dangle, then drop the escalate sentinel.
const failureKinds = [...mistakeTracker.run];
assistantMessageId = toolPhaseResult.nextAssistantId!;
await ctx.sql`
UPDATE messages
SET content = '', status = 'complete', finished_at = clock_timestamp()
WHERE id = ${assistantMessageId}
`;
ctx.publish(sessionId, {
type: 'message_complete',
message_id: assistantMessageId,
chat_id: chatId,
});
await insertMistakeRecoverySentinel(ctx, sessionId, chatId, {
failureKinds,
count: failureKinds.length,
escalated: true,
canContinue: true,
});
ctx.publishUser({ type: 'chat_status', chat_id: chatId, status: 'idle', at: new Date().toISOString() });
ctx.log.info(
{ sessionId, chatId, step: stepNumber, failureKinds },
'mistake_recovery escalate — stopping turn',
);
break;
}
// 'continue' — advance to next assistant message. // 'continue' — advance to next assistant message.
assistantMessageId = toolPhaseResult.nextAssistantId!; assistantMessageId = toolPhaseResult.nextAssistantId!;
} }
@@ -320,7 +421,7 @@ export async function runAssistantTurn(
if (stepNumber >= effectiveCap && effectiveCap < Infinity) { if (stepNumber >= effectiveCap && effectiveCap < Infinity) {
const loaded = await loadContext(ctx.sql, sessionId, chatId); const loaded = await loadContext(ctx.sql, sessionId, chatId);
if (loaded) { if (loaded) {
const capArgs: TurnArgs = { sessionId, chatId, assistantMessageId, toolsUsed, recentToolCalls, signal }; const capArgs: TurnArgs = { sessionId, chatId, assistantMessageId, toolsUsed, recentToolCalls, mistakeTracker, signal };
await runStepCapSummary(ctx, capArgs, loaded.session, loaded.project, loaded.history, agent, stepNumber, effectiveCap); await runStepCapSummary(ctx, capArgs, loaded.session, loaded.project, loaded.history, agent, stepNumber, effectiveCap);
} }
} }
@@ -378,12 +479,16 @@ export async function runInference(
// per-call budget. // per-call budget.
// v1.11.6: recentToolCalls also resets — doom-loop detection is scoped // v1.11.6: recentToolCalls also resets — doom-loop detection is scoped
// to a single user-message turn, so a Continue starts with no history. // to a single user-message turn, so a Continue starts with no history.
// v#12 MistakeTracker: fresh per user-message turn, like recentToolCalls.
// Tracks consecutive heterogeneous tool failures across the loop's
// stream-and-tool iterations within this turn.
return runAssistantTurn(ctx, { return runAssistantTurn(ctx, {
sessionId, sessionId,
chatId, chatId,
assistantMessageId, assistantMessageId,
toolsUsed: 0, toolsUsed: 0,
recentToolCalls: [], recentToolCalls: [],
mistakeTracker: freshMistakeState(),
signal, signal,
}); });
} }

View File

@@ -117,6 +117,15 @@ export interface Agent {
top_k: number | null; // null means omit from request body top_k: number | null; // null means omit from request body
min_p: number | null; // null means omit from request body min_p: number | null; // null means omit from request body
presence_penalty: number | null; // null means omit from request body presence_penalty: number | null; // null means omit from request body
// v2.6 sampling-streamjson-tokens (#11): llama.cpp sampler extensions.
// null = omit from request body. top_n_sigma + the DRY repetition family
// help the doom-loop-prone local model. All travel via the same
// providerOptions.openaiCompatible extraBody channel as top_k/min_p.
top_n_sigma: number | null;
dry_multiplier: number | null;
dry_base: number | null;
dry_allowed_length: number | null;
dry_penalty_last_n: number | null;
tools: string[]; // whitelist of tool names; empty = no tools allowed tools: string[]; // whitelist of tool names; empty = no tools allowed
model: string | null; // null means "session.model wins" model: string | null; // null means "session.model wins"
source: AgentSource; source: AgentSource;
@@ -201,6 +210,11 @@ export type ErrorReason =
// cap_hit — system sentinel emitted when tool budget is exhausted // cap_hit — system sentinel emitted when tool budget is exhausted
// doom_loop — system sentinel emitted when the model called the same // doom_loop — system sentinel emitted when the model called the same
// tool with the same args DOOM_LOOP_THRESHOLD times in a row // tool with the same args DOOM_LOOP_THRESHOLD times in a row
// mistake_recovery — system sentinel emitted when a run of consecutive
// *heterogeneous* tool failures is detected (#12). A nudge
// (escalated:false) injects model-facing recovery guidance
// and continues; an escalate (escalated:true) stops the
// turn after the nudge failed to break the failure run.
// error — attached to a failed assistant message so UI can show reason // error — attached to a failed assistant message so UI can show reason
export type MessageMetadata = export type MessageMetadata =
| { | {
@@ -216,6 +230,14 @@ export type MessageMetadata =
args: Record<string, unknown>; args: Record<string, unknown>;
threshold: number; threshold: number;
} }
| {
// PINNED CONTRACT (#12) — mirrored byte-for-byte in apps/web/src/api/types.ts.
kind: 'mistake_recovery';
failure_kinds: string[];
count: number;
escalated: boolean;
can_continue?: boolean;
}
| { | {
kind: 'error'; kind: 'error';
error_reason: ErrorReason; error_reason: ErrorReason;

View File

@@ -39,6 +39,12 @@ const ChatStatusValue = z.enum([
'error', 'error',
]); ]);
// agent-status-normalize (#10): normalized per-(chat,agent) lifecycle status for
// external coding agents (warm-acp / opencode / claude-sdk / pty). Distinct from
// ChatStatusValue (native-inference chat lifecycle) — published by BooCoder's
// dispatcher + permission flow on the per-session channel.
const AgentStatusValue = z.enum(['working', 'blocked', 'idle', 'error']);
const ErrorReasonValue = z.enum([ const ErrorReasonValue = z.enum([
'llm_provider_error', 'llm_provider_error',
'doom_loop', 'doom_loop',
@@ -301,6 +307,21 @@ export const AgentCommandsFrame = z.object({
commands: z.array(AgentCommandShape), commands: z.array(AgentCommandShape),
}); });
// agent-status-normalize (#10): published by BooCoder on the per-session channel
// when an external agent's normalized status changes (turn start/end, permission
// block/unblock). Keyed per (chat_id, agent); the frontend tracks the latest per
// pair and resets on chat switch. `reason` is a free-form discriminator
// (turn_start / turn_complete / failed / crashed / permission_request /
// permission_resolved).
export const AgentStatusUpdatedFrame = z.object({
type: z.literal('agent_status_updated'),
chat_id: Uuid,
agent: z.string().min(1),
status: AgentStatusValue,
reason: z.string().optional(),
at: IsoTimestamp,
});
// ---- discriminated union --------------------------------------------------- // ---- discriminated union ---------------------------------------------------
export const WsFrameSchema = z.discriminatedUnion('type', [ export const WsFrameSchema = z.discriminatedUnion('type', [
@@ -320,6 +341,7 @@ export const WsFrameSchema = z.discriminatedUnion('type', [
PermissionRequestedFrame, PermissionRequestedFrame,
PermissionResolvedFrame, PermissionResolvedFrame,
AgentCommandsFrame, AgentCommandsFrame,
AgentStatusUpdatedFrame,
// per-user // per-user
ChatStatusFrame, ChatStatusFrame,
SessionUpdatedFrame, SessionUpdatedFrame,
@@ -361,6 +383,7 @@ export const KNOWN_FRAME_TYPES: readonly WsFrame['type'][] = [
'permission_requested', 'permission_requested',
'permission_resolved', 'permission_resolved',
'agent_commands', 'agent_commands',
'agent_status_updated',
'chat_status', 'chat_status',
'session_updated', 'session_updated',
'session_renamed', 'session_renamed',

View File

@@ -34,6 +34,30 @@ export interface AgentSessionInfo {
status: string; status: string;
has_session: boolean; has_session: boolean;
last_active_at: string | null; last_active_at: string | null;
// v2.6.8 per-(chat,agent) running token/cost totals (sampling-streamjson-tokens
// #8). input_tokens/output_tokens are BIGINT and may arrive as strings; cost is
// DOUBLE. AgentComposerBar coerces with Number(...) before rendering.
input_tokens: number;
output_tokens: number;
cost: number;
}
// write-edit-robustness #4: a pre-turn worktree snapshot anchored to an
// assistant message. Returned by GET .../checkpoints; drives the per-message
// "Restore to here" affordance in CoderMessageList.
export interface CoderCheckpoint {
id: string;
message_id: string;
created_at: string;
label: string | null;
}
// write-edit-robustness #4: result of POST .../checkpoints/:id/restore.
export interface CoderRestoreResult {
checkpoint_id: string;
messages_deleted: number;
worktree_reset: boolean;
backend_reset: boolean;
} }
export class ApiError extends Error { export class ApiError extends Error {
@@ -407,6 +431,22 @@ export const api = {
...(config?.thinking_option_id ? { thinking_option_id: config.thinking_option_id } : {}), ...(config?.thinking_option_id ? { thinking_option_id: config.thinking_option_id } : {}),
}), }),
}), }),
// write-edit-robustness #4: worktree checkpoints. List which assistant
// messages in a chat have a pre-turn worktree snapshot ("Restore to here"
// is offered only on those). Proxied to boocoder.
getCheckpoints: (sessionId: string, chatId: string) =>
request<{ checkpoints: CoderCheckpoint[] }>(
`/api/coder/sessions/${sessionId}/checkpoints?chat_id=${encodeURIComponent(chatId)}`,
),
// write-edit-robustness #4: reset the worktree to a checkpoint, trim the
// transcript past its anchor message, and reset the agent backend. After it
// returns, the caller refetches messages (+ checkpoints) so the trimmed
// transcript shows.
restoreCheckpoint: (sessionId: string, checkpointId: string) =>
request<CoderRestoreResult>(
`/api/coder/sessions/${sessionId}/checkpoints/${encodeURIComponent(checkpointId)}/restore`,
{ method: 'POST' },
),
// Queue a new-file create from the RightRail browser → BooCoder // Queue a new-file create from the RightRail browser → BooCoder
// pending_changes (operation='create'). Surfaces in the CoderPane DiffPanel // pending_changes (operation='create'). Surfaces in the CoderPane DiffPanel
// for explicit apply. A WriteGuardError comes back as a 422 whose { error } // for explicit apply. A WriteGuardError comes back as a 422 whose { error }

View File

@@ -155,6 +155,9 @@ export type ErrorReason =
// budget + agent name + whether Continue is still allowed. // budget + agent name + whether Continue is still allowed.
// doom_loop — sentinel emitted when the model called the same tool with // doom_loop — sentinel emitted when the model called the same tool with
// the same arguments threshold times in a row. // the same arguments threshold times in a row.
// mistake_recovery — sentinel emitted when the model hit repeated *different*
// errors; non-escalated means recovery guidance was injected and
// the turn continues, escalated means the turn was stopped.
// error — attached to a failed assistant message so the bubble can show // error — attached to a failed assistant message so the bubble can show
// a specific reason on reload (WS error frame is one-shot). // a specific reason on reload (WS error frame is one-shot).
export type MessageMetadata = export type MessageMetadata =
@@ -171,6 +174,13 @@ export type MessageMetadata =
args: Record<string, unknown>; args: Record<string, unknown>;
threshold: number; threshold: number;
} }
| {
kind: 'mistake_recovery';
failure_kinds: string[];
count: number;
escalated: boolean;
can_continue?: boolean;
}
| { | {
kind: 'error'; kind: 'error';
error_reason: ErrorReason; error_reason: ErrorReason;
@@ -586,4 +596,16 @@ export type WsFrame =
| { type: 'compacted'; session_id: string; chat_id: string; summary_message_id: string } | { type: 'compacted'; session_id: string; chat_id: string; summary_message_id: string }
// v1.8.2: `reason` discriminates structured failures (the UI prefers it // v1.8.2: `reason` discriminates structured failures (the UI prefers it
// over `error` text when present). // over `error` text when present).
| { type: 'error'; message_id?: string; chat_id?: string; error: string; reason?: ErrorReason }; | { type: 'error'; message_id?: string; chat_id?: string; error: string; reason?: ErrorReason }
// agent-status-normalize (#10): BooCoder publishes a normalized per-(chat,agent)
// lifecycle status for external coding agents on the per-session channel. The
// CoderPane tracks the latest status per (chat_id, agent) and resets on chat
// switch; AgentComposerBar renders the dot (distinct from the WS-liveness dot).
| {
type: 'agent_status_updated';
chat_id: string;
agent: string;
status: 'working' | 'blocked' | 'idle' | 'error';
reason?: string;
at: string;
};

View File

@@ -39,6 +39,12 @@ const ChatStatusValue = z.enum([
'error', 'error',
]); ]);
// agent-status-normalize (#10): normalized per-(chat,agent) lifecycle status for
// external coding agents (warm-acp / opencode / claude-sdk / pty). Distinct from
// ChatStatusValue (native-inference chat lifecycle) — published by BooCoder's
// dispatcher + permission flow on the per-session channel.
const AgentStatusValue = z.enum(['working', 'blocked', 'idle', 'error']);
const ErrorReasonValue = z.enum([ const ErrorReasonValue = z.enum([
'llm_provider_error', 'llm_provider_error',
'doom_loop', 'doom_loop',
@@ -301,6 +307,21 @@ export const AgentCommandsFrame = z.object({
commands: z.array(AgentCommandShape), commands: z.array(AgentCommandShape),
}); });
// agent-status-normalize (#10): published by BooCoder on the per-session channel
// when an external agent's normalized status changes (turn start/end, permission
// block/unblock). Keyed per (chat_id, agent); the frontend tracks the latest per
// pair and resets on chat switch. `reason` is a free-form discriminator
// (turn_start / turn_complete / failed / crashed / permission_request /
// permission_resolved).
export const AgentStatusUpdatedFrame = z.object({
type: z.literal('agent_status_updated'),
chat_id: Uuid,
agent: z.string().min(1),
status: AgentStatusValue,
reason: z.string().optional(),
at: IsoTimestamp,
});
// ---- discriminated union --------------------------------------------------- // ---- discriminated union ---------------------------------------------------
export const WsFrameSchema = z.discriminatedUnion('type', [ export const WsFrameSchema = z.discriminatedUnion('type', [
@@ -320,6 +341,7 @@ export const WsFrameSchema = z.discriminatedUnion('type', [
PermissionRequestedFrame, PermissionRequestedFrame,
PermissionResolvedFrame, PermissionResolvedFrame,
AgentCommandsFrame, AgentCommandsFrame,
AgentStatusUpdatedFrame,
// per-user // per-user
ChatStatusFrame, ChatStatusFrame,
SessionUpdatedFrame, SessionUpdatedFrame,
@@ -361,6 +383,7 @@ export const KNOWN_FRAME_TYPES: readonly WsFrame['type'][] = [
'permission_requested', 'permission_requested',
'permission_resolved', 'permission_resolved',
'agent_commands', 'agent_commands',
'agent_status_updated',
'chat_status', 'chat_status',
'session_updated', 'session_updated',
'session_renamed', 'session_renamed',

View File

@@ -3,6 +3,7 @@ import { Check, ChevronDown, RefreshCw, Loader2, Shield, Brain, Bot } from 'luci
import { api } from '@/api/client'; import { api } from '@/api/client';
import type { AgentSessionConfig, ProviderSnapshotEntry, AgentCommand } from '@/api/types'; import type { AgentSessionConfig, ProviderSnapshotEntry, AgentCommand } from '@/api/types';
import { useProviderSnapshot, refreshProviderSnapshot } from '@/hooks/useProviderSnapshot'; import { useProviderSnapshot, refreshProviderSnapshot } from '@/hooks/useProviderSnapshot';
import type { AgentStatusEntry } from '@/hooks/useAgentStatus';
import { providerIcon } from '@/components/coder/providerIcons'; import { providerIcon } from '@/components/coder/providerIcons';
import { useAgentSessions } from '@/hooks/useAgentSessions'; import { useAgentSessions } from '@/hooks/useAgentSessions';
import { import {
@@ -183,6 +184,19 @@ interface Props {
// True once the chat has at least one prior turn — gates the chip so it stays // True once the chat has at least one prior turn — gates the chip so it stays
// hidden on a brand-new chat. Defaults to false (no chip). // hidden on a brand-new chat. Defaults to false (no chip).
hasPriorTurn?: boolean; hasPriorTurn?: boolean;
// #10: normalized status (working|blocked|idle|error) for the active external
// agent in this chat, or null for native boocode / before any frame. Renders
// a status dot DISTINCT from the WS-liveness `connected` dot. Undefined for
// non-coder callers — no dot.
agentStatus?: AgentStatusEntry | null;
}
// Condensed token count: 950 → "950", 12_400 → "12.4K", 3_200_000 → "3.2M".
// Sub-1000 stays exact; thousands/millions get one decimal, trailing .0 trimmed.
function abbrevTokens(n: number): string {
if (!Number.isFinite(n) || n < 1000) return String(Math.max(0, Math.round(n)));
if (n < 1_000_000) return `${(n / 1000).toFixed(1).replace(/\.0$/, '')}K`;
return `${(n / 1_000_000).toFixed(1).replace(/\.0$/, '')}M`;
} }
// Relative-time formatter for the resumed-chip title (e.g. "3m ago"). // Relative-time formatter for the resumed-chip title (e.g. "3m ago").
@@ -202,7 +216,42 @@ function relativeTime(iso: string | null): string {
return `${day}d ago`; return `${day}d ago`;
} }
export function AgentComposerBar({ projectPath, value, onChange, onProviderCommandsChange, connected, sessionId, hasPriorTurn }: Props) { // #10: normalized external-agent status dot. Mirrors StatusDot's visual
// language but on the four normalized buckets (working|blocked|idle|error),
// and is DISTINCT from the WS-liveness `connected` dot beside it:
// working — emerald spinning ring (subtle motion, like chat streaming)
// blocked — amber dot (matches the permission/blocked state colour)
// idle — gray dot
// error — red dot
function AgentStatusDot({ entry, agent }: { entry: AgentStatusEntry; agent: string }) {
const title =
`${agent}: ${entry.status}` + (entry.reason ? `${entry.reason}` : '');
if (entry.status === 'working') {
return (
<span
aria-label={`Agent status: working${entry.reason ? `${entry.reason}` : ''}`}
title={title}
className="inline-block w-3 h-3 rounded-full border-2 border-emerald-500 border-t-transparent animate-spin shrink-0"
/>
);
}
const bg =
entry.status === 'blocked' ? 'bg-amber-500'
: entry.status === 'error' ? 'bg-destructive'
: 'bg-muted-foreground/40';
return (
<span
aria-label={`Agent status: ${entry.status}${entry.reason ? `${entry.reason}` : ''}`}
title={title}
className={cn('inline-block w-1.5 h-1.5 rounded-full shrink-0', bg)}
/>
);
}
export function AgentComposerBar({ projectPath, value, onChange, onProviderCommandsChange, connected, sessionId, hasPriorTurn, agentStatus }: Props) {
const allEntries = useProviderSnapshot(projectPath); const allEntries = useProviderSnapshot(projectPath);
// 5.5 — the composer picker only offers ENABLED providers that are ready (or // 5.5 — the composer picker only offers ENABLED providers that are ready (or
// still loading). Disabled (enabled:false) and unavailable/error providers are // still loading). Disabled (enabled:false) and unavailable/error providers are
@@ -353,6 +402,21 @@ export function AgentComposerBar({ projectPath, value, onChange, onProviderComma
: { label: 'new session', title: `${value.provider} starts a fresh session this turn` } : { label: 'new session', title: `${value.provider} starts a fresh session this turn` }
: null; : null;
// sampling-streamjson-tokens #8: condensed per-(chat,agent) token/cost readout
// beside the session chip. Coerce — input/output are BIGINT (string over wire).
// Hidden when no session row or all totals are zero (e.g. native boocode, which
// holds no agent_sessions row, or a provider that hasn't run yet).
const usageReadout = (() => {
if (!sessionChip || !sessionRow) return null;
const inTok = Number(sessionRow.input_tokens) || 0;
const outTok = Number(sessionRow.output_tokens) || 0;
const cost = Number(sessionRow.cost) || 0;
if (inTok <= 0 && outTok <= 0 && cost <= 0) return null;
const parts = [`${abbrevTokens(inTok)} in`, `${abbrevTokens(outTok)} out`];
if (cost > 0) parts.push(`$${cost.toFixed(2)}`);
return parts.join(' · ');
})();
return ( return (
<div className="flex flex-wrap items-center gap-1 px-2 py-1 border-b border-border bg-muted/20 shrink-0"> <div className="flex flex-wrap items-center gap-1 px-2 py-1 border-b border-border bg-muted/20 shrink-0">
<CompactPicker <CompactPicker
@@ -374,6 +438,14 @@ export function AgentComposerBar({ projectPath, value, onChange, onProviderComma
{sessionChip.label} {sessionChip.label}
</span> </span>
)} )}
{usageReadout && (
<span
className="text-[10px] text-muted-foreground tabular-nums whitespace-nowrap shrink-0"
title="Tokens in · out · cost for this agent session"
>
{usageReadout}
</span>
)}
<CompactPicker <CompactPicker
label="Mode" label="Mode"
value={value.modeId ?? ''} value={value.modeId ?? ''}
@@ -403,6 +475,11 @@ export function AgentComposerBar({ projectPath, value, onChange, onProviderComma
{/* Status dot + refresh as one right-aligned unit so the refresh button {/* Status dot + refresh as one right-aligned unit so the refresh button
stays on the top line instead of wrapping past the edge-pinned dot. */} stays on the top line instead of wrapping past the edge-pinned dot. */}
<div className="ml-auto flex items-center gap-1 shrink-0"> <div className="ml-auto flex items-center gap-1 shrink-0">
{/* #10: normalized agent status — only for an external agent with a
live status frame. Distinct from the WS-liveness dot that follows. */}
{agentStatus && value.provider !== 'boocode' && (
<AgentStatusDot entry={agentStatus} agent={value.provider} />
)}
{connected !== undefined && ( {connected !== undefined && (
<span <span
className={cn('inline-block w-1.5 h-1.5 rounded-full shrink-0', connected ? 'bg-green-500' : 'bg-red-500')} className={cn('inline-block w-1.5 h-1.5 rounded-full shrink-0', connected ? 'bg-green-500' : 'bg-red-500')}

View File

@@ -1,6 +1,6 @@
import { useEffect, useState } from 'react'; import { useEffect, useState } from 'react';
import type { ReactNode } from 'react'; import type { ReactNode } from 'react';
import { ChevronDown, ChevronRight, Copy, RefreshCw, Check, Share2, RotateCw, GitFork, Trash2, Brain } from 'lucide-react'; import { ChevronDown, ChevronRight, Copy, RefreshCw, Check, Share2, RotateCw, GitFork, Trash2, Brain, History, AlertCircle } from 'lucide-react';
import { toast } from 'sonner'; import { toast } from 'sonner';
import type { Chat, ErrorReason, Message } from '@/api/types'; import type { Chat, ErrorReason, Message } from '@/api/types';
import { api } from '@/api/client'; import { api } from '@/api/client';
@@ -110,6 +110,10 @@ export interface MessageActions {
onResend?: (chatId: string, content: string) => Promise<void>; onResend?: (chatId: string, content: string) => Promise<void>;
onFork?: (chatId: string, messageId: string) => Promise<void>; onFork?: (chatId: string, messageId: string) => Promise<void>;
onDelete?: (chatId: string, messageId: string) => Promise<void>; onDelete?: (chatId: string, messageId: string) => Promise<void>;
// write-edit-robustness #4 (BooCoder only): reset the worktree to this
// message's pre-turn checkpoint and trim the transcript past it. BooChat
// passes no such callback → the "Restore to here" control never renders.
onRestoreCheckpoint?: (chatId: string, messageId: string) => Promise<void>;
} }
interface Props { interface Props {
@@ -119,6 +123,17 @@ interface Props {
actions?: MessageActions; actions?: MessageActions;
/** Hide actions that don't apply (fork, delete). */ /** Hide actions that don't apply (fork, delete). */
hideActions?: ('fork' | 'delete')[]; hideActions?: ('fork' | 'delete')[];
/**
* write-edit-robustness #4: this assistant message has a worktree checkpoint
* → render "Restore to here" (only when `actions.onRestoreCheckpoint` is also
* provided). CoderMessageList sets this from the checkpoint set.
*/
hasCheckpoint?: boolean;
/**
* write-edit-robustness #4: suppress the restore control during an active
* turn (mirrors composer gating). Defaults to enabled.
*/
restoreDisabled?: boolean;
} }
function StatsLine({ message }: { message: Message }) { function StatsLine({ message }: { message: Message }) {
@@ -155,16 +170,22 @@ function ActionRow({
message, message,
actions, actions,
hiddenSet, hiddenSet,
hasCheckpoint = false,
restoreDisabled = false,
}: { }: {
message: Message; message: Message;
actions?: MessageActions; actions?: MessageActions;
hiddenSet: Set<string>; hiddenSet: Set<string>;
hasCheckpoint?: boolean;
restoreDisabled?: boolean;
}) { }) {
const [justCopied, setJustCopied] = useState(false); const [justCopied, setJustCopied] = useState(false);
const [regenerating, setRegenerating] = useState(false); const [regenerating, setRegenerating] = useState(false);
const [forking, setForking] = useState(false); const [forking, setForking] = useState(false);
const [deleteOpen, setDeleteOpen] = useState(false); const [deleteOpen, setDeleteOpen] = useState(false);
const [deleting, setDeleting] = useState(false); const [deleting, setDeleting] = useState(false);
const [restoreOpen, setRestoreOpen] = useState(false);
const [restoring, setRestoring] = useState(false);
async function copy() { async function copy() {
try { try {
@@ -240,12 +261,33 @@ function ActionRow({
} }
} }
async function confirmRestore() {
if (restoring || !actions?.onRestoreCheckpoint) return;
setRestoring(true);
try {
await actions.onRestoreCheckpoint(message.chat_id, message.id);
setRestoreOpen(false);
} catch (err) {
toast.error(err instanceof Error ? err.message : 'restore failed');
} finally {
setRestoring(false);
}
}
const isAssistant = message.role === 'assistant'; const isAssistant = message.role === 'assistant';
const isUser = message.role === 'user'; const isUser = message.role === 'user';
const canRegen = isAssistant && message.status !== 'streaming'; const canRegen = isAssistant && message.status !== 'streaming';
const canResend = isUser && message.status === 'complete' && !!message.content?.trim(); const canResend = isUser && message.status === 'complete' && !!message.content?.trim();
const canFork = message.status === 'complete'; const canFork = message.status === 'complete';
const canDelete = message.status !== 'streaming'; const canDelete = message.status !== 'streaming';
// write-edit-robustness #4: show "Restore to here" only for a completed
// assistant message that has a checkpoint AND when the coder wired the
// callback. Disabled (but visible) during an active turn.
const canRestore =
isAssistant &&
hasCheckpoint &&
message.status === 'complete' &&
!!actions?.onRestoreCheckpoint;
return ( return (
<> <>
@@ -306,6 +348,18 @@ function ActionRow({
<Trash2 className="size-3" /> <Trash2 className="size-3" />
</button> </button>
)} )}
{canRestore && (
<button
type="button"
onClick={() => setRestoreOpen(true)}
disabled={restoreDisabled || restoring}
className="inline-flex items-center justify-center size-6 rounded text-muted-foreground hover:bg-muted hover:text-foreground disabled:opacity-40 disabled:cursor-not-allowed max-md:min-h-[44px] max-md:min-w-[44px]"
aria-label="Restore to here"
title="Restore worktree to this point"
>
<History className="size-3" />
</button>
)}
</div> </div>
<Dialog <Dialog
open={deleteOpen} open={deleteOpen}
@@ -338,6 +392,39 @@ function ActionRow({
</DialogFooter> </DialogFooter>
</DialogContent> </DialogContent>
</Dialog> </Dialog>
<Dialog
open={restoreOpen}
onOpenChange={(open) => {
if (!restoring) setRestoreOpen(open);
}}
>
<DialogContent>
<DialogHeader>
<DialogTitle>Restore to this point?</DialogTitle>
<DialogDescription>
This resets the worktree to before this turn, removes every later
message in this chat, and resets the agent's session. This cannot
be undone.
</DialogDescription>
</DialogHeader>
<DialogFooter>
<Button
variant="outline"
onClick={() => setRestoreOpen(false)}
disabled={restoring}
>
Cancel
</Button>
<Button
variant="destructive"
onClick={() => void confirmRestore()}
disabled={restoring}
>
{restoring ? 'Restoring' : 'Restore'}
</Button>
</DialogFooter>
</DialogContent>
</Dialog>
</> </>
); );
} }
@@ -550,7 +637,85 @@ function ReasoningBlock({ text, streaming }: { text: string; streaming: boolean
); );
} }
export function MessageBubble({ message, sessionChats, capHitInfo, actions, hideActions }: Props) { // feature #12: mistake-recovery sentinel. Inserted by the backend as a
// role='system', metadata.kind='mistake_recovery' row when the model hit
// repeated *different* errors (distinct from doom_loop, which is the same
// call repeated). Visual treatment mirrors CapHitSentinel / DoomLoopSentinel
// (amber card + alert icon). Non-escalated → recovery guidance was injected
// and the turn continues. Escalated → the turn was stopped; if can_continue
// is set, offer the same Continue affordance as the cap-hit sentinel.
// Loose `!= null` guards per the CLAUDE.md coder-message note (coder rows pass
// metadata as undefined, not null).
function MistakeRecoverySentinel({ message }: { message: Message }) {
const meta = message.metadata;
const isMistakeRecovery =
meta != null && typeof meta === 'object' && meta.kind === 'mistake_recovery';
const failureKinds = isMistakeRecovery ? meta.failure_kinds : [];
const escalated = isMistakeRecovery ? meta.escalated : false;
const canContinue = isMistakeRecovery ? meta.can_continue === true : false;
const [continuing, setContinuing] = useState(false);
async function handleContinue() {
if (continuing || !canContinue) return;
setContinuing(true);
try {
await api.chats.continue(message.chat_id, message.id);
} catch (err) {
toast.error(err instanceof Error ? err.message : 'continue failed');
} finally {
setContinuing(false);
}
}
const kindsLabel =
Array.isArray(failureKinds) && failureKinds.length > 0
? failureKinds.join(', ')
: null;
return (
<div className="rounded-md border border-amber-500/40 bg-amber-500/10 text-sm">
<div className="px-3 py-2 flex items-start gap-2">
<AlertCircle className="size-4 text-amber-500 shrink-0 mt-0.5" />
<div className="flex-1 min-w-0 space-y-1">
<div className="text-xs font-medium text-amber-700 dark:text-amber-300">
{escalated ? 'Repeated errors — turn stopped' : 'Recovering from repeated errors'}
</div>
<div className="text-xs text-muted-foreground">
{escalated
? 'Repeated errors persisted — stopped the turn.'
: kindsLabel
? `Hit repeated different errors (${kindsLabel}) — recovery guidance injected, continuing.`
: 'Hit repeated different errors — recovery guidance injected, continuing.'}
</div>
{escalated && canContinue && (
<div className="pt-1">
<Button
type="button"
size="sm"
variant="outline"
onClick={() => void handleContinue()}
disabled={continuing}
>
{continuing ? 'Continuing…' : 'Continue'}
</Button>
</div>
)}
</div>
</div>
</div>
);
}
export function MessageBubble({
message,
sessionChats,
capHitInfo,
actions,
hideActions,
hasCheckpoint,
restoreDisabled,
}: Props) {
const hiddenSet = new Set(hideActions ?? []); const hiddenSet = new Set(hideActions ?? []);
// v1.11: anchored rolling summary row. Checked BEFORE the kind==='compact' // v1.11: anchored rolling summary row. Checked BEFORE the kind==='compact'
// branch because summary=true never coexists with kind='compact' (new // branch because summary=true never coexists with kind='compact' (new
@@ -586,6 +751,13 @@ export function MessageBubble({ message, sessionChats, capHitInfo, actions, hide
return <DoomLoopSentinel message={message} />; return <DoomLoopSentinel message={message} />;
} }
// feature #12: mistake-recovery sentinel. Non-escalated rows narrate that
// recovery guidance was injected mid-turn; escalated rows report the turn
// was stopped and (when can_continue) offer the cap-hit-style Continue.
if (message.role === 'system' && message.metadata?.kind === 'mistake_recovery') {
return <MistakeRecoverySentinel message={message} />;
}
// v1.8.2: tool messages and assistant tool_calls are now rendered by // v1.8.2: tool messages and assistant tool_calls are now rendered by
// MessageList via ToolCallLine / ToolCallGroup. Tool-role messages reach // MessageList via ToolCallLine / ToolCallGroup. Tool-role messages reach
// this point only if MessageList didn't consume them (shouldn't happen, // this point only if MessageList didn't consume them (shouldn't happen,
@@ -652,7 +824,15 @@ export function MessageBubble({ message, sessionChats, capHitInfo, actions, hide
</div> </div>
)} )}
{!isStreaming && <StatsLine message={message} />} {!isStreaming && <StatsLine message={message} />}
{!isStreaming && hasContent && <ActionRow message={message} actions={actions} hiddenSet={hiddenSet} />} {!isStreaming && hasContent && (
<ActionRow
message={message}
actions={actions}
hiddenSet={hiddenSet}
hasCheckpoint={hasCheckpoint}
restoreDisabled={restoreDisabled}
/>
)}
</div> </div>
); );
} }

View File

@@ -147,11 +147,24 @@ interface Props {
chatId?: string; chatId?: string;
footer?: ReactNode; footer?: ReactNode;
actions?: MessageActions; actions?: MessageActions;
// write-edit-robustness #4: assistant message ids that have a worktree
// checkpoint. The "Restore to here" control renders only on these.
checkpointMessageIds?: Set<string>;
// write-edit-robustness #4: suppress restore during an active turn (mirrors
// composer gating in CoderPane).
restoreDisabled?: boolean;
} }
const CODER_HIDDEN_ACTIONS: ('fork' | 'delete')[] = ['fork']; const CODER_HIDDEN_ACTIONS: ('fork' | 'delete')[] = ['fork'];
export function CoderMessageList({ messages, chatId, footer, actions }: Props) { export function CoderMessageList({
messages,
chatId,
footer,
actions,
checkpointMessageIds,
restoreDisabled,
}: Props) {
const endRef = useRef<HTMLDivElement>(null); const endRef = useRef<HTMLDivElement>(null);
const scrollRef = useRef<HTMLDivElement>(null); const scrollRef = useRef<HTMLDivElement>(null);
const isNearBottomRef = useRef(true); const isNearBottomRef = useRef(true);
@@ -189,6 +202,8 @@ export function CoderMessageList({ messages, chatId, footer, actions }: Props) {
message={item.message as unknown as Message} message={item.message as unknown as Message}
actions={actions} actions={actions}
hideActions={CODER_HIDDEN_ACTIONS} hideActions={CODER_HIDDEN_ACTIONS}
hasCheckpoint={checkpointMessageIds?.has(item.message.id) ?? false}
restoreDisabled={restoreDisabled}
/> />
); );
} }

View File

@@ -18,6 +18,7 @@ import { mergeWireToolCall } from '@/lib/coder-tools';
import { CoderMessageList, type CoderTimelineWire } from '@/components/panes/CoderMessageList'; import { CoderMessageList, type CoderTimelineWire } from '@/components/panes/CoderMessageList';
import { providerIcon, providerLabel } from '@/components/coder/providerIcons'; import { providerIcon, providerLabel } from '@/components/coder/providerIcons';
import { refreshAgentSessions } from '@/hooks/useAgentSessions'; import { refreshAgentSessions } from '@/hooks/useAgentSessions';
import { useAgentStatus, type AgentStatus, type AgentStatusEntry } from '@/hooks/useAgentStatus';
import { cn } from '@/lib/utils'; import { cn } from '@/lib/utils';
// --------------------------------------------------------------------------- // ---------------------------------------------------------------------------
@@ -80,6 +81,14 @@ interface WsHandlers {
onAssistantComplete?: () => void; onAssistantComplete?: () => void;
onAgentCommands?: (taskId: string, commands: AgentCommand[]) => void; onAgentCommands?: (taskId: string, commands: AgentCommand[]) => void;
onConnectedChange?: (connected: boolean) => void; onConnectedChange?: (connected: boolean) => void;
// #10: normalized external-agent status (working|blocked|idle|error) for the
// (chat,agent) carried on the frame. CoderPane records it in a live map and
// feeds the active agent's status to AgentComposerBar's status dot.
onAgentStatus?: (
chatId: string,
agent: string,
entry: AgentStatusEntry,
) => void;
} }
type RawCoderMessage = { type RawCoderMessage = {
@@ -326,6 +335,19 @@ function useCoderMessages(sessionId: string, chatId: string | undefined, handler
description: c.description, description: c.description,
})), })),
); );
} else if (frame.type === 'agent_status_updated') {
// #10: { chat_id, agent, status, reason?, at }. The chat_id guard
// above already dropped cross-chat frames; record per (chat,agent).
const chatId = (frame.chat_id ?? scopedChatId) as string | undefined;
const agent = frame.agent as string | undefined;
const status = frame.status as AgentStatus | undefined;
if (chatId && agent && status) {
handlersRef.current.onAgentStatus?.(chatId, agent, {
status,
...(frame.reason ? { reason: frame.reason as string } : {}),
at: (frame.at as string) ?? new Date().toISOString(),
});
}
} }
} catch { } catch {
// ignore unparseable frames // ignore unparseable frames
@@ -381,6 +403,29 @@ function usePendingChanges(sessionId: string) {
return { changes, loading, refresh, approve, reject }; return { changes, loading, refresh, approve, reject };
} }
// write-edit-robustness #4: which assistant messages in this chat have a
// worktree checkpoint, so CoderMessageList can offer "Restore to here" only on
// those. Refetched on message_complete (same trigger as pending changes) and
// after a successful restore.
function useCheckpoints(sessionId: string, chatId: string | undefined) {
const [messageIds, setMessageIds] = useState<Set<string>>(() => new Set());
const refresh = useCallback(() => {
if (!chatId) {
setMessageIds(new Set());
return Promise.resolve();
}
return api.coder
.getCheckpoints(sessionId, chatId)
.then((res) => setMessageIds(new Set(res.checkpoints.map((c) => c.message_id))))
.catch(() => {/* boocoder may be down / endpoint not ready */});
}, [sessionId, chatId]);
useEffect(() => { void refresh(); }, [refresh]);
return { checkpointMessageIds: messageIds, refreshCheckpoints: refresh };
}
// --------------------------------------------------------------------------- // ---------------------------------------------------------------------------
// Sub-components // Sub-components
// --------------------------------------------------------------------------- // ---------------------------------------------------------------------------
@@ -619,6 +664,8 @@ export function CoderPane({
return groups; return groups;
}, [agentCommands, skillItems, agentConfig.provider]); }, [agentCommands, skillItems, agentConfig.provider]);
// #10: live normalized status per (chat,agent), reset on chat switch below.
const agentStatus = useAgentStatus();
const { messages, setMessages, connected, loadMessages } = useCoderMessages(sessionId, chatId, { const { messages, setMessages, connected, loadMessages } = useCoderMessages(sessionId, chatId, {
onConnectedChange, onConnectedChange,
onPermissionRequested: (prompt) => { onPermissionRequested: (prompt) => {
@@ -638,8 +685,23 @@ export function CoderPane({
onAgentCommands: (_taskId, commands) => { onAgentCommands: (_taskId, commands) => {
setLiveTaskCommands(commands); setLiveTaskCommands(commands);
}, },
onAgentStatus: agentStatus.record,
}); });
// Clear any stale status for the previous chat when the pane switches chats so
// a lingering working/blocked dot never carries into the next conversation.
useEffect(() => {
return () => agentStatus.reset(chatId);
}, [chatId, agentStatus]);
// The active agent's normalized status for this chat. null for native boocode
// (no external status published) or before any frame arrives — gates the dot.
const currentAgentStatus: AgentStatusEntry | null =
agentConfig.provider && agentConfig.provider !== 'boocode'
? agentStatus.get(chatId, agentConfig.provider)
: null;
const { changes, loading, refresh, approve, reject } = usePendingChanges(sessionId); const { changes, loading, refresh, approve, reject } = usePendingChanges(sessionId);
const { checkpointMessageIds, refreshCheckpoints } = useCheckpoints(sessionId, chatId);
const [input, setInput] = useState(''); const [input, setInput] = useState('');
const [sending, setSending] = useState(false); const [sending, setSending] = useState(false);
const [queue, setQueue] = useState<string[]>([]); const [queue, setQueue] = useState<string[]>([]);
@@ -652,15 +714,18 @@ export function CoderPane({
// Refresh pending changes (and agent-session state for the §9b chip) when a // Refresh pending changes (and agent-session state for the §9b chip) when a
// message_complete arrives — same trigger usePendingChanges already uses. // message_complete arrives — same trigger usePendingChanges already uses.
// write-edit-robustness #4: also refetch checkpoints so a new turn's snapshot
// surfaces its "Restore to here" control.
useEffect(() => { useEffect(() => {
const lastAssistant = [...messages].reverse().find( const lastAssistant = [...messages].reverse().find(
(m): m is CoderMessage => m.role === 'assistant', (m): m is CoderMessage => m.role === 'assistant',
); );
if (lastAssistant?.status === 'complete') { if (lastAssistant?.status === 'complete') {
refresh(); refresh();
void refreshCheckpoints();
void refreshAgentSessions(sessionId); void refreshAgentSessions(sessionId);
} }
}, [messages, refresh, sessionId]); }, [messages, refresh, refreshCheckpoints, sessionId]);
// The §9b chip only shows once the chat has ≥1 prior turn (a completed // The §9b chip only shows once the chat has ≥1 prior turn (a completed
// assistant message). Hidden on a brand-new chat. // assistant message). Hidden on a brand-new chat.
@@ -867,6 +932,38 @@ export function CoderPane({
} }
}, [activeTaskId]); }, [activeTaskId]);
// write-edit-robustness #4: reset the worktree to a message's checkpoint and
// trim the transcript past it. The confirm lives in MessageBubble's ActionRow
// (plain Cancel/Restore). The restore route is keyed by checkpoint id, so we
// resolve message→checkpoint via a fresh GET (cheap, and avoids a stale id if
// the set changed). On success, refetch messages so the trimmed transcript
// shows, plus checkpoints (later ones were deleted server-side) and pending
// changes (the worktree was reset).
const handleRestoreCheckpoint = useCallback(async (_chatId: string, messageId: string) => {
if (!chatId || generating) return;
let checkpointId: string | undefined;
try {
const res = await api.coder.getCheckpoints(sessionId, chatId);
checkpointId = res.checkpoints.find((c) => c.message_id === messageId)?.id;
} catch (err) {
toast.error(err instanceof Error ? err.message : 'failed to load checkpoint');
return;
}
if (!checkpointId) {
toast.error('No checkpoint found for this message');
return;
}
try {
await api.coder.restoreCheckpoint(sessionId, checkpointId);
await loadMessages();
await refreshCheckpoints();
refresh();
toast.success('Restored to checkpoint');
} catch (err) {
toast.error(err instanceof Error ? err.message : 'restore failed');
}
}, [chatId, generating, sessionId, loadMessages, refreshCheckpoints, refresh]);
const handleChatInputSlash = useCallback(async (skillName: string, userMessage: string) => { const handleChatInputSlash = useCallback(async (skillName: string, userMessage: string) => {
if (!chatId) return; if (!chatId) return;
// Only BooCoder skills route here; an agent's own commands (not skills) fall // Only BooCoder skills route here; an agent's own commands (not skills) fall
@@ -909,6 +1006,7 @@ export function CoderPane({
connected={connected} connected={connected}
sessionId={sessionId} sessionId={sessionId}
hasPriorTurn={hasPriorTurn} hasPriorTurn={hasPriorTurn}
agentStatus={currentAgentStatus}
/> />
{/* Chat area — BooChat-style timeline (text + tool runs as siblings) */} {/* Chat area — BooChat-style timeline (text + tool runs as siblings) */}
<div className="flex-1 min-h-0 flex flex-col"> <div className="flex-1 min-h-0 flex flex-col">
@@ -921,8 +1019,11 @@ export function CoderPane({
<CoderMessageList <CoderMessageList
messages={messages as CoderTimelineWire[]} messages={messages as CoderTimelineWire[]}
chatId={chatId} chatId={chatId}
checkpointMessageIds={checkpointMessageIds}
restoreDisabled={generating}
actions={{ actions={{
onResend: async (_chatId, content) => { await sendOneMessage(content); }, onResend: async (_chatId, content) => { await sendOneMessage(content); },
onRestoreCheckpoint: handleRestoreCheckpoint,
}} }}
footer={ footer={
activeTaskId && !permissionPrompt && sending === false ? ( activeTaskId && !permissionPrompt && sending === false ? (

View File

@@ -0,0 +1,62 @@
import { useCallback, useMemo, useState } from 'react';
// Normalized external-agent status (#10). Consumed from the
// `agent_status_updated` WS frame the coder backend publishes:
// { type: 'agent_status_updated'; chat_id; agent; status; reason?; at }
// BooCoder collapses ~30 vendor lifecycle events into these four buckets:
// working — turn in flight
// blocked — waiting on a permission / approval
// idle — clean completion
// error — crash / failure
export type AgentStatus = 'working' | 'blocked' | 'idle' | 'error';
export interface AgentStatusEntry {
status: AgentStatus;
reason?: string;
at: string;
}
const key = (chatId: string, agent: string): string => `${chatId}:${agent}`;
// Per-(chat,agent) live status map. The dot reflects the latest frame for the
// active agent in the current chat; entries are reset when the chat switches so
// a stale "working"/"blocked" from a previous chat never leaks into the next.
export function useAgentStatus() {
const [map, setMap] = useState<Record<string, AgentStatusEntry>>({});
const record = useCallback(
(chatId: string, agent: string, entry: AgentStatusEntry) => {
setMap((prev) => ({ ...prev, [key(chatId, agent)]: entry }));
},
[],
);
// Drop every entry for a chat (called on chat switch). No-op when nothing
// matches so it's safe to call unconditionally from an effect.
const reset = useCallback((chatId: string | undefined) => {
setMap((prev) => {
if (!chatId) return prev;
const prefix = `${chatId}:`;
let changed = false;
const next: Record<string, AgentStatusEntry> = {};
for (const [k, v] of Object.entries(prev)) {
if (k.startsWith(prefix)) {
changed = true;
continue;
}
next[k] = v;
}
return changed ? next : prev;
});
}, []);
const get = useCallback(
(chatId: string | undefined, agent: string | undefined): AgentStatusEntry | null => {
if (!chatId || !agent) return null;
return map[key(chatId, agent)] ?? null;
},
[map],
);
return useMemo(() => ({ record, reset, get }), [record, reset, get]);
}

View File

@@ -189,6 +189,12 @@ function applyFrame(state: State, frame: WsFrame): State {
// duplicating async work inside a synchronous reducer. // duplicating async work inside a synchronous reducer.
return state; return state;
} }
case 'agent_status_updated': {
// agent-status-normalize (#10): coder-only frame consumed by CoderPane's
// own WS handler, not BooChat's native message reducer. No-op here to keep
// TS exhaustiveness satisfied (native sessions never emit it).
return state;
}
} }
} }

View File

@@ -6,6 +6,10 @@ Operating rules for every agent in this registry. Full procedures live in the `c
**Worktrees** — Isolate work in a worktree when it is parallel to in-progress work, risky/experimental, a hotfix interrupting other work, or splits into independent units — just create when clear, propose in one line when ambiguous, skip quick/small single-stream work. Branch from a stable base (default branch); worktrees persist (never auto-remove or auto-merge); they isolate code state, not runtime (ports/DBs/services still collide). Full heuristic: invoke `using-worktrees`. **Worktrees** — Isolate work in a worktree when it is parallel to in-progress work, risky/experimental, a hotfix interrupting other work, or splits into independent units — just create when clear, propose in one line when ambiguous, skip quick/small single-stream work. Branch from a stable base (default branch); worktrees persist (never auto-remove or auto-merge); they isolate code state, not runtime (ports/DBs/services still collide). Full heuristic: invoke `using-worktrees`.
**Sampling knobs** — Each `## Name` frontmatter block accepts these per-agent sampler fields, threaded into the llama-swap chat-completion request: `temperature`, `top_p`, `top_k`, `min_p`, `presence_penalty`, and (v2.6) `top_n_sigma`, `dry_multiplier`, `dry_base`, `dry_allowed_length`, `dry_penalty_last_n`. The `top_n_sigma` + `dry_*` repetition family curb the doom-loop-prone local model. Omit a field to leave it at the server default. Example: `top_n_sigma: 1.0`, `dry_multiplier: 0.8`, `dry_base: 1.75`, `dry_allowed_length: 2`, `dry_penalty_last_n: -1` (-1 = whole context).
**Reasoning budget** — To cap a reasoning model's thinking tokens, pass `--reasoning-budget` through `llama_extra_args` (already permitted by the deny-list validator; routes the agent to llama-sidecar). Example frontmatter line: `llama_extra_args: ["--reasoning-budget", "2048"]`. This is a sidecar process flag, not a chat-completion body param — distinct from the sampling knobs above.
## Code Reviewer ## Code Reviewer
--- ---
temperature: 0.6 temperature: 0.6

View File

@@ -0,0 +1,61 @@
# Normalized external-agent status (#10, scoped)
**Status:** in progress (started 2026-06-01)
**Source:** `boocode_code_review_v2.md` §1 #10, §5j (superset, Elastic License 2.0 — PATTERN-ONLY,
clean-room; `/opt/forks/superset/.../map-event-type.ts`, `notify-hook.template.sh`, `agent-setup/*`).
**Decision (Sam, 2026-06-01):** scoped status-publish now; config-injection notify-hook as a follow-on.
## Why (corrected premise)
BooCoder already *observes* agent lifecycle (warm-acp/opencode/SDK backends know active/idle/crashed;
the permission-waiter knows blocked) but never **publishes a normalized per-`(chat,agent)` status** to the
UI — so blocked-on-permission is invisible and crash/idle aren't pushed proactively. The `AgentComposerBar`
dot only shows WS liveness. This batch publishes the status BooCoder already knows; the heavier
config-injection notify-hook (for out-of-band signals) is the documented follow-on.
## State model (clean-room from superset's `mapEventType`)
Superset collapses ~30 vendor event names → 3 signals: **Start** (working), **PermissionRequest**
(blocked), **Stop** (done). BooCoder adds idle (after done) + error (crash/fail). Normalized status:
`working | blocked | idle | error`.
## Pinned frame contract (server + web, byte-identical, parity-tested)
```ts
{ type: 'agent_status_updated', chat_id: Uuid, agent: string,
status: 'working' | 'blocked' | 'idle' | 'error', reason?: string, at: IsoTimestamp }
```
Added to `apps/server/src/types/ws-frames.ts` AND `apps/web/src/api/ws-frames.ts` (the `ws-frames` parity
test), plus the web `WsFrame` union in `apps/web/src/api/types.ts`. Published via the coder's
`broker.publishFrame` (validated against the server `WsFrameSchema`).
## Clean-room normalize helper (built now, reused by the injection follow-on)
`apps/coder/src/services/normalize-agent-status.ts`:
`normalizeAgentEvent(raw: string): 'working' | 'blocked' | 'done' | null` — a clean-room reimplementation
of the vendor-event-name → bucket mapping (the event names are facts about each agent's hooks:
`SessionStart`/`UserPromptSubmit`/`PostToolUse`→working; `PreToolUse`/`Notification`/`PermissionRequest`/
`exec_approval_request`→blocked; `Stop`/`session_end`/`task_complete`→done). The scoped publish points use
BooCoder's own already-normalized turn boundaries; this helper exists so the config-injection follow-on
(which receives raw vendor event names POSTed from agent hooks) reuses it. Unit-tested.
## Publish points (BooCoder's existing observation — no per-backend change)
- Dispatcher (`dispatcher.ts`) turn boundaries, for every external-agent path (warm-acp/opencode/sdk/pty):
`working` at turn start, `idle` on clean completion, `error` on failure.
- Permission-waiter (`permission-waiter.ts` / the `setPermissionHooks` publish in `index.ts`): `blocked`
when a permission is requested, back to `working` when resolved.
A small `publishAgentStatus(broker, chatId, agent, status, reason?)` helper centralizes the frame.
## Frontend
- `CoderPane.tsx` tracks the latest `agent_status_updated` per `(chat, agent)` (a small live map; reset on
chat switch).
- `AgentComposerBar.tsx` renders a normalized status dot beside the existing session chip (reuse the
`StatusDot` visual language: working=spinner/green, blocked=amber, idle=gray, error=red), distinct from
the WS-liveness `connected` dot.
## Follow-on (documented, not built): config-injection notify-hook
Clean-room re-derive superset's `agent-setup`: inject a notify hook into each agent's native config
(claude `~/.claude/settings.json`, opencode plugin, codex/gemini templates) that POSTs
`{agent, chat_id, eventType}` to a new `POST /api/coder/agent-status` endpoint, which runs
`normalizeAgentEvent` → publishes the SAME `agent_status_updated` frame. Reuses everything this batch
builds. Catches out-of-band signals BooCoder's dispatch can't see.
## Verify
- `pnpm -C apps/coder test` (+ normalize-agent-status tests) + `pnpm -C apps/server test` (ws-frames parity)
- `pnpm -C apps/server build && pnpm -C apps/coder build`; `npx tsc -p apps/web/tsconfig.app.json --noEmit`

View File

@@ -0,0 +1,68 @@
# Claude Agent SDK backend + clean-room PostgresSessionStore (#9)
**Status:** in progress (started 2026-06-01)
**Source:** `boocode_code_review_v2.md` §1 #9, §5h/§5i (happy + SDK `.d.ts`). Decision §6.2: lean SDK.
**SDK:** `@anthropic-ai/claude-agent-sdk@0.3.159` (installed, Commercial Terms — runtime dep OK, code
reference-only; the store is **clean-room** from the real interface, not vendored).
Replace BooCoder's one-shot PTY claude dispatch with a warm, resumable Claude-SDK backend. Two parts:
the clean-room session store (fully testable here) and the backend + wiring (live pump needs a host
smoke against real `claude`).
## Ground-truth SDK API (from the installed `sdk.d.ts`)
- `query({ prompt: string | AsyncIterable<SDKUserMessage>, options?: Options }): Query` where
`Query extends AsyncGenerator<SDKMessage, void>`.
- `Options`: `sessionStore?: SessionStore`, `resume?: string`, `model?`, `cwd?`,
`pathToClaudeCodeExecutable?`, `canUseTool?`, `permissionMode?`, `env?`, `allowedTools?`.
- `SessionStore = { append(key, entries): Promise<void>; load(key): Promise<SessionStoreEntry[]|null>;
listSessions?(projectKey): Promise<{sessionId,mtime}[]>; delete?(key): Promise<void>;
listSubkeys?({projectKey,sessionId}): Promise<string[]> }`.
- `SessionKey = { projectKey: string; sessionId: string; subpath?: string }` (undefined subpath = main
transcript; empty string invalid — store maps undefined→'' internally).
- `SessionStoreEntry = { type: string; uuid?: string; timestamp?: string; [k]: unknown }` (opaque JSONL).
- Messages: `SDKSystemMessage{subtype:'init'}` carries `session_id` (+ model/tools); `SDKResultMessage`
(success/error) ends a turn with `result`, `usage`, `total_cost_usd`; `SDKPartialAssistantMessage` /
`SDKAssistantMessage` carry text/thinking/tool blocks.
## Part 1 — Clean-room PostgresSessionStore (testable now)
- Schema (`apps/coder/src/schema.sql`): a generic append-only entry table
`claude_session_entries(id BIGSERIAL PK, project_key TEXT, session_id TEXT, subpath TEXT DEFAULT '',
entry JSONB, created_at TIMESTAMPTZ DEFAULT clock_timestamp())` + index `(project_key, session_id,
subpath, id)`. (The store is generic per the SDK's key; the chat↔session ownership lives in
`agent_sessions`, not here.)
- `apps/coder/src/services/backends/claude-session-store.ts`: `PostgresSessionStore` implementing the
real `SessionStore` type over `Sql`. `append` = ordered multi-INSERT (id = order); `load` = SELECT
ORDER BY id → array or null; `listSessions` = group main-transcript rows, mtime = max(created_at) ms;
`delete` = scoped delete (subpath given → that subpath; omitted → whole session); `listSubkeys` =
DISTINCT non-'' subpaths. Pure SQL, no SDK import needed beyond the `SessionStore` type.
- Tests `__tests__/claude-session-store.test.ts` (DB-opt-in, mirror `checkpoints.test.ts`): append→load
round-trip + order, null on unseen key, subpath isolation (main vs subagent), listSessions mtime,
delete scoping, listSubkeys.
## Part 2 — ClaudeSdkBackend + wiring (live pump needs host smoke)
- `agent_sessions.backend` CHECK adds `'claude_sdk'`.
- `apps/coder/src/services/backends/claude-sdk.ts`: a `ClaudeSdkBackend` implementing `AgentBackend`
(mirror `warm-acp.ts`/`opencode-server.ts`). `ensureSession` resolves the resume id from
`agent_sessions(chat_id,'claude').agent_session_id`; `prompt` drives one persistent `query()` in
streaming-input mode (a pushable `AsyncIterable<SDKUserMessage>` fed per turn) with
`{ sessionStore, resume, model, cwd: worktreePath, pathToClaudeCodeExecutable: installPath }`,
reads the `AsyncGenerator<SDKMessage>` until `result`, captures `session_id` from the `init` message
and persists it to `agent_sessions`. A pure `mapSdkMessage(msg): AgentEvent[]` (unit-tested) maps
partial/assistant/tool/thinking → the existing `AgentEvent` union; `result.usage`/`total_cost_usd`
accumulate onto `agent_sessions` (like opencode U.6). `isBusy`/`closeSession`/crash mirror the ACP
backend.
- Routing: add `claude` to the warm path (`warm-acp-routing.ts` or a sibling `shouldUseClaudeSdk`),
with the existing PTY `runExternalAgent` kept as the **fallback** (session-less creators + if the SDK
backend fails to start). Provider registry: claude stays selectable; transport reflects the SDK path.
- Frames + persistence identical to the warm-ACP path (`persistExternalAgentTurn`, broker frames).
## Verify
- Part 1: `pnpm -C apps/coder test` + DB-opt-in store tests against dev postgres; build clean.
- Part 2: `pnpm -C apps/coder build` + `npx tsc -p apps/coder/tsconfig.json --noEmit` (typechecks
against the REAL SDK types) + pure-mapper unit tests. **Live pump + resume across turns: host smoke
against real `claude` (auth required) — cannot run from the dev container.**
## Open flags
- SDK peer-deps want `zod@^4`; workspace is `zod@3.25.76` (installed with a warning) — watch at runtime.
- `pathToClaudeCodeExecutable` from `available_agents.install_path`; the SDK spawns the same `claude`
binary the PTY path uses. ANTHROPIC auth/env must reach the child (host concern).

View File

@@ -0,0 +1,70 @@
# MistakeTracker + file-provenance ledger (#12)
**Status:** in progress (started 2026-06-01)
**Source:** `boocode_code_review_v2.md` §1 #12, §5e (cline — algorithm-reimplemented, not vendored).
Two native-inference (apps/server) hardening features. One cohesive backend change (they share
`TurnArgs` + the tool-phase observation point) + a small frontend sentinel render.
## Part A — MistakeTracker (heterogeneous-failure recovery)
Complements the doom-loop guard (`sentinels.ts:detectDoomLoop`, which only catches *identical*
repeats) by catching a run of consecutive tool **failures** the model isn't recovering from.
- New pure `apps/server/src/services/inference/mistake-tracker.ts` (mirrors `detectDoomLoop`):
- `FailureKind = 'zod_reject' | 'tool_not_found' | 'exec_error' | 'api_error' | 'permission_denied'`
(all already distinguished in `tool-phase.ts:executeToolCall`).
- `MISTAKE_THRESHOLD = 3`.
- State `{ run: FailureKind[]; nudges: number }``run` is the current consecutive-failure streak,
reset on ANY successful tool step; `nudges` counts recovery injections not yet cleared by a success.
- `recordStep(state, outcome)` where outcome is a failure kind or `'success'`.
- `detectMistakePattern(state): 'nudge' | 'escalate' | null``run.length >= 3``'nudge'` the first
time (`nudges === 0`), `'escalate'` if it trips again while `nudges >= 1` (no intervening success).
- Lives in `TurnArgs` (loop-local, reset per `runInference`, like `recentToolCalls`).
- Integration in `turn.ts` loop: after each tool phase, `recordStep` per tool outcome; then
`detectMistakePattern`:
- `'nudge'` (decision: soft + escalate): append a transient **model-facing** recovery-guidance system
message to the NEXT turn's payload (re-read schemas, verify paths exist before acting, try a
different approach — not retry variations), insert a `mistake_recovery` UI sentinel
(`escalated:false`), bump `nudges`, reset `run`. Loop continues.
- `'escalate'`: stop the turn (break), insert a `mistake_recovery` sentinel (`escalated:true`,
`can_continue:true`, cap-hit-style), finalize. Prevents heterogeneous failures from burning the
whole step budget.
## Part B — File-provenance ledger (Read-only)
- Accumulate file paths read by `view_file`/`grep`/`find_files`/`list_dir` into `TurnArgs.filesRead:
Set<string>` (recorded at the tool-phase, like the failure outcomes).
- On compaction (`compaction.ts:buildPrompt`), inject a deterministic, sorted `## Files Read` list into
the summary prompt context so the summarizer merges it into the rolling summary — **no new
table/column**; it propagates as summary text across compactions. `compaction-prompt.ts`'s
`SUMMARY_TEMPLATE` already has a `## Relevant Files` section to extend/merge with.
- BooChat is **read-only** (no write tools on apps/server) → "Files Modified" is N/A here; only
"Files Read". (The apps/coder write side can add "Modified" later.)
## Sentinel contract (pinned — backend + frontend must match)
New sentinel kind on `MessageMetadata` in BOTH `apps/server/src/types/api.ts` AND
`apps/web/src/api/types.ts`:
```
{ kind: 'mistake_recovery'; failure_kinds: string[]; count: number; escalated: boolean; can_continue?: boolean }
```
- `role='system'`, `status='complete'`, stripped from the LLM payload via `isAnySentinel` in
`payload.ts` (UI-only) and `compaction.ts:buildHeadPayload`.
- Frontend render branch in `apps/web/src/components/MessageBubble.tsx`: `escalated:false` →
"Hit repeated different errors — recovery guidance injected, continuing." `escalated:true` →
"Repeated errors persisted — stopped the turn." (mirror the doom-loop/cap-hit branches).
## Decisions (2026-06-01)
- MistakeTracker intervention: **soft nudge + escalate**.
- **UI sentinel** for recovery (`mistake_recovery`).
## Files (backend, one agent) / (frontend, one agent)
- Backend: `mistake-tracker.ts` (new), `turn.ts`, `tool-phase.ts`, `sentinels.ts`,
`sentinel-summaries.ts`, `payload.ts`, `compaction.ts`, `compaction-prompt.ts`, `types/api.ts` +
tests (`mistake-tracker.test.ts`, ledger/compaction assertions).
- Frontend: `apps/web/src/api/types.ts` (MessageMetadata arm) + `MessageBubble.tsx` (render branch).
MUST NOT touch Sam's WIP web files.
## Verify
- `pnpm -C apps/server test`; `pnpm -C apps/server build`; `npx tsc -p apps/web/tsconfig.app.json --noEmit`

View File

@@ -0,0 +1,45 @@
# Small wins — sampling knobs + PTY stream-json + token UI
**Status:** in progress (started 2026-06-01)
**Source:** `boocode_code_review_v2.md` §1 #11 / #7 / #8 (config-adopt + qwen-code §5g + opencode §3 #4).
Three independent BooCode improvements, disjoint subsystems (apps/server / apps/coder / apps/web).
## #11 — New sampling knobs (apps/server)
Per-agent `top_n_sigma` + the `dry_*` repetition family help the doom-loop-prone local model.
Today the Agent type threads `temperature/top_p/top_k/min_p/presence_penalty` into the inference
request (`stream-phase.ts:396438`). Add `top_n_sigma`, `dry_multiplier`, `dry_base`,
`dry_allowed_length`, `dry_penalty_last_n` as first-class Agent fields (`types/api.ts`), parse them in
`agents.ts:parseFrontmatter` (same bounded per-field numeric pattern + out-of-range warn), and thread
them into the request body **via the same mechanism `top_k`/`min_p` already use** (the agent must
confirm whether that's an AI-SDK `providerOptions`/`extraBody` passthrough — these are llama.cpp
extensions, not standard OpenAI fields — and ride it; surface it if `top_k`/`min_p` turn out to be
silently dropped today). `--reasoning-budget` is a llama-server CLI flag already permitted by the
deny-list validator, so it works via `llama_extra_args: ["--reasoning-budget","N"]` now — document it
in `data/AGENTS.md`. apps/server only.
## #7 — Live PTY stream-json NDJSON parsing (apps/coder)
qwen/claude PTY dispatch slices stdout opaque (`dispatcher.ts` PTY path; qwen already runs
`--output-format stream-json`). Add a parser for the Claude-Code-compatible NDJSON
(`system`/`assistant`/`result`/`stream_event``content_block_delta` text/thinking/tool deltas +
`usage` + `session_id`) that maps to the existing `AgentEvent` union (`agent-backend.ts`). **Live
incremental** (decision 2026-06-01): line-buffer the PTY stdout `data` events, parse each complete
NDJSON line as it arrives, and emit broker frames live (text/reasoning/tool) like the ACP/opencode
paths — plus accumulate for `persistExternalAgentTurn`. claude gets `--output-format stream-json` too.
One parser serves both (same schema). apps/coder only (`pty-dispatch.ts`, `dispatcher.ts`, new
`stream-json-parser.ts` + test).
## #8 — Surface opencode token usage (apps/coder route + apps/web)
`agent_sessions.input_tokens/output_tokens/cost` are accumulated (v2.6.8) but the
`GET /api/sessions/:id/agent-sessions` SELECT + the `AgentSessionInfo` type drop them. Add the 3
columns to both, render condensed beside the existing session chip in `AgentComposerBar`
(ChatThroughput styling: `tabular-nums`, muted, e.g. "12.4K in / 3.2K out / $0.25"). MUST NOT touch
Sam's uncommitted WIP (`ChatTabBar`, `SessionLandingPage`, `Workspace`, `useWorkspacePanes`,
`PaneHeaderActions`).
## Decisions (2026-06-01)
- #7 surfacing: **live incremental** streaming (not parse-at-end).
## Verify
- `pnpm -C apps/server test` (+ new agent-parse tests); `pnpm -C apps/coder test` (+ new parser tests)
- `pnpm -C apps/server build && pnpm -C apps/coder build`; `npx tsc -p apps/web/tsconfig.app.json --noEmit`

View File

@@ -0,0 +1,101 @@
# Write/edit robustness — fuzzy patch applier + worktree checkpoints
**Status:** in progress (started 2026-06-01)
**Source:** `boocode_code_review_v2.md` §1 #3 + #4, §5b/§5d5e (cline, Apache-2.0 — algorithm clean-reimplemented, not vendored).
Two independent BooCoder hardening features for local quantized models.
## #3 — Fuzzy patch applier
**Problem:** `applyOne`'s edit case (`apps/coder/src/services/pending_changes.ts:124`) does exact
`content.includes(oldStr)` → throw, then `content.replace(oldStr, newStr)` (first occurrence).
`rewindOne` (line 206) is the same. Local models (qwen3.6) drift `old_string` by whitespace/
indentation/unicode (curly quotes, en/em-dash, nbsp), so a valid edit fails at apply with
"old_string not found" and is lost.
**Design:** new pure module `apps/coder/src/services/fuzzy-match.ts`:
`locateMatch(content: string, needle: string): { kind: 'exact'|'fuzzy'; start: number; end: number }
| { kind: 'ambiguous'; count: number } | { kind: 'not_found' }`. Match ladder:
1. **Exact** `indexOf`. If exactly one → exact span. If >1 → **ambiguous** (refuse; decision
2026-06-01: safer than silently editing the first).
2. **Per-line whitespace-insensitive** — compare `needle` lines to file line-windows ignoring per-line
`trimEnd`/leading-trailing blank lines.
3. **Unicode canonicalization** — normalize curly→straight quotes, en/em-dash→`-`, nbsp→space on both
sides, then retry the whitespace pass.
4. **Levenshtein** similarity ≥ 0.66 over line-windows sized to `needle`'s line count; best window wins.
Non-exact (fuzzy) matches return the actual file span so the caller replaces the real file text with
`new_string`. `pending_changes.ts` `applyOne`/`rewindOne` use `locateMatch`; `ambiguous`/`not_found`
return `success:false` with a clear message (no throw escaping the existing catch). Unit-tested
(`apps/coder/src/services/__tests__/fuzzy-match.test.ts`), per the `turn-guard.ts` pure-helper pattern.
## #4 — Worktree checkpoint + conversation-trim
**Problem:** `rewind` only reverses BooCode's own `pending_changes` (applied to the project root).
External agents (opencode/goose/qwen/claude) write **directly into the session worktree**
(`/tmp/booworktrees/sess-<id>`); rewind has zero coverage there.
**Schema** (`apps/coder/src/schema.sql`):
```sql
CREATE TABLE IF NOT EXISTS checkpoints (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
chat_id UUID NOT NULL REFERENCES chats(id) ON DELETE CASCADE,
session_id UUID,
worktree_id UUID REFERENCES worktrees(id) ON DELETE SET NULL,
message_id UUID, -- anchor: the assistant turn row this checkpoint precedes
commit_sha TEXT NOT NULL, -- shadow-commit capturing the pre-turn worktree tree
label TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp()
);
CREATE INDEX IF NOT EXISTS checkpoints_chat_created_idx ON checkpoints(chat_id, created_at);
```
**Create** (`apps/coder/src/services/checkpoints.ts``createCheckpoint`): hooked into the three
external-agent dispatch paths in `dispatcher.ts` (`runWarmAcpTask` ~821, `runOpenCodeServerTask` ~513,
`runExternalAgent` ~255) — after `ensureSessionWorktree()` and the assistant-message insert (so the
anchor `message_id` exists), before the backend runs. Snapshot captures tracked **+ untracked** via a
temp-index shadow commit, stored in a private GC-safe ref:
```
cd <wt> && TMP=$(mktemp) && GIT_INDEX_FILE="$TMP" git read-tree HEAD \
&& GIT_INDEX_FILE="$TMP" git add -A \
&& TREE=$(GIT_INDEX_FILE="$TMP" git write-tree) \
&& SHA=$(git commit-tree "$TREE" -p HEAD -m "boocode checkpoint") \
&& git update-ref refs/boocode/checkpoints/<id> "$SHA" && rm -f "$TMP" && echo "$SHA"
```
Best-effort: a checkpoint failure logs and never breaks the turn. Native-boocode turns (project-root,
rewind-covered) get no checkpoint.
**Restore** (`POST /api/sessions/:sessionId/checkpoints/:checkpointId/restore`, proxied `/api/coder/*`):
1. Resolve + validate the checkpoint belongs to the session.
2. Reset worktree: `git -C <wt> reset --hard <commit_sha> && git -C <wt> clean -fd` (hostExec+shellEscape).
3. Trim transcript: `DELETE FROM messages WHERE chat_id = <cp.chat_id> AND created_at >=
(SELECT created_at FROM messages WHERE id = <cp.message_id>)` (+ explicit `message_parts` delete if
the FK isn't ON DELETE CASCADE — verify).
4. Reset backend (decision 2026-06-01): `UPDATE agent_sessions SET status='crashed' WHERE
chat_id=<cp.chat_id>` and evict the live pool session for `(chat,agent)` if present, so the next turn
re-establishes a fresh backend — transcript, files, and agent context all consistent at the restore
point. (Warm backends hold context server-side; no partial rewind exists.)
5. Delete now-orphaned later checkpoints: `DELETE FROM checkpoints WHERE chat_id=? AND created_at >
<cp.created_at>`.
6. Return `{ checkpoint_id, messages_deleted, worktree_reset, backend_reset }`.
**Frontend:** per-message "Restore to here" in `CoderMessageList.tsx` (via a new optional
`onRestoreCheckpoint?(chatId, messageId)` on `MessageActions` in `MessageBubble.tsx`), wired in
`CoderPane.tsx`; guarded to `status==='complete'` and to messages that have a checkpoint. After the call
returns, refetch the chat's messages (existing GET) — no new WS frame required.
## Decisions (2026-06-01)
- Multi-exact-match → **refuse as ambiguous** (#3).
- #4 **full** scope incl. conversation-trim.
- Restore **resets** the external-agent backend session (context re-established fresh).
## Parallelization
- **Unit 1 (#3)** — fully independent (`fuzzy-match.ts` + `pending_changes.ts` + test).
- **Unit 2 (#4 backend)** — schema + `checkpoints.ts` (create+restore) + 3 dispatcher hooks + restore route + backend reset. One agent owns all #4 coder backend (shared `checkpoints.ts`).
- **Unit 3 (#4 frontend)** — `CoderMessageList`/`MessageBubble`/`CoderPane`, against the pinned restore contract. Parallel with Unit 2. MUST NOT touch Sam's uncommitted WIP (`ChatTabBar`, `SessionLandingPage`, `Workspace`, `useWorkspacePanes`, `PaneHeaderActions`).
## Verify
- `pnpm -C apps/coder test` (incl. new `fuzzy-match` + any checkpoint pure-helper tests)
- `pnpm -C apps/server build` then `pnpm -C apps/coder build`
- `npx tsc -p apps/web/tsconfig.app.json --noEmit`
- Live smoke (manual, host): external-agent edit → checkpoint row; "Restore to here" → worktree reset + transcript trimmed + next turn fresh.

144
pnpm-lock.yaml generated
View File

@@ -51,6 +51,9 @@ importers:
'@agentclientprotocol/sdk': '@agentclientprotocol/sdk':
specifier: ^0.22.1 specifier: ^0.22.1
version: 0.22.1(zod@3.25.76) version: 0.22.1(zod@3.25.76)
'@anthropic-ai/claude-agent-sdk':
specifier: ^0.3.159
version: 0.3.159(@anthropic-ai/sdk@0.100.1(zod@3.25.76))(@modelcontextprotocol/sdk@1.29.0(zod@3.25.76))(zod@3.25.76)
'@boocode/server': '@boocode/server':
specifier: workspace:* specifier: workspace:*
version: link:../server version: link:../server
@@ -317,6 +320,63 @@ packages:
resolution: {integrity: sha512-UrcABB+4bUrFABwbluTIBErXwvbsU/V7TZWfmbgJfbkwiBuziS9gxdODUyuiecfdGQ85jglMW6juS3+z5TsKLw==} resolution: {integrity: sha512-UrcABB+4bUrFABwbluTIBErXwvbsU/V7TZWfmbgJfbkwiBuziS9gxdODUyuiecfdGQ85jglMW6juS3+z5TsKLw==}
engines: {node: '>=10'} engines: {node: '>=10'}
'@anthropic-ai/claude-agent-sdk-darwin-arm64@0.3.159':
resolution: {integrity: sha512-3nnH4yUNJVSyaU5DBlGw2yxc4zlnVvAnc9UOe+La47QVG7/dN+rWAgn4zCbqKk9bWFLDQ1Ek0r56EZE1Qo4UKQ==}
cpu: [arm64]
os: [darwin]
'@anthropic-ai/claude-agent-sdk-darwin-x64@0.3.159':
resolution: {integrity: sha512-iv+NRjz+t4Q1R2+kLdDbccSo3b0wedVJ9jMT3noznOVojZHzgkxVpTvt36/XkXV0rqIDQ5H18bBYIZZzOXu7mg==}
cpu: [x64]
os: [darwin]
'@anthropic-ai/claude-agent-sdk-linux-arm64-musl@0.3.159':
resolution: {integrity: sha512-WvwiQBWt3tdu5EwqjpDZszI6p2uetYsw4Cxc6ptO/SmLIYXcDienP8nmirZdsZrS+Gzk6imgY0IY5mmNaRhelQ==}
cpu: [arm64]
os: [linux]
'@anthropic-ai/claude-agent-sdk-linux-arm64@0.3.159':
resolution: {integrity: sha512-FlsS5M4GCpzsQVaNDFF8dRgFGR3QwyAHZFl/xM/2Y2BqVBH+NH17RpKQSJxr1qr41QnsNkinMnu2iSKoc33hKg==}
cpu: [arm64]
os: [linux]
'@anthropic-ai/claude-agent-sdk-linux-x64-musl@0.3.159':
resolution: {integrity: sha512-kFH6RC2YJbPc8XWRNy/wL4YU7LzdJjSwAdH488sVzIif3q+TrvVrV5y/IW0+MLmta+CKIqtFYpGaucsJYvj7Eg==}
cpu: [x64]
os: [linux]
'@anthropic-ai/claude-agent-sdk-linux-x64@0.3.159':
resolution: {integrity: sha512-uNPEC/iRzVb4bEdzs0KAz1zV7i1PVGEZZnJTQyi1OtgVa81sAoH/H0CbbzDiTsquKdaESf+1DSSEkUlfZmMUEw==}
cpu: [x64]
os: [linux]
'@anthropic-ai/claude-agent-sdk-win32-arm64@0.3.159':
resolution: {integrity: sha512-WN1QEZGgWXz9GMl61QU6j9E+LEF5plki87bL2xsGwuCPzK+OeVPQU55pabuP8P+vFBFHUo3Y9OlTVyZHnUzmAQ==}
cpu: [arm64]
os: [win32]
'@anthropic-ai/claude-agent-sdk-win32-x64@0.3.159':
resolution: {integrity: sha512-Ty4seccD+dTDX5hhj89IUELZd/LkxO5O43Uiz5Mo8ZJktoX38SK4XMZlBS935QdqTFLRvPL0hvK4Lt4dTOqzPw==}
cpu: [x64]
os: [win32]
'@anthropic-ai/claude-agent-sdk@0.3.159':
resolution: {integrity: sha512-Xh1oVMIK6N3KsiNIhqNH8ZK90zjRmAEL9d1Md8ZlGdHJE+HhdMYdBadujc3KEkV0uufsEUvYp+A3fDenfypGSA==}
engines: {node: '>=18.0.0'}
peerDependencies:
'@anthropic-ai/sdk': '>=0.93.0'
'@modelcontextprotocol/sdk': ^1.29.0
zod: ^4.0.0
'@anthropic-ai/sdk@0.100.1':
resolution: {integrity: sha512-RANcEe7LpiLczkKGOwoXOTuFdPhuubS0i4xaAKOMpcqc55YO0mukgxppV7eygx3DXNjxWT6RYOLPyOy0aIAmwg==}
hasBin: true
peerDependencies:
zod: ^3.25.0 || ^4.0.0
peerDependenciesMeta:
zod:
optional: true
'@babel/code-frame@7.29.0': '@babel/code-frame@7.29.0':
resolution: {integrity: sha512-9NhCeYjq9+3uxgdtp20LSiJXJvN0FeCtNGpJxuMFZ1Kv3cWUNb6DOhJwUvcVCzKGR66cw4njwM6hrJLqgOwbcw==} resolution: {integrity: sha512-9NhCeYjq9+3uxgdtp20LSiJXJvN0FeCtNGpJxuMFZ1Kv3cWUNb6DOhJwUvcVCzKGR66cw4njwM6hrJLqgOwbcw==}
engines: {node: '>=6.9.0'} engines: {node: '>=6.9.0'}
@@ -446,6 +506,10 @@ packages:
peerDependencies: peerDependencies:
'@babel/core': ^7.0.0-0 '@babel/core': ^7.0.0-0
'@babel/runtime@7.29.7':
resolution: {integrity: sha512-Nq8OhGWiZIZGV6hLHoyAKLLcJihP/xFeBMGJoUrxTX2psI8dCifzLhZISFb+VWS3wFMRDmCGw5R+dOySCqPLhw==}
engines: {node: '>=6.9.0'}
'@babel/template@7.28.6': '@babel/template@7.28.6':
resolution: {integrity: sha512-YA6Ma2KsCdGb+WC6UpBVFJGXL58MDA6oyONbjyF/+5sBgxY/dwkhLogbMT2GXXyU84/IhRw/2D1Os1B/giz+BQ==} resolution: {integrity: sha512-YA6Ma2KsCdGb+WC6UpBVFJGXL58MDA6oyONbjyF/+5sBgxY/dwkhLogbMT2GXXyU84/IhRw/2D1Os1B/giz+BQ==}
engines: {node: '>=6.9.0'} engines: {node: '>=6.9.0'}
@@ -1787,6 +1851,9 @@ packages:
resolution: {integrity: sha512-tlqY9xq5ukxTUZBmoOp+m61cqwQD5pHJtFY3Mn8CA8ps6yghLH/Hw8UPdqg4OLmFW3IFlcXnQNmo/dh8HzXYIQ==} resolution: {integrity: sha512-tlqY9xq5ukxTUZBmoOp+m61cqwQD5pHJtFY3Mn8CA8ps6yghLH/Hw8UPdqg4OLmFW3IFlcXnQNmo/dh8HzXYIQ==}
engines: {node: '>=18'} engines: {node: '>=18'}
'@stablelib/base64@1.0.1':
resolution: {integrity: sha512-1bnPQqSxSuc3Ii6MhBysoWCg58j97aUjuCSZrGSmDxNqtytIi0k8utUenAwTZN4V5mXXYGsVUI9zeBqy+jBOSQ==}
'@standard-schema/spec@1.1.0': '@standard-schema/spec@1.1.0':
resolution: {integrity: sha512-l2aFy5jALhniG5HgqrD6jXLi/rUWrKvqN/qJx6yoJsgKhblVd+iqqU4RCXavm/jPityDo5TCvKMnpjKnOriy0w==} resolution: {integrity: sha512-l2aFy5jALhniG5HgqrD6jXLi/rUWrKvqN/qJx6yoJsgKhblVd+iqqU4RCXavm/jPityDo5TCvKMnpjKnOriy0w==}
@@ -2528,6 +2595,9 @@ packages:
fast-querystring@1.1.2: fast-querystring@1.1.2:
resolution: {integrity: sha512-g6KuKWmFXc0fID8WWH0jit4g0AGBoJhCkJMb1RmbsSEUNvQ+ZC8D6CUZ+GtF8nMzSPXnhiePyyqqipzNNEnHjg==} resolution: {integrity: sha512-g6KuKWmFXc0fID8WWH0jit4g0AGBoJhCkJMb1RmbsSEUNvQ+ZC8D6CUZ+GtF8nMzSPXnhiePyyqqipzNNEnHjg==}
fast-sha256@1.3.0:
resolution: {integrity: sha512-n11RGP/lrWEFI/bWdygLxhI+pVeo1ZYIVwvvPkW7azl/rOy+F3HYRZ2K5zeE9mmkhQppyv9sQFx0JM9UabnpPQ==}
fast-string-truncated-width@3.0.3: fast-string-truncated-width@3.0.3:
resolution: {integrity: sha512-0jjjIEL6+0jag3l2XWWizO64/aZVtpiGE3t0Zgqxv0DPuxiMjvB3M24fCyhZUO4KomJQPj3LTSUnDP3GpdwC0g==} resolution: {integrity: sha512-0jjjIEL6+0jag3l2XWWizO64/aZVtpiGE3t0Zgqxv0DPuxiMjvB3M24fCyhZUO4KomJQPj3LTSUnDP3GpdwC0g==}
@@ -2873,6 +2943,10 @@ packages:
json-schema-ref-resolver@1.0.1: json-schema-ref-resolver@1.0.1:
resolution: {integrity: sha512-EJAj1pgHc1hxF6vo2Z3s69fMjO1INq6eGHXZ8Z6wCQeldCuwxGK9Sxf4/cScGn3FZubCVUehfWtcDM/PLteCQw==} resolution: {integrity: sha512-EJAj1pgHc1hxF6vo2Z3s69fMjO1INq6eGHXZ8Z6wCQeldCuwxGK9Sxf4/cScGn3FZubCVUehfWtcDM/PLteCQw==}
json-schema-to-ts@3.1.1:
resolution: {integrity: sha512-+DWg8jCJG2TEnpy7kOm/7/AxaYoaRbjVB4LFZLySZlWn8exGs3A4OLJR966cVvU26N7X9TWxl+Jsw7dzAqKT6g==}
engines: {node: '>=16'}
json-schema-traverse@1.0.0: json-schema-traverse@1.0.0:
resolution: {integrity: sha512-NM8/P9n3XjXhIZn1lLhkFaACTOURQXjWhV4BA/RnOv8xvgqtqpAX9IO4mRQxSx1Rlo4tqzeqb0sOlruaOy3dug==} resolution: {integrity: sha512-NM8/P9n3XjXhIZn1lLhkFaACTOURQXjWhV4BA/RnOv8xvgqtqpAX9IO4mRQxSx1Rlo4tqzeqb0sOlruaOy3dug==}
@@ -3756,6 +3830,9 @@ packages:
stackback@0.0.2: stackback@0.0.2:
resolution: {integrity: sha512-1XMJE5fQo1jGH6Y/7ebnwPOBEkIEnT4QF32d5R1+VXdXveM0IBMJt8zfaxX1P3QhVwrYe+576+jkANtSS2mBbw==} resolution: {integrity: sha512-1XMJE5fQo1jGH6Y/7ebnwPOBEkIEnT4QF32d5R1+VXdXveM0IBMJt8zfaxX1P3QhVwrYe+576+jkANtSS2mBbw==}
standardwebhooks@1.0.0:
resolution: {integrity: sha512-BbHGOQK9olHPMvQNHWul6MYlrRTAOKn03rOe4A8O3CLWhNf4YHBqq2HJKKC+sfqpxiBY52pNeesD6jIiLDz8jg==}
statuses@2.0.1: statuses@2.0.1:
resolution: {integrity: sha512-RwNA9Z/7PrK06rYLIzFMlaF+l73iwpzsqRIFgbMLbTcLD6cOao82TaWefPXQvB2fOC4AjuYSEndS7N/mTCbkdQ==} resolution: {integrity: sha512-RwNA9Z/7PrK06rYLIzFMlaF+l73iwpzsqRIFgbMLbTcLD6cOao82TaWefPXQvB2fOC4AjuYSEndS7N/mTCbkdQ==}
engines: {node: '>= 0.8'} engines: {node: '>= 0.8'}
@@ -3899,6 +3976,9 @@ packages:
trough@2.2.0: trough@2.2.0:
resolution: {integrity: sha512-tmMpK00BjZiUyVyvrBK7knerNgmgvcV/KLVyuma/SC+TQN167GrMRciANTz09+k3zW8L8t60jWO1GpfkZdjTaw==} resolution: {integrity: sha512-tmMpK00BjZiUyVyvrBK7knerNgmgvcV/KLVyuma/SC+TQN167GrMRciANTz09+k3zW8L8t60jWO1GpfkZdjTaw==}
ts-algebra@2.0.0:
resolution: {integrity: sha512-FPAhNPFMrkwz76P7cdjdmiShwMynZYN6SgOujD1urY4oNm80Ou9oMdmbR45LotcKOXoy7wSmHkRFE6Mxbrhefw==}
ts-morph@26.0.0: ts-morph@26.0.0:
resolution: {integrity: sha512-ztMO++owQnz8c/gIENcM9XfCEzgoGphTv+nKpYNM1bgsdOVC/jRZuEBf6N+mLLDNg68Kl+GgUZfOySaRiG1/Ug==} resolution: {integrity: sha512-ztMO++owQnz8c/gIENcM9XfCEzgoGphTv+nKpYNM1bgsdOVC/jRZuEBf6N+mLLDNg68Kl+GgUZfOySaRiG1/Ug==}
@@ -4194,6 +4274,52 @@ snapshots:
'@alloc/quick-lru@5.2.0': {} '@alloc/quick-lru@5.2.0': {}
'@anthropic-ai/claude-agent-sdk-darwin-arm64@0.3.159':
optional: true
'@anthropic-ai/claude-agent-sdk-darwin-x64@0.3.159':
optional: true
'@anthropic-ai/claude-agent-sdk-linux-arm64-musl@0.3.159':
optional: true
'@anthropic-ai/claude-agent-sdk-linux-arm64@0.3.159':
optional: true
'@anthropic-ai/claude-agent-sdk-linux-x64-musl@0.3.159':
optional: true
'@anthropic-ai/claude-agent-sdk-linux-x64@0.3.159':
optional: true
'@anthropic-ai/claude-agent-sdk-win32-arm64@0.3.159':
optional: true
'@anthropic-ai/claude-agent-sdk-win32-x64@0.3.159':
optional: true
'@anthropic-ai/claude-agent-sdk@0.3.159(@anthropic-ai/sdk@0.100.1(zod@3.25.76))(@modelcontextprotocol/sdk@1.29.0(zod@3.25.76))(zod@3.25.76)':
dependencies:
'@anthropic-ai/sdk': 0.100.1(zod@3.25.76)
'@modelcontextprotocol/sdk': 1.29.0(zod@3.25.76)
zod: 3.25.76
optionalDependencies:
'@anthropic-ai/claude-agent-sdk-darwin-arm64': 0.3.159
'@anthropic-ai/claude-agent-sdk-darwin-x64': 0.3.159
'@anthropic-ai/claude-agent-sdk-linux-arm64': 0.3.159
'@anthropic-ai/claude-agent-sdk-linux-arm64-musl': 0.3.159
'@anthropic-ai/claude-agent-sdk-linux-x64': 0.3.159
'@anthropic-ai/claude-agent-sdk-linux-x64-musl': 0.3.159
'@anthropic-ai/claude-agent-sdk-win32-arm64': 0.3.159
'@anthropic-ai/claude-agent-sdk-win32-x64': 0.3.159
'@anthropic-ai/sdk@0.100.1(zod@3.25.76)':
dependencies:
json-schema-to-ts: 3.1.1
standardwebhooks: 1.0.0
optionalDependencies:
zod: 3.25.76
'@babel/code-frame@7.29.0': '@babel/code-frame@7.29.0':
dependencies: dependencies:
'@babel/helper-validator-identifier': 7.28.5 '@babel/helper-validator-identifier': 7.28.5
@@ -4367,6 +4493,8 @@ snapshots:
transitivePeerDependencies: transitivePeerDependencies:
- supports-color - supports-color
'@babel/runtime@7.29.7': {}
'@babel/template@7.28.6': '@babel/template@7.28.6':
dependencies: dependencies:
'@babel/code-frame': 7.29.0 '@babel/code-frame': 7.29.0
@@ -5618,6 +5746,8 @@ snapshots:
'@sindresorhus/merge-streams@4.0.0': {} '@sindresorhus/merge-streams@4.0.0': {}
'@stablelib/base64@1.0.1': {}
'@standard-schema/spec@1.1.0': {} '@standard-schema/spec@1.1.0': {}
'@tailwindcss/node@4.3.0': '@tailwindcss/node@4.3.0':
@@ -6386,6 +6516,8 @@ snapshots:
dependencies: dependencies:
fast-decode-uri-component: 1.0.1 fast-decode-uri-component: 1.0.1
fast-sha256@1.3.0: {}
fast-string-truncated-width@3.0.3: {} fast-string-truncated-width@3.0.3: {}
fast-string-width@3.0.2: fast-string-width@3.0.2:
@@ -6727,6 +6859,11 @@ snapshots:
dependencies: dependencies:
fast-deep-equal: 3.1.3 fast-deep-equal: 3.1.3
json-schema-to-ts@3.1.1:
dependencies:
'@babel/runtime': 7.29.7
ts-algebra: 2.0.0
json-schema-traverse@1.0.0: {} json-schema-traverse@1.0.0: {}
json-schema-typed@8.0.2: {} json-schema-typed@8.0.2: {}
@@ -7939,6 +8076,11 @@ snapshots:
stackback@0.0.2: {} stackback@0.0.2: {}
standardwebhooks@1.0.0:
dependencies:
'@stablelib/base64': 1.0.1
fast-sha256: 1.3.0
statuses@2.0.1: {} statuses@2.0.1: {}
statuses@2.0.2: {} statuses@2.0.2: {}
@@ -8061,6 +8203,8 @@ snapshots:
trough@2.2.0: {} trough@2.2.0: {}
ts-algebra@2.0.0: {}
ts-morph@26.0.0: ts-morph@26.0.0:
dependencies: dependencies:
'@ts-morph/common': 0.27.0 '@ts-morph/common': 0.27.0