Compare commits

..

4 Commits

Author SHA1 Message Date
bcc89d8adc feat: MistakeTracker + file-provenance ledger (v2.7.4)
Two native-inference hardening features from boocode_code_review_v2 §1 #12.

MistakeTracker: new pure mistake-tracker.ts tracks consecutive heterogeneous
tool failures (kinds surfaced per tool from tool-phase.ts). On 3 in a row the
turn loop soft-nudges (model-facing recovery guidance + mistake_recovery
sentinel + reset), then escalates to stopping the turn (cap-hit-style, Continue
affordance) on a re-trip. Complements doom-loop (identical repeats) + cap-hit.

File-provenance ledger: compaction.ts derives a deterministic ## Files Read list
from the head messages' read-tool calls and injects it into the rolling-summary
prompt so provenance survives compaction (no new table; read-only).

mistake_recovery sentinel: MessageMetadata arm (server + web) + MessageBubble
render branch. Built by 2 parallel agents. Server 545 tests passing (23 new);
build + web tsc clean. Native-inference only. Builds on v2.7.3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 13:05:03 +00:00
f53d6a8afd Merge sampling-knobs-streamjson: v2.7.3 sampling knobs + live PTY stream-json + token UI 2026-06-01 12:47:31 +00:00
a584dd16b0 feat: sampling knobs + live PTY stream-json + token UI (v2.7.3)
Three small wins from boocode_code_review_v2 §1 #11/#7/#8.

#11 sampling knobs: top_n_sigma + dry_* family as first-class Agent fields,
threaded into the request body via providerOptions.openaiCompatible. Fixes a
latent bug — top_k (rejected by the AI-SDK provider) and min_p (never passed to
streamText) were dead on the wire; both now route through the same channel.
--reasoning-budget documented in data/AGENTS.md.

#7 live PTY stream-json: new stream-json-parser.ts line-buffers qwen/claude
NDJSON and emits text/reasoning/tool frames live + persists, with a fallback to
the old opaque slice. claude gets --output-format stream-json --verbose.

#8 token UI: agent_sessions input/output_tokens/cost now flow through the route
+ type and render beside the AgentComposerBar session chip.

Built by 3 parallel agents. Server 523 + coder 245 tests passing; builds + web
tsc clean. Builds on v2.7.2. openspec sampling-streamjson-tokens.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 12:47:17 +00:00
5651f56039 Merge checkpoint-idor-fix: v2.7.2 close 2 checkpoint IDOR holes 2026-06-01 12:16:08 +00:00
27 changed files with 1761 additions and 42 deletions

View File

@@ -2,6 +2,14 @@
All notable changes per release tag. Most recent on top, ordered by tag creation date (which matches the git history). Tag names follow `vMAJOR.MINOR.PATCH-slug` — the slug describes what shipped, so the tag name alone is enough to recall the batch. All notable changes per release tag. Most recent on top, ordered by tag creation date (which matches the git history). Tag names follow `vMAJOR.MINOR.PATCH-slug` — the slug describes what shipped, so the tag name alone is enough to recall the batch.
## v2.7.4-mistake-tracker-ledger — 2026-06-01
Two native-inference hardening features from `boocode_code_review_v2.md` §1 #12 (cline, algorithm-reimplemented). **MistakeTracker:** complements the doom-loop guard (identical repeats) and cap-hit (budget) by catching a run of consecutive tool *failures*. A new pure `mistake-tracker.ts` tracks heterogeneous failure kinds (`zod_reject`/`tool_not_found`/`exec_error`/`api_error`/`permission_denied`, surfaced per tool from `tool-phase.ts`); after 3 consecutive failures the `turn.ts` loop does a **soft nudge** — injects model-facing recovery guidance into the next step + drops a `mistake_recovery` UI sentinel + resets — then **escalates** to stopping the turn (cap-hit-style, with a Continue affordance) if it re-trips without an intervening success, so heterogeneous failures can't burn the whole step budget. **File-provenance ledger:** `compaction.ts` now derives a deterministic, sorted `## Files Read` list from the head messages' read-tool calls (`view_file`/`grep`/`find_files`/`list_dir`) and injects it into the rolling-summary prompt so file provenance survives compaction (no new table; prompt-driven merge, read-only since BooChat has no write tools). The `mistake_recovery` sentinel adds an arm to `MessageMetadata` in both server + web type copies plus a `MessageBubble` render branch. Built by two parallel agents (backend + frontend sentinel) over disjoint apps; server 545 tests passing (23 new: 12 mistake-tracker + 11 compaction), build + web tsc clean. Native-inference only (external agents run their own loops). Builds on `v2.7.3-sampling-streamjson-tokens`; openspec `mistake-tracker-file-ledger`.
## v2.7.3-sampling-streamjson-tokens — 2026-06-01
Three small BooCode wins from `boocode_code_review_v2.md` §1 #11/#7/#8. **Sampling knobs:** per-agent `top_n_sigma` + the `dry_*` repetition family (`dry_multiplier`/`dry_base`/`dry_allowed_length`/`dry_penalty_last_n`) are now first-class Agent frontmatter fields, parsed in `agents.ts` and threaded into the llama-swap chat-completion body via `providerOptions.openaiCompatible` (the `@ai-sdk/openai-compatible` extra-body channel). This surfaced and fixed a **latent bug**: `top_k` (rejected by the AI-SDK provider as unsupported) and `min_p` (never passed to `streamText` at all) had been dead on the wire — no agent's `top_k`/`min_p` ever affected sampling; both now route through the same channel, so agents that set them will start using them. `--reasoning-budget` is documented in `data/AGENTS.md` (already works via `llama_extra_args`, permitted by the deny-list validator). **Live PTY stream-json:** qwen/claude PTY dispatch sliced stdout opaque; a new `stream-json-parser.ts` line-buffers the Claude-Code-compatible NDJSON and emits text/reasoning/tool frames live as they arrive (mirroring the ACP/opencode paths) + persists the structured parts, with a clean fallback to the old opaque slice when output isn't NDJSON (claude now runs `--output-format stream-json --verbose`). **Token UI:** the per-`(chat,agent)` `agent_sessions.input_tokens`/`output_tokens`/`cost` columns (accumulated since `v2.6.8` but dropped by the read route + wire type) now flow through and render condensed beside the AgentComposerBar session chip. Built by three parallel agents over disjoint subsystems; server 523 + coder 245 tests passing (incl. 11 new stream-json-parser + new agent-parse tests), all builds + web tsc clean. Builds on `v2.7.2-checkpoint-idor`; openspec `sampling-streamjson-tokens`. The qwen-vs-claude `usage` field names in #7 are best-guess pending a live smoke.
## v2.7.2-checkpoint-idor — 2026-06-01 ## v2.7.2-checkpoint-idor — 2026-06-01
Closes two IDOR authorization holes in the `v2.7.1-write-edit-robustness` checkpoint routes, flagged by the automated push security review. The `GET /api/sessions/:id/checkpoints?chat_id=` list route scoped its `chat_id` branch by `chat_id` alone — any session's `chat_id` would read its checkpoints; it now joins through `chats` and gates on `chats.session_id` (authoritative; `checkpoints.session_id` is a nullable denormalized hint). The `restoreCheckpoint` scope guard was fail-open — `cp.session_id && cp.session_id !== sessionId` fell through whenever the checkpoint's denormalized `session_id` was null, allowing a cross-session restore (worktree reset + transcript trim) — it now resolves the owning session via the checkpoint's chat and denies on any missing-or-mismatched row. A DB-integration regression covers the exact null-`session_id` cross-session case. Real-world blast radius is small (BooCoder is single-user behind Authelia on loopback), but both are genuine authorization bugs. Coder suite 234 passing (7/7 checkpoint tests incl. the regression against live postgres+git), typecheck clean. Hotfix on `v2.7.1-write-edit-robustness`. Closes two IDOR authorization holes in the `v2.7.1-write-edit-robustness` checkpoint routes, flagged by the automated push security review. The `GET /api/sessions/:id/checkpoints?chat_id=` list route scoped its `chat_id` branch by `chat_id` alone — any session's `chat_id` would read its checkpoints; it now joins through `chats` and gates on `chats.session_id` (authoritative; `checkpoints.session_id` is a nullable denormalized hint). The `restoreCheckpoint` scope guard was fail-open — `cp.session_id && cp.session_id !== sessionId` fell through whenever the checkpoint's denormalized `session_id` was null, allowing a cross-session restore (worktree reset + transcript trim) — it now resolves the owning session via the checkpoint's chat and denies on any missing-or-mismatched row. A DB-integration regression covers the exact null-`session_id` cross-session case. Real-world blast radius is small (BooCoder is single-user behind Authelia on loopback), but both are genuine authorization bugs. Coder suite 234 passing (7/7 checkpoint tests incl. the regression against live postgres+git), typecheck clean. Hotfix on `v2.7.1-write-edit-robustness`.

View File

@@ -16,6 +16,11 @@ export interface AgentSessionRow {
status: string; status: string;
has_session: boolean; has_session: boolean;
last_active_at: string | null; last_active_at: string | null;
// v2.6.8 per-(chat,agent) running token/cost totals (sampling-streamjson-tokens
// #8). BIGINT columns arrive as strings over the wire; the frontend coerces.
input_tokens: number;
output_tokens: number;
cost: number;
} }
export function registerAgentSessionRoutes(app: FastifyInstance, sql: Sql): void { export function registerAgentSessionRoutes(app: FastifyInstance, sql: Sql): void {
@@ -39,7 +44,10 @@ export function registerAgentSessionRoutes(app: FastifyInstance, sql: Sql): void
a.agent AS agent, a.agent AS agent,
a.status AS status, a.status AS status,
(a.agent_session_id IS NOT NULL) AS has_session, (a.agent_session_id IS NOT NULL) AS has_session,
a.last_active_at AS last_active_at a.last_active_at AS last_active_at,
a.input_tokens AS input_tokens,
a.output_tokens AS output_tokens,
a.cost AS cost
FROM agent_sessions a FROM agent_sessions a
JOIN chats c ON c.id = a.chat_id JOIN chats c ON c.id = a.chat_id
WHERE c.session_id = ${sessionId} WHERE c.session_id = ${sessionId}

View File

@@ -0,0 +1,189 @@
import { describe, it, expect } from 'vitest';
import {
makeStreamJsonParser,
makeStreamJsonState,
parseStreamJsonLine,
type AgentEventList,
} from '../stream-json-parser.js';
import type { AgentEvent } from '../agent-backend.js';
import type { AcpToolSnapshot } from '../acp-tool-snapshot.js';
// Helpers to JSON-encode the representative Claude-Code stream-json lines.
const sys = (sessionId: string) =>
JSON.stringify({ type: 'system', subtype: 'init', session_id: sessionId, tools: ['read', 'edit'] });
const streamEvent = (event: unknown) => JSON.stringify({ type: 'stream_event', event });
const textDelta = (index: number, text: string) =>
streamEvent({ type: 'content_block_delta', index, delta: { type: 'text_delta', text } });
const thinkingDelta = (index: number, thinking: string) =>
streamEvent({ type: 'content_block_delta', index, delta: { type: 'thinking_delta', thinking } });
const toolStart = (index: number, id: string, name: string) =>
streamEvent({ type: 'content_block_start', index, content_block: { type: 'tool_use', id, name } });
const inputJsonDelta = (index: number, partial: string) =>
streamEvent({ type: 'content_block_delta', index, delta: { type: 'input_json_delta', partial_json: partial } });
const blockStop = (index: number) => streamEvent({ type: 'content_block_stop', index });
const resultLine = (input: number, output: number, sessionId?: string) =>
JSON.stringify({ type: 'result', subtype: 'success', session_id: sessionId, usage: { input_tokens: input, output_tokens: output } });
describe('parseStreamJsonLine (pure per-line mapping)', () => {
it('captures session_id from the system init line and emits no events', () => {
const state = makeStreamJsonState();
const events = parseStreamJsonLine(sys('sess-abc'), state);
expect(events).toEqual([]);
expect(state.sessionId).toBe('sess-abc');
});
it('maps a text_delta stream_event → a text event', () => {
const state = makeStreamJsonState();
expect(parseStreamJsonLine(textDelta(0, 'Hello'), state)).toEqual([{ type: 'text', text: 'Hello' }]);
});
it('maps a thinking_delta stream_event → a reasoning event', () => {
const state = makeStreamJsonState();
expect(parseStreamJsonLine(thinkingDelta(0, 'pondering'), state)).toEqual([
{ type: 'reasoning', text: 'pondering' },
]);
});
it('tolerates a garbage / non-JSON line (returns [], no throw)', () => {
const state = makeStreamJsonState();
expect(parseStreamJsonLine('not json at all {{{', state)).toEqual([]);
expect(parseStreamJsonLine('', state)).toEqual([]);
expect(parseStreamJsonLine(' ', state)).toEqual([]);
// A truncated/partial JSON object also yields [] rather than throwing.
expect(parseStreamJsonLine('{"type":"stream_event","eve', state)).toEqual([]);
});
it('ignores unknown top-level line types and the user (tool-result) line', () => {
const state = makeStreamJsonState();
expect(parseStreamJsonLine(JSON.stringify({ type: 'user', message: {} }), state)).toEqual([]);
expect(parseStreamJsonLine(JSON.stringify({ type: 'whatever' }), state)).toEqual([]);
});
it('assembles a tool call across input_json_delta chunks (split across lines)', () => {
const state = makeStreamJsonState();
// start → tool_call (running, empty args)
const start = parseStreamJsonLine(toolStart(1, 'toolu_1', 'edit_file'), state);
expect(start).toHaveLength(1);
expect(start[0]!.type).toBe('tool_call');
const startSnap = (start[0] as { type: 'tool_call'; toolCall: AcpToolSnapshot }).toolCall;
expect(startSnap.toolCallId).toBe('toolu_1');
expect(startSnap.title).toBe('edit_file');
expect(startSnap.status).toBe('in_progress');
expect(startSnap.rawInput).toEqual({});
// args streamed in fragments — no events until stop
expect(parseStreamJsonLine(inputJsonDelta(1, '{"path":"a'), state)).toEqual([]);
expect(parseStreamJsonLine(inputJsonDelta(1, '.ts","content":'), state)).toEqual([]);
expect(parseStreamJsonLine(inputJsonDelta(1, '"hi"}'), state)).toEqual([]);
// stop → tool_update with the parsed, fully-assembled input
const stop = parseStreamJsonLine(blockStop(1), state);
expect(stop).toHaveLength(1);
expect(stop[0]!.type).toBe('tool_update');
const stopSnap = (stop[0] as { type: 'tool_update'; toolCall: AcpToolSnapshot }).toolCall;
expect(stopSnap.toolCallId).toBe('toolu_1');
expect(stopSnap.status).toBe('completed');
expect(stopSnap.rawInput).toEqual({ path: 'a.ts', content: 'hi' });
});
it('falls back to {_raw} when accumulated tool args are not valid JSON', () => {
const state = makeStreamJsonState();
parseStreamJsonLine(toolStart(0, 'toolu_x', 'run'), state);
parseStreamJsonLine(inputJsonDelta(0, '{"broken'), state);
const stop = parseStreamJsonLine(blockStop(0), state);
const snap = (stop[0] as { type: 'tool_update'; toolCall: AcpToolSnapshot }).toolCall;
expect(snap.rawInput).toEqual({ _raw: '{"broken' });
});
it('captures usage from message_delta and result lines', () => {
const state = makeStreamJsonState();
parseStreamJsonLine(streamEvent({ type: 'message_delta', usage: { output_tokens: 42 } }), state);
expect(state.usage.outputTokens).toBe(42);
parseStreamJsonLine(resultLine(100, 250, 'sess-z'), state);
expect(state.usage.inputTokens).toBe(100);
expect(state.usage.outputTokens).toBe(250);
expect(state.sessionId).toBe('sess-z');
});
it('maps a terminal assistant message (fallback) → text + reasoning + tool events', () => {
const state = makeStreamJsonState();
const line = JSON.stringify({
type: 'assistant',
session_id: 'sess-asst',
message: {
content: [
{ type: 'thinking', thinking: 'let me think' },
{ type: 'text', text: 'Here is the answer' },
{ type: 'tool_use', id: 'toolu_9', name: 'view_file', input: { path: 'x.ts' } },
],
usage: { input_tokens: 5, output_tokens: 7 },
},
});
const events = parseStreamJsonLine(line, state);
expect(events).toEqual([
{ type: 'reasoning', text: 'let me think' },
{ type: 'text', text: 'Here is the answer' },
{
type: 'tool_update',
toolCall: { toolCallId: 'toolu_9', title: 'view_file', kind: null, status: 'completed', rawInput: { path: 'x.ts' } },
},
]);
expect(state.usage).toEqual({ inputTokens: 5, outputTokens: 7 });
expect(state.sessionId).toBe('sess-asst');
});
});
describe('makeStreamJsonParser (stateful wrapper over a full turn)', () => {
it('streams a representative turn: init → text → thinking → tool → result', () => {
const parser = makeStreamJsonParser();
const all: AgentEvent[] = [];
const feed = (line: string): AgentEventList => {
const evs = parser.push(line);
all.push(...evs);
return evs;
};
feed(sys('sess-1'));
feed(textDelta(0, 'Reading '));
feed(textDelta(0, 'the file. '));
feed(thinkingDelta(0, 'I should edit it'));
feed(toolStart(1, 'toolu_a', 'edit_file'));
feed(inputJsonDelta(1, '{"path":'));
feed(inputJsonDelta(1, '"main.ts"}'));
feed(blockStop(1));
feed(textDelta(0, 'Done.'));
feed(resultLine(120, 80, 'sess-1'));
expect(all).toEqual([
{ type: 'text', text: 'Reading ' },
{ type: 'text', text: 'the file. ' },
{ type: 'reasoning', text: 'I should edit it' },
{
type: 'tool_call',
toolCall: { toolCallId: 'toolu_a', title: 'edit_file', kind: null, status: 'in_progress', rawInput: {} },
},
{
type: 'tool_update',
toolCall: { toolCallId: 'toolu_a', title: 'edit_file', kind: null, status: 'completed', rawInput: { path: 'main.ts' } },
},
{ type: 'text', text: 'Done.' },
]);
expect(parser.usage()).toEqual({ inputTokens: 120, outputTokens: 80 });
expect(parser.sessionId()).toBe('sess-1');
});
it('a garbage line interleaved mid-turn does not derail subsequent parsing', () => {
const parser = makeStreamJsonParser();
expect(parser.push(textDelta(0, 'a'))).toEqual([{ type: 'text', text: 'a' }]);
expect(parser.push('>>> not json <<<')).toEqual([]);
expect(parser.push(textDelta(0, 'b'))).toEqual([{ type: 'text', text: 'b' }]);
});
});

View File

@@ -410,6 +410,52 @@ export function createDispatcher(deps: Deps): { start(): void; stop(): Promise<v
outputSummary = result.output.slice(0, 500); outputSummary = result.output.slice(0, 500);
await persistExternalAgentTurn(sql, assistantId, result.toolSnapshots, acpReasoning); await persistExternalAgentTurn(sql, assistantId, result.toolSnapshots, acpReasoning);
} else { } else {
// v#7 (stream-json): claude + qwen run with --output-format stream-json.
// Parse the NDJSON live in pty-dispatch and forward AgentEvents here so we
// publish the SAME live frames the warm-ACP / opencode paths emit (text,
// reasoning, tool) and persist structured parts. Accumulate for the final
// message content + persistence; fall back to the opaque stdout slice when
// nothing parsed (agent ran without the flag, or crashed before emitting).
const ptyTextChunks: string[] = [];
const ptyReasoningChunks: string[] = [];
const ptyToolSnaps = new Map<string, AcpToolSnapshot>();
const onPtyEvent = (e: AgentEvent): void => {
switch (e.type) {
case 'text':
ptyTextChunks.push(e.text);
broker.publishFrame(sessionId, {
type: 'delta',
message_id: assistantId,
chat_id: chatId,
content: e.text,
} as WsFrame);
break;
case 'reasoning':
ptyReasoningChunks.push(e.text);
broker.publishFrame(sessionId, {
type: 'reasoning_delta',
message_id: assistantId,
chat_id: chatId,
content: e.text,
} as WsFrame);
break;
case 'tool_call':
case 'tool_update':
ptyToolSnaps.set(e.toolCall.toolCallId, e.toolCall);
broker.publishFrame(sessionId, {
type: 'tool_call',
message_id: assistantId,
chat_id: chatId,
tool_call: snapshotToWireToolCall(e.toolCall),
} as WsFrame);
break;
case 'commands':
// stream-json carries no commands today; ignore if it ever does.
break;
}
};
const result = await dispatchViaPty({ const result = await dispatchViaPty({
agent, agent,
task: task.input, task: task.input,
@@ -420,17 +466,33 @@ export function createDispatcher(deps: Deps): { start(): void; stop(): Promise<v
thinkingOptionId: task.thinking_option_id ?? undefined, thinkingOptionId: task.thinking_option_id ?? undefined,
signal: ac.signal, signal: ac.signal,
log, log,
onEvent: onPtyEvent,
}); });
assistantContent = (result.stdout || result.stderr || '(no output)').slice(0, 50_000);
outputSummary = (result.stdout || result.stderr).slice(0, 500);
if (assistantContent) { if (result.streamed) {
broker.publishFrame(sessionId, { assistantContent = ptyTextChunks.join('').slice(0, 50_000);
type: 'delta', // stream-json text can be empty for a tool-only turn — surface stderr or a
message_id: assistantId, // placeholder so the message row isn't blank.
chat_id: chatId, if (!assistantContent) {
content: assistantContent, assistantContent = (result.stderr || '(no text output)').slice(0, 50_000);
} as WsFrame); }
outputSummary = (ptyTextChunks.join('') || result.stderr).slice(0, 500);
acpReasoning = ptyReasoningChunks.join('').slice(0, 200_000);
await persistExternalAgentTurn(sql, assistantId, [...ptyToolSnaps.values()], acpReasoning);
} else {
// Fallback: agent produced no parseable NDJSON (ran without the flag, or
// crashed). Preserve today's opaque stdout-slice + single delta behavior.
assistantContent = (result.stdout || result.stderr || '(no output)').slice(0, 50_000);
outputSummary = (result.stdout || result.stderr).slice(0, 500);
if (assistantContent) {
broker.publishFrame(sessionId, {
type: 'delta',
message_id: assistantId,
chat_id: chatId,
content: assistantContent,
} as WsFrame);
}
} }
} }

View File

@@ -1,13 +1,29 @@
/** /**
* PTY dispatch — runs external agents directly on the host. * PTY dispatch — runs external agents directly on the host.
*
* claude + qwen run with `--output-format stream-json` and emit Claude-Code's
* stream-json NDJSON on stdout. When an `onEvent` callback is supplied we
* line-buffer that stdout (split on `\n`, hold the partial tail) and feed complete
* lines to `makeStreamJsonParser` so deltas surface live as AgentEvents. The raw
* stdout is still accumulated + returned for back-compat (and the dispatcher's
* fallback when nothing parsed). See `stream-json-parser.ts`.
*/ */
import type { FastifyBaseLogger } from 'fastify'; import type { FastifyBaseLogger } from 'fastify';
import { spawn } from 'node:child_process'; import { spawn } from 'node:child_process';
import type { AgentEvent } from './agent-backend.js';
import { makeStreamJsonParser, type StreamJsonUsage } from './stream-json-parser.js';
export interface DispatchResult { export interface DispatchResult {
exitCode: number; exitCode: number;
stdout: string; stdout: string;
stderr: string; stderr: string;
/** True iff at least one NDJSON AgentEvent was parsed from stdout (v#7). When
* false the dispatcher falls back to slicing stdout as the assistant content. */
streamed: boolean;
/** Final usage parsed from the stream-json `result` / `message_delta`, if any. */
usage?: StreamJsonUsage;
/** Provider session id from the stream-json `system` init line, if any. */
agentSessionId?: string | null;
} }
export interface PtyDispatchOpts { export interface PtyDispatchOpts {
@@ -20,6 +36,10 @@ export interface PtyDispatchOpts {
installPath?: string; installPath?: string;
signal?: AbortSignal; signal?: AbortSignal;
log: FastifyBaseLogger; log: FastifyBaseLogger;
/** Optional live event sink. When set, stdout is line-buffered + NDJSON-parsed
* and each AgentEvent is forwarded here as it arrives. Absent → opaque (old)
* behavior: stdout is accumulated and returned, no parsing. */
onEvent?: (e: AgentEvent) => void;
} }
interface PtySpawnSpec { interface PtySpawnSpec {
@@ -40,7 +60,9 @@ function buildPtySpawnSpec(
switch (agent) { switch (agent) {
case 'claude': { case 'claude': {
const args = ['-p']; // stream-json on -p requires --verbose (Claude Code rejects stream-json
// print mode without it). qwen needs no such flag.
const args = ['-p', '--output-format', 'stream-json', '--verbose'];
if (model) args.push('--model', model); if (model) args.push('--model', model);
if (modeId) args.push('--permission-mode', modeId); if (modeId) args.push('--permission-mode', modeId);
if (thinkingOptionId) args.push('--effort', thinkingOptionId); if (thinkingOptionId) args.push('--effort', thinkingOptionId);
@@ -73,7 +95,7 @@ function buildPtySpawnSpec(
} }
export async function dispatchViaPty(opts: PtyDispatchOpts): Promise<DispatchResult> { export async function dispatchViaPty(opts: PtyDispatchOpts): Promise<DispatchResult> {
const { agent, task, worktreePath, model, modeId, thinkingOptionId, installPath, signal, log } = opts; const { agent, task, worktreePath, model, modeId, thinkingOptionId, installPath, signal, log, onEvent } = opts;
const cmd = buildPtySpawnSpec(agent, task, model, modeId, thinkingOptionId, installPath); const cmd = buildPtySpawnSpec(agent, task, model, modeId, thinkingOptionId, installPath);
if (!cmd) { if (!cmd) {
@@ -81,6 +103,7 @@ export async function dispatchViaPty(opts: PtyDispatchOpts): Promise<DispatchRes
exitCode: 1, exitCode: 1,
stdout: '', stdout: '',
stderr: `Agent '${agent}' is not yet supported for PTY dispatch.`, stderr: `Agent '${agent}' is not yet supported for PTY dispatch.`,
streamed: false,
}; };
} }
@@ -102,7 +125,32 @@ export async function dispatchViaPty(opts: PtyDispatchOpts): Promise<DispatchRes
let stderr = ''; let stderr = '';
let killed = false; let killed = false;
child.stdout!.on('data', (chunk: Buffer) => { stdout += chunk.toString(); }); // Live NDJSON parsing (only when a sink is supplied). Line-buffer: split on
// '\n', dispatch complete lines, hold the partial tail until the next chunk.
const parser = onEvent ? makeStreamJsonParser() : null;
let lineBuf = '';
let streamed = false;
const feedLine = (line: string): void => {
if (!parser || !onEvent) return;
for (const e of parser.push(line)) {
streamed = true;
onEvent(e);
}
};
child.stdout!.on('data', (chunk: Buffer) => {
const text = chunk.toString();
stdout += text;
if (!parser) return;
lineBuf += text;
let nl = lineBuf.indexOf('\n');
while (nl !== -1) {
const line = lineBuf.slice(0, nl);
lineBuf = lineBuf.slice(nl + 1);
feedLine(line);
nl = lineBuf.indexOf('\n');
}
});
child.stderr!.on('data', (chunk: Buffer) => { stderr += chunk.toString(); }); child.stderr!.on('data', (chunk: Buffer) => { stderr += chunk.toString(); });
const cleanup = () => { const cleanup = () => {
@@ -116,7 +164,7 @@ export async function dispatchViaPty(opts: PtyDispatchOpts): Promise<DispatchRes
if (signal) { if (signal) {
if (signal.aborted) { if (signal.aborted) {
cleanup(); cleanup();
resolve({ exitCode: 130, stdout: '', stderr: 'Aborted before start' }); resolve({ exitCode: 130, stdout: '', stderr: 'Aborted before start', streamed: false });
return; return;
} }
signal.addEventListener('abort', cleanup, { once: true }); signal.addEventListener('abort', cleanup, { once: true });
@@ -124,8 +172,18 @@ export async function dispatchViaPty(opts: PtyDispatchOpts): Promise<DispatchRes
child.on('close', (code) => { child.on('close', (code) => {
if (signal) signal.removeEventListener('abort', cleanup); if (signal) signal.removeEventListener('abort', cleanup);
log.info({ agent, exitCode: code }, 'pty-dispatch: completed'); // Flush any final line with no trailing newline.
resolve({ exitCode: code ?? 1, stdout, stderr }); if (lineBuf.trim()) feedLine(lineBuf);
lineBuf = '';
log.info({ agent, exitCode: code, streamed }, 'pty-dispatch: completed');
resolve({
exitCode: code ?? 1,
stdout,
stderr,
streamed,
usage: parser?.usage(),
agentSessionId: parser?.sessionId() ?? null,
});
}); });
child.on('error', (err) => { child.on('error', (err) => {

View File

@@ -0,0 +1,296 @@
/**
* Claude-Code-compatible stream-json NDJSON parser (feature #7,
* openspec `sampling-streamjson-tokens`).
*
* qwen (`--output-format stream-json`) and claude (`--output-format stream-json`)
* both emit Claude-Code's stream-json NDJSON on stdout: one JSON object per line.
* This module turns that stream into the same transport-agnostic `AgentEvent`s the
* ACP / opencode-server backends emit, so the PTY dispatch path can publish live
* broker frames + persist structured parts instead of slicing stdout opaque.
*
* Two surfaces:
* - `parseStreamJsonLine(line, state)` — PURE per-line mapping (unit-testable).
* `state` is the caller-owned accumulator (open tool blocks + usage/session_id).
* - `makeStreamJsonParser()` — a thin stateful wrapper holding the state, with a
* `push(line)` that returns the events for that line and getters for the final
* `usage` / `sessionId`.
*
* Defensive by contract: a non-JSON / partial / garbage line yields `[]` and never
* throws. Tool args (`input_json_delta`) arrive fragmented across many lines; we
* accumulate the partial JSON string per content-block index and only surface the
* parsed `rawInput` once the block stops (or, as a fallback, off the terminal
* `assistant` message which carries the fully-assembled `tool_use` blocks).
*
* Schema (keyed on top-level `type`):
* - `system` — init: { session_id, tools, ... }
* - `assistant` — { message: { content: [ {type:'text'|'thinking'|'tool_use', ...} ], usage? } }
* - `user` — tool results (ignored — diffing the worktree captures effects)
* - `result` — final: { usage: { input_tokens, output_tokens }, session_id? }
* - `stream_event` — { event: { type, index?, content_block?, delta?, usage? } }
* event.type:
* content_block_start — { index, content_block: {type, id?, name?} }
* content_block_delta — { index, delta: {type, text?|thinking?|partial_json?} }
* content_block_stop — { index }
* message_delta — { usage: { output_tokens } }
* message_start — { message: { usage } }
*/
import type { AgentEvent } from './agent-backend.js';
import type { AcpToolSnapshot } from './acp-tool-snapshot.js';
/** Convenience alias for the per-line return value. */
export type AgentEventList = AgentEvent[];
export interface StreamJsonUsage {
inputTokens?: number;
outputTokens?: number;
}
/** Per-open-content-block accumulation for tool args assembled across deltas. */
interface OpenToolBlock {
toolCallId: string;
name: string;
/** Concatenated `input_json_delta.partial_json` fragments. */
partialJson: string;
}
export interface StreamJsonState {
/** content-block index → open tool block (only `tool_use` blocks are tracked). */
toolBlocks: Map<number, OpenToolBlock>;
sessionId: string | null;
usage: StreamJsonUsage;
}
export function makeStreamJsonState(): StreamJsonState {
return { toolBlocks: new Map(), sessionId: null, usage: {} };
}
function asRecord(value: unknown): Record<string, unknown> | null {
if (value && typeof value === 'object' && !Array.isArray(value)) {
return value as Record<string, unknown>;
}
return null;
}
function asString(value: unknown): string | undefined {
return typeof value === 'string' ? value : undefined;
}
function asNumber(value: unknown): number | undefined {
return typeof value === 'number' && Number.isFinite(value) ? value : undefined;
}
/** Pull token counts out of an Anthropic-shape `usage` object, mutating state. */
function captureUsage(usage: Record<string, unknown> | null, state: StreamJsonState): void {
if (!usage) return;
const input = asNumber(usage.input_tokens);
const output = asNumber(usage.output_tokens);
if (input !== undefined) state.usage.inputTokens = input;
// output_tokens is reported incrementally on message_delta; keep the latest.
if (output !== undefined) state.usage.outputTokens = output;
}
/** Parse the accumulated tool-arg JSON; tolerate an unparseable/partial body. */
function parseToolInput(partialJson: string): unknown {
const trimmed = partialJson.trim();
if (!trimmed) return {};
try {
return JSON.parse(trimmed);
} catch {
return { _raw: partialJson };
}
}
function toolSnapshot(block: OpenToolBlock, rawInput: unknown, status: AcpToolSnapshot['status']): AcpToolSnapshot {
return {
toolCallId: block.toolCallId,
title: block.name,
kind: null,
status,
rawInput,
};
}
/**
* Map one stream-event sub-object (the `event` field of a `stream_event` line) to
* AgentEvents, mutating `state` for open tool blocks + usage.
*/
function handleStreamEvent(event: Record<string, unknown>, state: StreamJsonState): AgentEvent[] {
const eventType = asString(event.type);
if (!eventType) return [];
switch (eventType) {
case 'content_block_start': {
const index = asNumber(event.index);
const block = asRecord(event.content_block);
if (index === undefined || !block) return [];
if (asString(block.type) !== 'tool_use') return [];
const toolCallId = asString(block.id) ?? `tool_${index}`;
const name = asString(block.name) ?? 'tool';
const open: OpenToolBlock = { toolCallId, name, partialJson: '' };
state.toolBlocks.set(index, open);
// Surface the tool start immediately (running, no args yet) so the UI shows
// the call before the args finish streaming.
return [{ type: 'tool_call', toolCall: toolSnapshot(open, {}, 'in_progress') }];
}
case 'content_block_delta': {
const index = asNumber(event.index);
const delta = asRecord(event.delta);
if (delta === null) return [];
const deltaType = asString(delta.type);
if (deltaType === 'text_delta') {
const text = asString(delta.text);
return text ? [{ type: 'text', text }] : [];
}
if (deltaType === 'thinking_delta') {
const text = asString(delta.thinking);
return text ? [{ type: 'reasoning', text }] : [];
}
if (deltaType === 'input_json_delta') {
// Accumulate tool args; no event until the block stops.
const fragment = asString(delta.partial_json);
if (index !== undefined && fragment) {
const open = state.toolBlocks.get(index);
if (open) open.partialJson += fragment;
}
return [];
}
return [];
}
case 'content_block_stop': {
const index = asNumber(event.index);
if (index === undefined) return [];
const open = state.toolBlocks.get(index);
if (!open) return [];
state.toolBlocks.delete(index);
const rawInput = parseToolInput(open.partialJson);
return [{ type: 'tool_update', toolCall: toolSnapshot(open, rawInput, 'completed') }];
}
case 'message_start': {
const message = asRecord(event.message);
captureUsage(asRecord(message?.usage), state);
return [];
}
case 'message_delta': {
captureUsage(asRecord(event.usage), state);
return [];
}
default:
return [];
}
}
/**
* Map the terminal `assistant` message (post-hoc full message) to AgentEvents. Used
* as a fallback for transports that emit only the assembled `assistant` line and no
* incremental `stream_event`s. When stream_events already streamed a block, the
* caller dedups by toolCallId, so re-emitting the assembled tool_use is harmless.
*/
function handleAssistantMessage(message: Record<string, unknown>, state: StreamJsonState): AgentEvent[] {
captureUsage(asRecord(message.usage), state);
const content = message.content;
if (!Array.isArray(content)) return [];
const out: AgentEvent[] = [];
let toolIdx = 0;
for (const rawBlock of content) {
const block = asRecord(rawBlock);
if (!block) continue;
const blockType = asString(block.type);
if (blockType === 'text') {
const text = asString(block.text);
if (text) out.push({ type: 'text', text });
} else if (blockType === 'thinking') {
const text = asString(block.thinking);
if (text) out.push({ type: 'reasoning', text });
} else if (blockType === 'tool_use') {
const toolCallId = asString(block.id) ?? `tool_${toolIdx}`;
const name = asString(block.name) ?? 'tool';
const rawInput = 'input' in block ? block.input : {};
out.push({
type: 'tool_update',
toolCall: { toolCallId, title: name, kind: null, status: 'completed', rawInput },
});
}
toolIdx++;
}
return out;
}
/**
* Pure per-line mapping. `line` is a single complete NDJSON line (no trailing
* newline required; surrounding whitespace tolerated). Returns the AgentEvents the
* line produces and mutates `state` (open tool blocks, usage, session_id). A blank,
* non-JSON, or unrecognized line yields `[]` and never throws.
*/
export function parseStreamJsonLine(line: string, state: StreamJsonState): AgentEvent[] {
const trimmed = line.trim();
if (!trimmed) return [];
let obj: Record<string, unknown> | null;
try {
const parsed: unknown = JSON.parse(trimmed);
obj = asRecord(parsed);
} catch {
return [];
}
if (!obj) return [];
const type = asString(obj.type);
switch (type) {
case 'system': {
const sid = asString(obj.session_id);
if (sid) state.sessionId = sid;
return [];
}
case 'stream_event': {
const event = asRecord(obj.event);
return event ? handleStreamEvent(event, state) : [];
}
case 'assistant': {
const sid = asString(obj.session_id);
if (sid) state.sessionId = sid;
const message = asRecord(obj.message);
return message ? handleAssistantMessage(message, state) : [];
}
case 'result': {
const sid = asString(obj.session_id);
if (sid) state.sessionId = sid;
captureUsage(asRecord(obj.usage), state);
return [];
}
default:
// `user` (tool results) and any unknown line type — ignore.
return [];
}
}
export interface StreamJsonParser {
/** Feed one complete NDJSON line; returns its AgentEvents (never throws). */
push(line: string): AgentEvent[];
/** Final usage (input/output tokens) accumulated so far. */
usage(): StreamJsonUsage;
/** Provider session id from the init `system` line / `result`, if seen. */
sessionId(): string | null;
}
/**
* Stateful wrapper around `parseStreamJsonLine`. Holds per-tool-block accumulation
* + usage/session_id across the turn. Line-buffering (splitting stdout on `\n` and
* holding the partial tail) is the caller's job — see `pty-dispatch.ts`.
*/
export function makeStreamJsonParser(): StreamJsonParser {
const state = makeStreamJsonState();
return {
push: (line: string) => parseStreamJsonLine(line, state),
usage: () => ({ ...state.usage }),
sessionId: () => state.sessionId,
};
}

View File

@@ -1,4 +1,4 @@
import { describe, it, expect } from 'vitest'; import { describe, it, expect, vi, afterEach } from 'vitest';
import { isAgentRegistryMarkdown, parseAgentsMd } from '../agents.js'; import { isAgentRegistryMarkdown, parseAgentsMd } from '../agents.js';
describe('isAgentRegistryMarkdown', () => { describe('isAgentRegistryMarkdown', () => {
@@ -31,3 +31,87 @@ Start here
expect(r.errors.length).toBeGreaterThan(0); expect(r.errors.length).toBeGreaterThan(0);
}); });
}); });
// v2.6 sampling-streamjson-tokens (#11): per-agent llama.cpp sampler extensions.
describe('parseAgentsMd: v2.6 sampling knobs', () => {
afterEach(() => {
vi.restoreAllMocks();
});
const withFrontmatter = (lines: string) => `# Agents
## Sampler
---
temperature: 0.6
${lines}
tools: [view_file]
description: test
---
You sample.
`;
it('parses top_n_sigma and the dry_* family from frontmatter', () => {
const md = withFrontmatter(
[
'top_n_sigma: 1.5',
'dry_multiplier: 0.8',
'dry_base: 1.75',
'dry_allowed_length: 2',
'dry_penalty_last_n: -1',
].join('\n'),
);
const { agents, errors } = parseAgentsMd(md);
expect(errors).toHaveLength(0);
expect(agents).toHaveLength(1);
const a = agents[0]!;
expect(a.top_n_sigma).toBe(1.5);
expect(a.dry_multiplier).toBe(0.8);
expect(a.dry_base).toBe(1.75);
expect(a.dry_allowed_length).toBe(2);
expect(a.dry_penalty_last_n).toBe(-1);
});
it('defaults the new sampler fields to null when omitted', () => {
const { agents } = parseAgentsMd(withFrontmatter('top_p: 0.95'));
const a = agents[0]!;
expect(a.top_n_sigma).toBeNull();
expect(a.dry_multiplier).toBeNull();
expect(a.dry_base).toBeNull();
expect(a.dry_allowed_length).toBeNull();
expect(a.dry_penalty_last_n).toBeNull();
});
it('warns (does not error) on out-of-range top_n_sigma / dry_* values', () => {
const warn = vi.spyOn(console, 'warn').mockImplementation(() => {});
const md = withFrontmatter(
[
'top_n_sigma: -1',
'dry_multiplier: -0.5',
'dry_base: -2',
'dry_allowed_length: -3',
'dry_penalty_last_n: -5',
].join('\n'),
);
const { agents, errors } = parseAgentsMd(md);
expect(errors).toHaveLength(0);
expect(agents).toHaveLength(1);
// Mirrors top_k/min_p: out-of-range still stored, with a warning.
expect(warn).toHaveBeenCalled();
const warnings = warn.mock.calls.map((c) => String(c[0])).join('\n');
expect(warnings).toContain('top_n_sigma');
expect(warnings).toContain('dry_multiplier');
expect(warnings).toContain('dry_base');
expect(warnings).toContain('dry_allowed_length');
expect(warnings).toContain('dry_penalty_last_n');
});
it('errors on non-numeric / non-integer sampler values', () => {
const md = withFrontmatter(
['top_n_sigma: high', 'dry_allowed_length: 2.5'].join('\n'),
);
const { errors } = parseAgentsMd(md);
const joined = errors.map((e) => e.reason).join('\n');
expect(joined).toContain('top_n_sigma must be a number');
expect(joined).toContain('dry_allowed_length must be an integer');
});
});

View File

@@ -7,6 +7,8 @@ import {
select, select,
buildPrompt, buildPrompt,
buildHeadPayload, buildHeadPayload,
deriveFilesRead,
buildFilesReadContext,
type CompactionMessage, type CompactionMessage,
} from '../compaction.js'; } from '../compaction.js';
import { SUMMARY_TEMPLATE } from '../compaction-prompt.js'; import { SUMMARY_TEMPLATE } from '../compaction-prompt.js';
@@ -321,3 +323,105 @@ describe('buildHeadPayload reasoning render', () => {
expect(out[1]!.content).not.toContain('<reasoning>'); expect(out[1]!.content).not.toContain('<reasoning>');
}); });
}); });
// ---- buildHeadPayload sentinel stripping (#12) -------------------------------
describe('buildHeadPayload strips all UI sentinels', () => {
it('drops cap_hit, doom_loop, and mistake_recovery system rows', () => {
const out = buildHeadPayload([
mkMsg('user', 'do the thing'),
mkMsg('system', 'budget reached', { metadata: { kind: 'cap_hit' } }),
mkMsg('system', 'looping', { metadata: { kind: 'doom_loop' } }),
mkMsg('system', 'repeated errors', { metadata: { kind: 'mistake_recovery' } }),
mkMsg('assistant', 'answer'),
]);
// Only the user + assistant rows survive; all three sentinels stripped.
expect(out).toHaveLength(2);
expect(out[0]!.role).toBe('user');
expect(out[1]!.role).toBe('assistant');
});
it('keeps a non-sentinel system row (e.g. compact bridge) untouched', () => {
const out = buildHeadPayload([
mkMsg('system', 'legacy compact', { kind: 'compact', metadata: null }),
mkMsg('user', 'q'),
]);
expect(out[0]!.role).toBe('system');
expect(out[0]!.content).toBe('legacy compact');
});
});
// ---- file-provenance ledger (#12, Part B) -----------------------------------
describe('deriveFilesRead', () => {
it('returns [] when the head has no read-tool calls', () => {
expect(deriveFilesRead([mkMsg('user', 'hi'), mkMsg('assistant', 'hello')])).toEqual([]);
});
it('extracts the path arg from view_file / list_dir / grep / find_files', () => {
const head = [
mkMsg('assistant', '', {
tool_calls: [
{ id: 'c1', name: 'view_file', args: { path: 'src/index.ts' } },
{ id: 'c2', name: 'list_dir', args: { path: 'src' } },
{ id: 'c3', name: 'grep', args: { pattern: 'TODO', path: 'apps' } },
{ id: 'c4', name: 'find_files', args: { pattern: '**/*.ts', path: 'lib' } },
],
}),
];
expect(deriveFilesRead(head)).toEqual(['apps', 'lib', 'src', 'src/index.ts']);
});
it('dedupes and sorts paths across multiple assistant turns', () => {
const head = [
mkMsg('assistant', '', { tool_calls: [{ id: 'c1', name: 'view_file', args: { path: 'b.ts' } }] }),
mkMsg('assistant', '', { tool_calls: [{ id: 'c2', name: 'view_file', args: { path: 'a.ts' } }] }),
mkMsg('assistant', '', { tool_calls: [{ id: 'c3', name: 'view_file', args: { path: 'b.ts' } }] }),
];
expect(deriveFilesRead(head)).toEqual(['a.ts', 'b.ts']);
});
it('ignores non-read tools and grep calls without a path arg', () => {
const head = [
mkMsg('assistant', '', {
tool_calls: [
{ id: 'c1', name: 'web_search', args: { query: 'x' } },
{ id: 'c2', name: 'grep', args: { pattern: 'foo' } }, // no path → root, skipped
{ id: 'c3', name: 'view_file', args: { path: 'kept.ts' } },
],
}),
];
expect(deriveFilesRead(head)).toEqual(['kept.ts']);
});
it('ignores read-tool calls on non-assistant rows', () => {
const head = [
mkMsg('user', '', { tool_calls: [{ id: 'c1', name: 'view_file', args: { path: 'nope.ts' } }] }),
];
expect(deriveFilesRead(head)).toEqual([]);
});
});
describe('buildFilesReadContext', () => {
it('returns null when nothing was read (no empty section injected)', () => {
expect(buildFilesReadContext([mkMsg('user', 'hi')])).toBeNull();
});
it('formats a ## Files Read block with sorted bullet paths', () => {
const head = [
mkMsg('assistant', '', {
tool_calls: [
{ id: 'c1', name: 'view_file', args: { path: 'z.ts' } },
{ id: 'c2', name: 'view_file', args: { path: 'a.ts' } },
],
}),
];
expect(buildFilesReadContext(head)).toBe('## Files Read\n- a.ts\n- z.ts');
});
});
describe('SUMMARY_TEMPLATE includes the Files Read section (#12)', () => {
it('declares a ## Files Read section the model must maintain', () => {
expect(SUMMARY_TEMPLATE).toContain('## Files Read');
});
});

View File

@@ -0,0 +1,164 @@
import { describe, it, expect } from 'vitest';
import {
MISTAKE_THRESHOLD,
freshMistakeState,
recordStep,
detectMistakePattern,
MISTAKE_RECOVERY_NOTE,
type FailureKind,
} from '../inference/mistake-tracker.js';
// ---- helpers ----------------------------------------------------------------
// Replays a sequence of outcomes against a fresh state, returning the final
// state so assertions can read .run / .nudges. The caller mimics turn.ts: after
// each recordStep we consult detectMistakePattern and, if it returns 'nudge',
// bump nudges + reset run (the loop's nudge-handling side effect).
function replay(
outcomes: (FailureKind | 'success')[],
{ applyNudge = false }: { applyNudge?: boolean } = {},
) {
const state = freshMistakeState();
const decisions: (ReturnType<typeof detectMistakePattern>)[] = [];
for (const o of outcomes) {
recordStep(state, o);
const decision = detectMistakePattern(state);
decisions.push(decision);
if (applyNudge && decision === 'nudge') {
// Mirror turn.ts's nudge side effect: bump the counter, reset the streak.
state.nudges += 1;
state.run = [];
}
}
return { state, decisions };
}
// ---- fresh state ------------------------------------------------------------
describe('freshMistakeState', () => {
it('starts with an empty run and zero nudges', () => {
const s = freshMistakeState();
expect(s.run).toEqual([]);
expect(s.nudges).toBe(0);
});
});
// ---- below threshold --------------------------------------------------------
describe('detectMistakePattern — below threshold', () => {
it('returns null on a fresh state', () => {
expect(detectMistakePattern(freshMistakeState())).toBeNull();
});
it('returns null after fewer than MISTAKE_THRESHOLD failures', () => {
const { decisions } = replay(['zod_reject', 'exec_error']);
expect(decisions).toEqual([null, null]);
});
});
// ---- success reset ----------------------------------------------------------
describe('recordStep — success resets', () => {
it("'success' clears both the run streak and the nudge counter", () => {
const state = freshMistakeState();
recordStep(state, 'zod_reject');
recordStep(state, 'exec_error');
state.nudges = 2; // simulate prior nudges
recordStep(state, 'success');
expect(state.run).toEqual([]);
expect(state.nudges).toBe(0);
});
it('a success mid-streak prevents the threshold from tripping', () => {
// fail, fail, success, fail, fail → streak never reaches 3.
const { decisions } = replay([
'zod_reject',
'exec_error',
'success',
'tool_not_found',
'permission_denied',
]);
expect(decisions.every((d) => d === null)).toBe(true);
});
});
// ---- 3-streak nudge ---------------------------------------------------------
describe('detectMistakePattern — nudge on 3-streak', () => {
it("returns 'nudge' the first time the streak reaches MISTAKE_THRESHOLD", () => {
const { decisions } = replay(['zod_reject', 'exec_error', 'tool_not_found']);
expect(decisions).toEqual([null, null, 'nudge']);
});
it("fires 'nudge' for a streak of identical kinds too (kind-agnostic)", () => {
const { decisions } = replay(['exec_error', 'exec_error', 'exec_error']);
expect(decisions[2]).toBe('nudge');
});
});
// ---- re-trip escalate -------------------------------------------------------
describe('detectMistakePattern — escalate on re-trip', () => {
it("escalates when the streak re-trips after a nudge with no intervening success", () => {
// 3 fails → nudge (run reset, nudges=1), then 3 more fails → escalate.
const { decisions } = replay(
[
'zod_reject',
'exec_error',
'tool_not_found',
'permission_denied',
'exec_error',
'zod_reject',
],
{ applyNudge: true },
);
expect(decisions[2]).toBe('nudge');
expect(decisions[5]).toBe('escalate');
});
it("does NOT escalate if a success lands between the nudge and the next streak", () => {
const { decisions } = replay(
[
'zod_reject',
'exec_error',
'tool_not_found', // nudge here
'success', // clears nudges back to 0
'exec_error',
'zod_reject',
'tool_not_found', // 3-streak again → nudge, NOT escalate
],
{ applyNudge: true },
);
expect(decisions[2]).toBe('nudge');
expect(decisions[6]).toBe('nudge');
expect(decisions).not.toContain('escalate');
});
});
// ---- mixed kinds ------------------------------------------------------------
describe('detectMistakePattern — mixed failure kinds', () => {
it('counts a streak of all five distinct kinds toward the threshold', () => {
const { state, decisions } = replay([
'zod_reject',
'tool_not_found',
'exec_error',
]);
expect(decisions[2]).toBe('nudge');
expect(state.run).toEqual(['zod_reject', 'tool_not_found', 'exec_error']);
});
});
// ---- contract ---------------------------------------------------------------
describe('MISTAKE_THRESHOLD + MISTAKE_RECOVERY_NOTE', () => {
it('threshold is a positive integer (tests assume 3)', () => {
expect(MISTAKE_THRESHOLD).toBeGreaterThan(0);
expect(Number.isInteger(MISTAKE_THRESHOLD)).toBe(true);
});
it('recovery note is a non-empty model-facing string', () => {
expect(typeof MISTAKE_RECOVERY_NOTE).toBe('string');
expect(MISTAKE_RECOVERY_NOTE.length).toBeGreaterThan(0);
});
});

View File

@@ -88,6 +88,12 @@ interface ParsedFrontmatter {
top_k?: number; top_k?: number;
min_p?: number; min_p?: number;
presence_penalty?: number; presence_penalty?: number;
// v2.6 sampling-streamjson-tokens (#11): llama.cpp sampler extensions.
top_n_sigma?: number;
dry_multiplier?: number;
dry_base?: number;
dry_allowed_length?: number;
dry_penalty_last_n?: number;
tools?: string[]; tools?: string[];
description?: string; description?: string;
model?: string; model?: string;
@@ -178,6 +184,63 @@ function parseFrontmatter(yaml: string): { data: ParsedFrontmatter; errors: stri
} else { } else {
errors.push(`presence_penalty must be a number (got "${valueRaw}")`); errors.push(`presence_penalty must be a number (got "${valueRaw}")`);
} }
} else if (key === 'top_n_sigma') {
// v2.6 #11: llama.cpp top-n-sigma sampler. Float ≥ 0 (typical 0-3).
// Mirrors top_p/min_p: store then warn on out-of-range (non-numeric
// hard-fails the block).
const n = Number(valueRaw);
if (Number.isFinite(n)) {
data.top_n_sigma = n;
if (n < 0) {
console.warn(`agents: top_n_sigma ${n} out of range (≥0), ignoring (falling back to default)`);
}
} else {
errors.push(`top_n_sigma must be a number (got "${valueRaw}")`);
}
} else if (key === 'dry_multiplier') {
// v2.6 #11: DRY repetition-penalty multiplier. Float ≥ 0 (0 disables DRY).
const n = Number(valueRaw);
if (Number.isFinite(n)) {
data.dry_multiplier = n;
if (n < 0) {
console.warn(`agents: dry_multiplier ${n} out of range (≥0), ignoring (falling back to default)`);
}
} else {
errors.push(`dry_multiplier must be a number (got "${valueRaw}")`);
}
} else if (key === 'dry_base') {
// v2.6 #11: DRY penalty growth base. Float ≥ 0.
const n = Number(valueRaw);
if (Number.isFinite(n)) {
data.dry_base = n;
if (n < 0) {
console.warn(`agents: dry_base ${n} out of range (≥0), ignoring (falling back to default)`);
}
} else {
errors.push(`dry_base must be a number (got "${valueRaw}")`);
}
} else if (key === 'dry_allowed_length') {
// v2.6 #11: DRY max sequence length not penalized. Integer ≥ 0.
const n = Number(valueRaw);
if (Number.isInteger(n)) {
data.dry_allowed_length = n;
if (n < 0) {
console.warn(`agents: dry_allowed_length ${n} out of range (≥0), ignoring (falling back to default)`);
}
} else {
errors.push(`dry_allowed_length must be an integer (got "${valueRaw}")`);
}
} else if (key === 'dry_penalty_last_n') {
// v2.6 #11: DRY lookback window. Integer ≥ -1 (-1 = whole context, 0 = off).
const n = Number(valueRaw);
if (Number.isInteger(n)) {
data.dry_penalty_last_n = n;
if (n < -1) {
console.warn(`agents: dry_penalty_last_n ${n} out of range (≥-1), ignoring (falling back to default)`);
}
} else {
errors.push(`dry_penalty_last_n must be an integer (got "${valueRaw}")`);
}
} else if (key === 'tools') { } else if (key === 'tools') {
if (valueRaw === '') { if (valueRaw === '') {
data.tools = []; data.tools = [];
@@ -354,6 +417,11 @@ function parseAgentSection(section: RawSection): Omit<Agent, 'source'> {
top_k: typeof fm.top_k === 'number' ? fm.top_k : null, top_k: typeof fm.top_k === 'number' ? fm.top_k : null,
min_p: typeof fm.min_p === 'number' ? fm.min_p : null, min_p: typeof fm.min_p === 'number' ? fm.min_p : null,
presence_penalty: typeof fm.presence_penalty === 'number' ? fm.presence_penalty : null, presence_penalty: typeof fm.presence_penalty === 'number' ? fm.presence_penalty : null,
top_n_sigma: typeof fm.top_n_sigma === 'number' ? fm.top_n_sigma : null,
dry_multiplier: typeof fm.dry_multiplier === 'number' ? fm.dry_multiplier : null,
dry_base: typeof fm.dry_base === 'number' ? fm.dry_base : null,
dry_allowed_length: typeof fm.dry_allowed_length === 'number' ? fm.dry_allowed_length : null,
dry_penalty_last_n: typeof fm.dry_penalty_last_n === 'number' ? fm.dry_penalty_last_n : null,
tools: filteredTools, tools: filteredTools,
model: typeof fm.model === 'string' && fm.model.length > 0 ? fm.model : null, model: typeof fm.model === 'string' && fm.model.length > 0 ? fm.model : null,
max_tool_calls: typeof fm.max_tool_calls === 'number' ? fm.max_tool_calls : null, max_tool_calls: typeof fm.max_tool_calls === 'number' ? fm.max_tool_calls : null,

View File

@@ -31,10 +31,16 @@ export const SUMMARY_TEMPLATE = `Output exactly the Markdown structure shown ins
## Relevant Files ## Relevant Files
- [file or directory path: why it matters, or "(none)"] - [file or directory path: why it matters, or "(none)"]
## Files Read
- [file or directory path that has been read/searched this session, or "(none)"]
</template> </template>
Rules: Rules:
- Keep every section, even when empty. - Keep every section, even when empty.
- Use terse bullets, not prose paragraphs. - Use terse bullets, not prose paragraphs.
- Preserve exact file paths, commands, error strings, and identifiers when known. - Preserve exact file paths, commands, error strings, and identifiers when known.
- For ## Files Read: this is a cumulative provenance ledger. MERGE the paths
listed in any "## Files Read" block provided below with those already in the
previous summary — never drop a previously-recorded path. Sort and dedupe.
- Do not mention the summary process or that context was compacted.`; - Do not mention the summary process or that context was compacted.`;

View File

@@ -181,6 +181,54 @@ export function select(
}; };
} }
// === file-provenance ledger (#12, Part B) ===
// Read tools whose path/target arg names a file or directory that was read.
// BooChat (apps/server) is read-only — there are no write tools, so the ledger
// only ever has a "Files Read" side (apps/coder can add "Modified" later).
const READ_TOOL_ARG: Record<string, string> = {
view_file: 'path',
list_dir: 'path',
grep: 'path',
find_files: 'path',
};
// Derive a deterministic, deduped, sorted list of file/dir paths read by the
// HEAD messages being summarized. Pure — scans assistant tool_calls only; the
// boundary (which messages are "head") is decided by select() at the call site.
// We derive at compaction time rather than via a live accumulator because
// TurnArgs resets per turn and would miss reads on non-compacting turns; the
// head messages are the authoritative record of what was read in the window
// being summarized. The result propagates forward as summary text across
// compactions (the LLM merges it into ## Files Read), so a path read long ago
// survives even after its originating messages are compacted out.
export function deriveFilesRead(head: CompactionMessage[]): string[] {
const paths = new Set<string>();
for (const m of head) {
if (m.role !== 'assistant') continue;
if (!m.tool_calls) continue;
for (const tc of m.tool_calls) {
const argName = READ_TOOL_ARG[tc.name];
if (!argName) continue;
const raw = (tc.args as Record<string, unknown> | null)?.[argName];
if (typeof raw === 'string' && raw.trim().length > 0) {
paths.add(raw.trim());
}
}
}
return [...paths].sort();
}
// Format the derived paths as a deterministic ## Files Read block for injection
// into buildPrompt's context array. Returns null when nothing was read (so we
// don't inject an empty section). The summarizer merges this into the rolling
// summary's ## Files Read section per the SUMMARY_TEMPLATE instructions.
export function buildFilesReadContext(head: CompactionMessage[]): string | null {
const paths = deriveFilesRead(head);
if (paths.length === 0) return null;
return ['## Files Read', ...paths.map((p) => `- ${p}`)].join('\n');
}
// === prompt assembly === // === prompt assembly ===
// Build the final user message that asks the model to (re)produce the // Build the final user message that asks the model to (re)produce the
@@ -220,15 +268,26 @@ export interface OpenAiMessage {
tool_call_id?: string; tool_call_id?: string;
} }
function isCapHitSentinel(m: CompactionMessage): boolean { // #12: mirror inference/sentinels.ts:isAnySentinel over the CompactionMessage
return m.role === 'system' && m.metadata != null && m.metadata.kind === 'cap_hit'; // shape (which carries metadata as { kind?: string } | null, not the full
// Message type isAnySentinel expects). All UI-only sentinels are stripped from
// the head payload — they never go to the summarizer LLM. Keep the kind list in
// sync with isAnySentinel in sentinels.ts.
const SENTINEL_KINDS = new Set(['cap_hit', 'doom_loop', 'mistake_recovery']);
function isAnySentinel(m: CompactionMessage): boolean {
return (
m.role === 'system' &&
m.metadata != null &&
typeof m.metadata.kind === 'string' &&
SENTINEL_KINDS.has(m.metadata.kind)
);
} }
// v1.13.6: exported for unit-test access (reasoning render coverage). // v1.13.6: exported for unit-test access (reasoning render coverage).
export function buildHeadPayload(head: CompactionMessage[]): OpenAiMessage[] { export function buildHeadPayload(head: CompactionMessage[]): OpenAiMessage[] {
const out: OpenAiMessage[] = []; const out: OpenAiMessage[] = [];
for (const m of head) { for (const m of head) {
if (isCapHitSentinel(m)) continue; if (isAnySentinel(m)) continue;
if (m.role === 'assistant' && (m.status === 'streaming' || m.status === 'cancelled')) continue; if (m.role === 'assistant' && (m.status === 'streaming' || m.status === 'cancelled')) continue;
if (m.kind === 'compact') { if (m.kind === 'compact') {
// Legacy compact row — pass through as system context. The new // Legacy compact row — pass through as system context. The new
@@ -417,7 +476,14 @@ export async function process(input: ProcessInput): Promise<void> {
// user message carrying buildPrompt(previousSummary, []). No system prompt // user message carrying buildPrompt(previousSummary, []). No system prompt
// — matches opencode (`system: []`); the template + anchor are sufficient. // — matches opencode (`system: []`); the template + anchor are sufficient.
const headPayload = buildHeadPayload(sel.head); const headPayload = buildHeadPayload(sel.head);
const finalUser: OpenAiMessage = { role: 'user', content: buildPrompt(previousSummary, []) }; // #12 Part B: derive the file-provenance ledger from the head's read-tool
// calls and inject it as a deterministic ## Files Read context block so the
// summarizer merges it into the rolling summary. Empty → no injection.
const filesReadCtx = buildFilesReadContext(sel.head);
const finalUser: OpenAiMessage = {
role: 'user',
content: buildPrompt(previousSummary, filesReadCtx ? [filesReadCtx] : []),
};
const payload = [...headPayload, finalUser]; const payload = [...headPayload, finalUser];
log.info( log.info(

View File

@@ -19,6 +19,14 @@ export type {
} from './turn.js'; } from './turn.js';
export type { ToolPhaseResult } from './tool-phase.js'; export type { ToolPhaseResult } from './tool-phase.js';
export { detectDoomLoop, DOOM_LOOP_THRESHOLD } from './sentinels.js'; export { detectDoomLoop, DOOM_LOOP_THRESHOLD } from './sentinels.js';
export {
detectMistakePattern,
freshMistakeState,
recordStep,
MISTAKE_THRESHOLD,
MISTAKE_RECOVERY_NOTE,
} from './mistake-tracker.js';
export type { FailureKind, MistakeState } from './mistake-tracker.js';
export { buildMessagesPayload } from './payload.js'; export { buildMessagesPayload } from './payload.js';
export { generateToolUseSummary } from './tool-summaries.js'; export { generateToolUseSummary } from './tool-summaries.js';
export type { ToolInfo } from './tool-summaries.js'; export type { ToolInfo } from './tool-summaries.js';

View File

@@ -0,0 +1,69 @@
// v#12 MistakeTracker: heterogeneous-failure recovery. Complements the
// doom-loop guard (sentinels.ts:detectDoomLoop, which only catches *identical*
// repeats) by catching a run of consecutive tool FAILURES the model isn't
// recovering from — even when each failure is a *different* error. Algorithm
// reimplemented from cline's mistake-counting pattern (NOT vendored).
//
// Pure module — mirrors sentinels.ts:detectDoomLoop. No DB, no I/O. The state
// lives loop-local in TurnArgs (reset per runInference, like recentToolCalls).
// The failure taxonomy already distinguished in tool-phase.ts:executeToolCall.
// 'api_error' is reserved for upstream-model failures surfaced as tool outcomes
// (no current emit site on apps/server, but the union mirrors the design doc
// so a future caller can record it without a type change).
export type FailureKind =
| 'zod_reject'
| 'tool_not_found'
| 'exec_error'
| 'api_error'
| 'permission_denied';
// Smallest streak that doesn't false-positive on a model that retries once
// after a transient error. Matches DOOM_LOOP_THRESHOLD's rationale.
export const MISTAKE_THRESHOLD = 3;
export interface MistakeState {
// The current consecutive-failure streak (any successful tool step clears it).
run: FailureKind[];
// How many recovery nudges have fired without an intervening success. Used to
// escalate (stop the turn) on the second trip rather than nudging forever.
nudges: number;
}
export function freshMistakeState(): MistakeState {
return { run: [], nudges: 0 };
}
// Record one tool step's outcome. A 'success' clears BOTH the streak and the
// nudge counter (the model recovered). A FailureKind pushes onto the streak.
export function recordStep(
state: MistakeState,
outcome: FailureKind | 'success',
): void {
if (outcome === 'success') {
state.run = [];
state.nudges = 0;
return;
}
state.run.push(outcome);
}
// Decide whether to intervene given the current streak. When the streak has
// reached MISTAKE_THRESHOLD: 'nudge' the first time (no nudge fired yet),
// 'escalate' if it trips again while a nudge is already outstanding (no
// intervening success cleared `nudges`). Below threshold → null.
//
// Pure — the caller is responsible for mutating `nudges`/`run` after acting on
// the decision (mirrors how turn.ts consumes detectDoomLoop's result).
export function detectMistakePattern(
state: MistakeState,
): 'nudge' | 'escalate' | null {
if (state.run.length < MISTAKE_THRESHOLD) return null;
return state.nudges === 0 ? 'nudge' : 'escalate';
}
// Model-facing guidance injected (transiently, for the next step only) when a
// nudge fires. Short + declarative for the same reliability reason as the
// cap-hit / doom-loop notes.
export const MISTAKE_RECOVERY_NOTE =
"You've hit several different errors in a row. Stop retrying variations — re-read the tool schemas, verify file paths and arguments exist before calling, and try a fundamentally different approach.";

View File

@@ -86,7 +86,7 @@ export async function runCapHitSummary(
ctx, ctx,
session.model, session.model,
messages, messages,
{ tools: null, temperature: agent?.temperature, top_p: agent?.top_p ?? undefined, top_k: agent?.top_k ?? undefined, min_p: agent?.min_p ?? undefined, presence_penalty: agent?.presence_penalty ?? undefined }, { tools: null, temperature: agent?.temperature, top_p: agent?.top_p ?? undefined, top_k: agent?.top_k ?? undefined, min_p: agent?.min_p ?? undefined, presence_penalty: agent?.presence_penalty ?? undefined, top_n_sigma: agent?.top_n_sigma ?? undefined, dry_multiplier: agent?.dry_multiplier ?? undefined, dry_base: agent?.dry_base ?? undefined, dry_allowed_length: agent?.dry_allowed_length ?? undefined, dry_penalty_last_n: agent?.dry_penalty_last_n ?? undefined },
(delta) => { (delta) => {
accumulated += delta; accumulated += delta;
ctx.publish(sessionId, { ctx.publish(sessionId, {
@@ -346,7 +346,7 @@ export async function runDoomLoopSummary(
ctx, ctx,
session.model, session.model,
messages, messages,
{ tools: null, temperature: agent?.temperature, top_p: agent?.top_p ?? undefined, top_k: agent?.top_k ?? undefined, min_p: agent?.min_p ?? undefined, presence_penalty: agent?.presence_penalty ?? undefined }, { tools: null, temperature: agent?.temperature, top_p: agent?.top_p ?? undefined, top_k: agent?.top_k ?? undefined, min_p: agent?.min_p ?? undefined, presence_penalty: agent?.presence_penalty ?? undefined, top_n_sigma: agent?.top_n_sigma ?? undefined, dry_multiplier: agent?.dry_multiplier ?? undefined, dry_base: agent?.dry_base ?? undefined, dry_allowed_length: agent?.dry_allowed_length ?? undefined, dry_penalty_last_n: agent?.dry_penalty_last_n ?? undefined },
(delta) => { (delta) => {
accumulated += delta; accumulated += delta;
ctx.publish(sessionId, { ctx.publish(sessionId, {
@@ -545,7 +545,7 @@ export async function runStepCapSummary(
ctx, ctx,
session.model, session.model,
messages, messages,
{ tools: null, temperature: agent?.temperature, top_p: agent?.top_p ?? undefined, top_k: agent?.top_k ?? undefined, min_p: agent?.min_p ?? undefined, presence_penalty: agent?.presence_penalty ?? undefined }, { tools: null, temperature: agent?.temperature, top_p: agent?.top_p ?? undefined, top_k: agent?.top_k ?? undefined, min_p: agent?.min_p ?? undefined, presence_penalty: agent?.presence_penalty ?? undefined, top_n_sigma: agent?.top_n_sigma ?? undefined, dry_multiplier: agent?.dry_multiplier ?? undefined, dry_base: agent?.dry_base ?? undefined, dry_allowed_length: agent?.dry_allowed_length ?? undefined, dry_penalty_last_n: agent?.dry_penalty_last_n ?? undefined },
(delta) => { (delta) => {
accumulated += delta; accumulated += delta;
ctx.publish(sessionId, { ctx.publish(sessionId, {
@@ -717,3 +717,57 @@ async function insertDoomLoopSentinel(
metadata, metadata,
}); });
} }
// #12 MistakeTracker: heterogeneous-failure recovery sentinel. Mirrors
// insertDoomLoopSentinel structurally — a role='system', status='complete' row
// firing the standard message_started → delta → message_complete frame
// sequence. Two variants distinguished by `escalated`:
// - escalated:false → a nudge fired; recovery guidance was injected into the
// model's next step and the loop continued. can_continue is true (the turn
// is still live).
// - escalated:true → the nudge didn't break the failure run; the turn was
// stopped (cap-hit-style). can_continue is true so the UI can still offer a
// Continue affordance — a fresh user turn resets the tracker.
export async function insertMistakeRecoverySentinel(
ctx: InferenceContext,
sessionId: string,
chatId: string,
opts: { failureKinds: string[]; count: number; escalated: boolean; canContinue: boolean },
): Promise<void> {
const metadata: MessageMetadata = {
kind: 'mistake_recovery',
failure_kinds: opts.failureKinds,
count: opts.count,
escalated: opts.escalated,
can_continue: opts.canContinue,
};
const content = opts.escalated
? `Repeated different errors persisted after a recovery nudge (${opts.count} in a row). Stopping the tool-call loop.`
: `Hit ${opts.count} different errors in a row. Injected recovery guidance and continuing.`;
const [row] = await ctx.sql<{ id: string }[]>`
INSERT INTO messages (session_id, chat_id, role, content, status, created_at, metadata)
VALUES (${sessionId}, ${chatId}, 'system', ${content}, 'complete', clock_timestamp(), ${ctx.sql.json(metadata as never)})
RETURNING id
`;
// Standard frame sequence — same as cap-hit / doom-loop sentinels.
ctx.publish(sessionId, {
type: 'message_started',
message_id: row!.id,
chat_id: chatId,
role: 'system',
});
ctx.publish(sessionId, {
type: 'delta',
message_id: row!.id,
chat_id: chatId,
content,
});
ctx.publish(sessionId, {
type: 'message_complete',
message_id: row!.id,
chat_id: chatId,
metadata,
});
}

View File

@@ -48,6 +48,18 @@ export function isDoomLoopSentinel(m: Message): boolean {
); );
} }
export function isAnySentinel(m: Message): boolean { // #12: mistake-recovery sentinel. Same UI-only semantics as cap-hit /
return isCapHitSentinel(m) || isDoomLoopSentinel(m); // doom-loop — never sent to the LLM (filtered via the isAnySentinel check
// below, which buildMessagesPayload + buildHeadPayload both consult).
export function isMistakeRecoverySentinel(m: Message): boolean {
return (
m.role === 'system' &&
m.metadata !== null &&
typeof m.metadata === 'object' &&
(m.metadata as { kind?: unknown }).kind === 'mistake_recovery'
);
}
export function isAnySentinel(m: Message): boolean {
return isCapHitSentinel(m) || isDoomLoopSentinel(m) || isMistakeRecoverySentinel(m);
} }

View File

@@ -33,6 +33,39 @@ interface StreamOptions {
top_k?: number | null; top_k?: number | null;
min_p?: number | null; min_p?: number | null;
presence_penalty?: number | null; presence_penalty?: number | null;
// v2.6 sampling-streamjson-tokens (#11): llama.cpp sampler extensions. These
// are NOT standard AI-SDK streamText options and are NOT serialized by the
// openai-compatible provider's standardized-settings path (topK is even
// explicitly dropped with an "unsupported feature: topK" warning). They reach
// llama-server only via providerOptions.openaiCompatible (see buildSamplerProviderOptions).
top_n_sigma?: number | null;
dry_multiplier?: number | null;
dry_base?: number | null;
dry_allowed_length?: number | null;
dry_penalty_last_n?: number | null;
}
// v2.6 #11: build the providerOptions.openaiCompatible extraBody object for the
// llama.cpp sampler extensions. @ai-sdk/openai-compatible (2.0.47) merges every
// non-reserved key under providerOptions.openaiCompatible straight into the
// chat-completion request body (see its getArgs: the Object.fromEntries spread
// filtered against openaiCompatibleLanguageModelChatOptions.shape). This is the
// ONLY working passthrough for these params:
// - top_k / min_p were latently dropped before this: top_k was passed as the
// AI-SDK `topK` setting which the openai-compatible provider rejects as
// unsupported; min_p was never passed to streamText at all.
// - top_n_sigma + the dry_* family have no AI-SDK equivalent.
// Keys use llama-server's snake_case body names so they land verbatim.
function buildSamplerProviderOptions(opts: StreamOptions): Record<string, number> | undefined {
const body: Record<string, number> = {};
if (typeof opts.top_k === 'number') body.top_k = opts.top_k;
if (typeof opts.min_p === 'number') body.min_p = opts.min_p;
if (typeof opts.top_n_sigma === 'number') body.top_n_sigma = opts.top_n_sigma;
if (typeof opts.dry_multiplier === 'number') body.dry_multiplier = opts.dry_multiplier;
if (typeof opts.dry_base === 'number') body.dry_base = opts.dry_base;
if (typeof opts.dry_allowed_length === 'number') body.dry_allowed_length = opts.dry_allowed_length;
if (typeof opts.dry_penalty_last_n === 'number') body.dry_penalty_last_n = opts.dry_penalty_last_n;
return Object.keys(body).length > 0 ? body : undefined;
} }
// v1.13.1-A: convert BooCode's OpenAI-shaped history into AI SDK // v1.13.1-A: convert BooCode's OpenAI-shaped history into AI SDK
@@ -195,6 +228,14 @@ export async function streamCompletion(
return toolCall; return toolCall;
}; };
// v2.6 #11: llama.cpp sampler extensions (top_k, min_p, top_n_sigma, dry_*)
// ride providerOptions.openaiCompatible — they are NOT standardized streamText
// settings. NB: top_k used to be passed below as the AI-SDK `topK` setting;
// the openai-compatible provider dropped it with an "unsupported feature: topK"
// warning and min_p was never wired at all, so both were dead on the wire
// before this. They now go through the same extraBody path as the new params.
const samplerBody = buildSamplerProviderOptions(opts);
const result = streamText({ const result = streamText({
model: upstreamModel(ctx.config, model, agent ?? null), model: upstreamModel(ctx.config, model, agent ?? null),
messages: aiMessages, messages: aiMessages,
@@ -203,8 +244,8 @@ export async function streamCompletion(
: {}), : {}),
...(typeof opts.temperature === 'number' ? { temperature: opts.temperature } : {}), ...(typeof opts.temperature === 'number' ? { temperature: opts.temperature } : {}),
...(typeof opts.top_p === 'number' ? { topP: opts.top_p } : {}), ...(typeof opts.top_p === 'number' ? { topP: opts.top_p } : {}),
...(typeof opts.top_k === 'number' ? { topK: opts.top_k } : {}),
...(typeof opts.presence_penalty === 'number' ? { presencePenalty: opts.presence_penalty } : {}), ...(typeof opts.presence_penalty === 'number' ? { presencePenalty: opts.presence_penalty } : {}),
...(samplerBody ? { providerOptions: { openaiCompatible: samplerBody } } : {}),
abortSignal: signal, abortSignal: signal,
}); });
@@ -398,6 +439,12 @@ export async function executeStreamPhase(
const effectiveTopK = agent?.top_k ?? undefined; const effectiveTopK = agent?.top_k ?? undefined;
const effectiveMinP = agent?.min_p ?? undefined; const effectiveMinP = agent?.min_p ?? undefined;
const effectivePresencePenalty = agent?.presence_penalty ?? undefined; const effectivePresencePenalty = agent?.presence_penalty ?? undefined;
// v2.6 #11: llama.cpp sampler extensions, threaded the same way as top_k/min_p.
const effectiveTopNSigma = agent?.top_n_sigma ?? undefined;
const effectiveDryMultiplier = agent?.dry_multiplier ?? undefined;
const effectiveDryBase = agent?.dry_base ?? undefined;
const effectiveDryAllowedLength = agent?.dry_allowed_length ?? undefined;
const effectiveDryPenaltyLastN = agent?.dry_penalty_last_n ?? undefined;
// v1.12.2: ctx_max lookup is cached after the first hit per model, so this // v1.12.2: ctx_max lookup is cached after the first hit per model, so this
// is a Map probe in steady state. We capture nCtx once at the top of the // is a Map probe in steady state. We capture nCtx once at the top of the
@@ -435,7 +482,19 @@ export async function executeStreamPhase(
ctx, ctx,
session.model, session.model,
messages, messages,
{ tools: effectiveTools, temperature: effectiveTemperature, top_p: effectiveTopP, top_k: effectiveTopK, min_p: effectiveMinP, presence_penalty: effectivePresencePenalty }, {
tools: effectiveTools,
temperature: effectiveTemperature,
top_p: effectiveTopP,
top_k: effectiveTopK,
min_p: effectiveMinP,
presence_penalty: effectivePresencePenalty,
top_n_sigma: effectiveTopNSigma,
dry_multiplier: effectiveDryMultiplier,
dry_base: effectiveDryBase,
dry_allowed_length: effectiveDryAllowedLength,
dry_penalty_last_n: effectiveDryPenaltyLastN,
},
(delta) => { (delta) => {
state.accumulated += delta; state.accumulated += delta;
ctx.publish(sessionId, { ctx.publish(sessionId, {

View File

@@ -17,6 +17,7 @@ import { formatUnknownToolError } from './tool-suggestions.js';
// prompted about paths we couldn't grant anyway (e.g. /etc/passwd). // prompted about paths we couldn't grant anyway (e.g. /etc/passwd).
import { resolveGrantRoot } from '../grant_resolver.js'; import { resolveGrantRoot } from '../grant_resolver.js';
import { stripToolMarkup } from './tool-call-parser.js'; import { stripToolMarkup } from './tool-call-parser.js';
import type { FailureKind } from './mistake-tracker.js';
import type { import type {
InferenceContext, InferenceContext,
StreamResult, StreamResult,
@@ -33,13 +34,18 @@ async function executeToolCall(
toolCall: ToolCall, toolCall: ToolCall,
extraRoots: readonly string[], extraRoots: readonly string[],
toolCtx?: ToolExecCtx, toolCtx?: ToolExecCtx,
): Promise<{ output: unknown; truncated: boolean; error?: string }> { ): Promise<{ output: unknown; truncated: boolean; error?: string; outcome: FailureKind | 'success' }> {
// v#12 MistakeTracker: every return path carries an `outcome` so the turn
// loop can detect a run of heterogeneous failures. The failure taxonomy
// mirrors mistake-tracker.ts:FailureKind. Does NOT alter the existing
// output/truncated/error shape — outcome is purely additive.
const tool = TOOLS_BY_NAME[toolCall.name]; const tool = TOOLS_BY_NAME[toolCall.name];
if (!tool) { if (!tool) {
return { return {
output: null, output: null,
truncated: false, truncated: false,
error: formatUnknownToolError(toolCall.name, Object.keys(TOOLS_BY_NAME)), error: formatUnknownToolError(toolCall.name, Object.keys(TOOLS_BY_NAME)),
outcome: 'tool_not_found',
}; };
} }
const parsed = tool.inputSchema.safeParse(toolCall.args); const parsed = tool.inputSchema.safeParse(toolCall.args);
@@ -64,6 +70,7 @@ async function executeToolCall(
output: null, output: null,
truncated: false, truncated: false,
error: `tool '${toolCall.name}' rejected — ${hint}`, error: `tool '${toolCall.name}' rejected — ${hint}`,
outcome: 'zod_reject',
}; };
} }
try { try {
@@ -72,15 +79,16 @@ async function executeToolCall(
typeof output === 'object' && output !== null && 'truncated' in output typeof output === 'object' && output !== null && 'truncated' in output
? Boolean((output as { truncated: unknown }).truncated) ? Boolean((output as { truncated: unknown }).truncated)
: false; : false;
return { output, truncated }; return { output, truncated, outcome: 'success' };
} catch (err) { } catch (err) {
if (err instanceof PathScopeError) { if (err instanceof PathScopeError) {
return { output: null, truncated: false, error: err.message }; return { output: null, truncated: false, error: err.message, outcome: 'permission_denied' };
} }
return { return {
output: null, output: null,
truncated: false, truncated: false,
error: err instanceof Error ? err.message : String(err), error: err instanceof Error ? err.message : String(err),
outcome: 'exec_error',
}; };
} }
} }
@@ -93,6 +101,12 @@ export interface ToolPhaseResult {
toolCallCount: number; toolCallCount: number;
toolCalls: ToolCall[]; toolCalls: ToolCall[];
nextAssistantId: string | null; nextAssistantId: string | null;
// v#12 MistakeTracker: one outcome per executed tool call, in no particular
// order (filled inside the Promise.all callbacks). The turn loop folds these
// into TurnArgs.mistakeTracker via recordStep. Pause/auto-grant control-flow
// tools record 'success' (they aren't model mistakes); the genuine error
// paths record their FailureKind.
outcomes: (FailureKind | 'success')[];
} }
export async function executeToolPhase( export async function executeToolPhase(
@@ -187,6 +201,10 @@ export async function executeToolPhase(
// for the synthesis input. Race-free under Promise.all because each // for the synthesis input. Race-free under Promise.all because each
// callback pushes its own captured value. // callback pushes its own captured value.
const synthEntries: Array<{ tc: ToolCall; output: unknown; error?: string }> = []; const synthEntries: Array<{ tc: ToolCall; output: unknown; error?: string }> = [];
// v#12 MistakeTracker: collect each tool's outcome. Concurrent pushes under
// Promise.all are safe (each callback appends its own value; order is not
// significant to recordStep which folds them sequentially).
const outcomes: (FailureKind | 'success')[] = [];
await Promise.all( await Promise.all(
toolCalls.map(async (tc) => { toolCalls.map(async (tc) => {
const [toolRow] = await ctx.sql<{ id: string }[]>` const [toolRow] = await ctx.sql<{ id: string }[]>`
@@ -197,6 +215,7 @@ export async function executeToolPhase(
const toolMessageId = toolRow!.id; const toolMessageId = toolRow!.id;
if (tc.name === 'ask_user_input') { if (tc.name === 'ask_user_input') {
pausingForUserInput = true; pausingForUserInput = true;
outcomes.push('success');
const sentinel = { tool_call_id: tc.id, output: null, truncated: false }; const sentinel = { tool_call_id: tc.id, output: null, truncated: false };
// v1.13.20: parts-only. The answer-endpoint UPDATE later // v1.13.20: parts-only. The answer-endpoint UPDATE later
// (messages.ts) will delete and re-insert this part when the user // (messages.ts) will delete and re-insert this part when the user
@@ -227,7 +246,10 @@ export async function executeToolPhase(
); );
if (!resolution.ok) { if (!resolution.ok) {
// Auto-deny without pausing. The model sees the reason on its // Auto-deny without pausing. The model sees the reason on its
// next turn and decides what to do. // next turn and decides what to do. Counts as a permission_denied
// failure for the mistake tracker (the model asked for a path it
// can't have — a recoverable mistake it should learn from).
outcomes.push('permission_denied');
const stored = { const stored = {
tool_call_id: tc.id, tool_call_id: tc.id,
output: `denied: ${resolution.reason}`, output: `denied: ${resolution.reason}`,
@@ -255,6 +277,7 @@ export async function executeToolPhase(
// pause. The grant endpoint re-derives the root at decision time // pause. The grant endpoint re-derives the root at decision time
// (state may have changed in the meantime) so we don't stash it here. // (state may have changed in the meantime) so we don't stash it here.
pausingForUserInput = true; pausingForUserInput = true;
outcomes.push('success');
const sentinel = { tool_call_id: tc.id, output: null, truncated: false }; const sentinel = { tool_call_id: tc.id, output: null, truncated: false };
// v1.13.20: parts-only write. // v1.13.20: parts-only write.
await insertParts( await insertParts(
@@ -267,6 +290,10 @@ export async function executeToolPhase(
return; return;
} }
if (agent && !matchToolGlob(tc.name, agent.tools)) { if (agent && !matchToolGlob(tc.name, agent.tools)) {
// Agent-scope denial — the model called a tool outside its whitelist.
// permission_denied for the mistake tracker (the model should pick a
// tool it's actually allowed to use).
outcomes.push('permission_denied');
const stored = { const stored = {
tool_call_id: tc.id, tool_call_id: tc.id,
output: null, output: null,
@@ -295,6 +322,10 @@ export async function executeToolPhase(
sql: ctx.sql, sql: ctx.sql,
sessionId, sessionId,
}); });
// v#12 MistakeTracker: record the real execution outcome (success or a
// FailureKind). This is the primary signal for heterogeneous-failure
// detection.
outcomes.push(tres.outcome);
if (SYNTHESIS_TOOLS.has(tc.name)) { if (SYNTHESIS_TOOLS.has(tc.name)) {
synthEntries.push({ tc, output: tres.output, ...(tres.error ? { error: tres.error } : {}) }); synthEntries.push({ tc, output: tres.output, ...(tres.error ? { error: tres.error } : {}) });
} }
@@ -340,6 +371,7 @@ export async function executeToolPhase(
toolCallCount: toolCalls.length, toolCallCount: toolCalls.length,
toolCalls, toolCalls,
nextAssistantId: null, nextAssistantId: null,
outcomes,
}; };
} }
@@ -378,6 +410,7 @@ export async function executeToolPhase(
toolCallCount: toolCalls.length, toolCallCount: toolCalls.length,
toolCalls, toolCalls,
nextAssistantId: null, nextAssistantId: null,
outcomes,
}; };
} }
// ran === false → synthesis failed (timeout / model error) → fall through // ran === false → synthesis failed (timeout / model error) → fall through
@@ -397,5 +430,6 @@ export async function executeToolPhase(
toolCallCount: toolCalls.length, toolCallCount: toolCalls.length,
toolCalls, toolCalls,
nextAssistantId: nextAssistant!.id, nextAssistantId: nextAssistant!.id,
outcomes,
}; };
} }

View File

@@ -22,6 +22,13 @@ import { resolveToolBudget } from './budget.js';
import { import {
detectDoomLoop, detectDoomLoop,
} from './sentinels.js'; } from './sentinels.js';
import {
detectMistakePattern,
freshMistakeState,
recordStep,
MISTAKE_RECOVERY_NOTE,
type MistakeState,
} from './mistake-tracker.js';
import { import {
buildMessagesPayload, buildMessagesPayload,
loadContext, loadContext,
@@ -39,6 +46,7 @@ import {
runCapHitSummary, runCapHitSummary,
runDoomLoopSummary, runDoomLoopSummary,
runStepCapSummary, runStepCapSummary,
insertMistakeRecoverySentinel,
} from './sentinel-summaries.js'; } from './sentinel-summaries.js';
// v1.14.0: hard ceiling on the number of stream-and-tool iterations per // v1.14.0: hard ceiling on the number of stream-and-tool iterations per
@@ -144,6 +152,16 @@ export interface TurnArgs {
// boundaries by runInference, same as toolsUsed. Doom-loop check at the // boundaries by runInference, same as toolsUsed. Doom-loop check at the
// top of runAssistantTurn slices the last DOOM_LOOP_THRESHOLD entries. // top of runAssistantTurn slices the last DOOM_LOOP_THRESHOLD entries.
recentToolCalls: ToolCall[]; recentToolCalls: ToolCall[];
// v#12 MistakeTracker: heterogeneous-failure recovery state. Loop-local,
// reset per runInference (user-message boundary) like recentToolCalls. Folds
// tool-phase outcomes via recordStep each iteration; detectMistakePattern
// gates the nudge/escalate decision.
mistakeTracker: MistakeState;
// v#12: transient model-facing recovery note set when a nudge fires. Consumed
// (appended as a role:'system' message + cleared) on the NEXT payload build.
// Never persisted — mirrors how the cap-hit/doom-loop notes live only inside
// the summary call's messages array.
pendingRecoveryNote?: string;
signal: AbortSignal | undefined; signal: AbortSignal | undefined;
} }
@@ -188,6 +206,12 @@ export async function runAssistantTurn(
let toolsUsed = args.toolsUsed; let toolsUsed = args.toolsUsed;
let recentToolCalls = args.recentToolCalls; let recentToolCalls = args.recentToolCalls;
let assistantMessageId = args.assistantMessageId; let assistantMessageId = args.assistantMessageId;
// v#12 MistakeTracker: the tracker state is carried on `args` (mutated in
// place by recordStep). pendingRecoveryNote is a loop-local because it is a
// single-step transient — set when a nudge fires, consumed (injected into the
// next payload) and cleared on the following iteration.
const mistakeTracker = args.mistakeTracker;
let pendingRecoveryNote: string | undefined = args.pendingRecoveryNote;
while (stepNumber < effectiveCap) { while (stepNumber < effectiveCap) {
// ---- doom-loop check (moved from top-of-function) ---- // ---- doom-loop check (moved from top-of-function) ----
@@ -196,7 +220,7 @@ export async function runAssistantTurn(
// Need fresh history for the summary. // Need fresh history for the summary.
const loaded = await loadContext(ctx.sql, sessionId, chatId); const loaded = await loadContext(ctx.sql, sessionId, chatId);
if (loaded) { if (loaded) {
const iterArgs: TurnArgs = { sessionId, chatId, assistantMessageId, toolsUsed, recentToolCalls, signal }; const iterArgs: TurnArgs = { sessionId, chatId, assistantMessageId, toolsUsed, recentToolCalls, mistakeTracker, signal };
await runDoomLoopSummary(ctx, iterArgs, loaded.session, loaded.project, loaded.history, agent, loop); await runDoomLoopSummary(ctx, iterArgs, loaded.session, loaded.project, loaded.history, agent, loop);
} }
break; break;
@@ -206,7 +230,7 @@ export async function runAssistantTurn(
if (toolsUsed >= budget) { if (toolsUsed >= budget) {
const loaded = await loadContext(ctx.sql, sessionId, chatId); const loaded = await loadContext(ctx.sql, sessionId, chatId);
if (loaded) { if (loaded) {
const iterArgs: TurnArgs = { sessionId, chatId, assistantMessageId, toolsUsed, recentToolCalls, signal }; const iterArgs: TurnArgs = { sessionId, chatId, assistantMessageId, toolsUsed, recentToolCalls, mistakeTracker, signal };
await runCapHitSummary(ctx, iterArgs, loaded.session, loaded.project, loaded.history, agent, budget); await runCapHitSummary(ctx, iterArgs, loaded.session, loaded.project, loaded.history, agent, budget);
} }
break; break;
@@ -265,7 +289,16 @@ export async function runAssistantTurn(
} }
} }
const iterArgs: TurnArgs = { sessionId, chatId, assistantMessageId, toolsUsed, recentToolCalls, signal }; // v#12 MistakeTracker: if the prior iteration's nudge fired, append the
// transient recovery note to THIS payload (consumed exactly once, then
// cleared). Never persisted — same lifecycle as the cap-hit/doom-loop
// summary notes, which live only inside the in-memory messages array.
if (pendingRecoveryNote) {
messages.push({ role: 'system', content: pendingRecoveryNote });
pendingRecoveryNote = undefined;
}
const iterArgs: TurnArgs = { sessionId, chatId, assistantMessageId, toolsUsed, recentToolCalls, mistakeTracker, signal };
const state: StreamPhaseState = { accumulated: '', startedAt: null }; const state: StreamPhaseState = { accumulated: '', startedAt: null };
let result: StreamResult; let result: StreamResult;
try { try {
@@ -305,10 +338,78 @@ export async function runAssistantTurn(
recentToolCalls = [...recentToolCalls, ...toolPhaseResult.toolCalls]; recentToolCalls = [...recentToolCalls, ...toolPhaseResult.toolCalls];
stepNumber++; stepNumber++;
// v#12 MistakeTracker: fold this iteration's tool outcomes into the
// tracker, in order. recordStep mutates `mistakeTracker` in place (it is
// the same object referenced by args). A 'success' clears the streak.
for (const o of toolPhaseResult.outcomes) {
recordStep(mistakeTracker, o);
}
if (toolPhaseResult.action !== 'continue') { if (toolPhaseResult.action !== 'continue') {
// 'paused' (user input) or 'synthesis_done' — stop the loop. // 'paused' (user input) or 'synthesis_done' — stop the loop. The turn is
// already ending, so neither a nudge nor an escalate would change the
// control flow; we skip the mistake decision here.
break; break;
} }
// v#12 MistakeTracker: heterogeneous-failure decision. Only evaluated on
// the 'continue' path (the only case where the loop would otherwise
// proceed to another step). Complements the doom-loop check above, which
// only catches *identical* repeats.
const mistake = detectMistakePattern(mistakeTracker);
if (mistake === 'nudge') {
// Soft intervention: inject model-facing recovery guidance into the NEXT
// step's payload, drop a UI sentinel, bump nudges, reset the streak, and
// continue. The note is consumed (and cleared) at the top of the next
// iteration's payload build.
pendingRecoveryNote = MISTAKE_RECOVERY_NOTE;
const failureKinds = [...mistakeTracker.run];
await insertMistakeRecoverySentinel(ctx, sessionId, chatId, {
failureKinds,
count: failureKinds.length,
escalated: false,
canContinue: true,
});
mistakeTracker.nudges += 1;
mistakeTracker.run = [];
ctx.log.info(
{ sessionId, chatId, step: stepNumber, nudges: mistakeTracker.nudges, failureKinds },
'mistake_recovery nudge',
);
assistantMessageId = toolPhaseResult.nextAssistantId!;
continue;
}
if (mistake === 'escalate') {
// The nudge didn't break the failure run — stop the turn (cap-hit-style)
// to avoid burning the whole step budget on heterogeneous failures. The
// next assistant row is still 'streaming'; finalize it as a short note so
// the slot doesn't dangle, then drop the escalate sentinel.
const failureKinds = [...mistakeTracker.run];
assistantMessageId = toolPhaseResult.nextAssistantId!;
await ctx.sql`
UPDATE messages
SET content = '', status = 'complete', finished_at = clock_timestamp()
WHERE id = ${assistantMessageId}
`;
ctx.publish(sessionId, {
type: 'message_complete',
message_id: assistantMessageId,
chat_id: chatId,
});
await insertMistakeRecoverySentinel(ctx, sessionId, chatId, {
failureKinds,
count: failureKinds.length,
escalated: true,
canContinue: true,
});
ctx.publishUser({ type: 'chat_status', chat_id: chatId, status: 'idle', at: new Date().toISOString() });
ctx.log.info(
{ sessionId, chatId, step: stepNumber, failureKinds },
'mistake_recovery escalate — stopping turn',
);
break;
}
// 'continue' — advance to next assistant message. // 'continue' — advance to next assistant message.
assistantMessageId = toolPhaseResult.nextAssistantId!; assistantMessageId = toolPhaseResult.nextAssistantId!;
} }
@@ -320,7 +421,7 @@ export async function runAssistantTurn(
if (stepNumber >= effectiveCap && effectiveCap < Infinity) { if (stepNumber >= effectiveCap && effectiveCap < Infinity) {
const loaded = await loadContext(ctx.sql, sessionId, chatId); const loaded = await loadContext(ctx.sql, sessionId, chatId);
if (loaded) { if (loaded) {
const capArgs: TurnArgs = { sessionId, chatId, assistantMessageId, toolsUsed, recentToolCalls, signal }; const capArgs: TurnArgs = { sessionId, chatId, assistantMessageId, toolsUsed, recentToolCalls, mistakeTracker, signal };
await runStepCapSummary(ctx, capArgs, loaded.session, loaded.project, loaded.history, agent, stepNumber, effectiveCap); await runStepCapSummary(ctx, capArgs, loaded.session, loaded.project, loaded.history, agent, stepNumber, effectiveCap);
} }
} }
@@ -378,12 +479,16 @@ export async function runInference(
// per-call budget. // per-call budget.
// v1.11.6: recentToolCalls also resets — doom-loop detection is scoped // v1.11.6: recentToolCalls also resets — doom-loop detection is scoped
// to a single user-message turn, so a Continue starts with no history. // to a single user-message turn, so a Continue starts with no history.
// v#12 MistakeTracker: fresh per user-message turn, like recentToolCalls.
// Tracks consecutive heterogeneous tool failures across the loop's
// stream-and-tool iterations within this turn.
return runAssistantTurn(ctx, { return runAssistantTurn(ctx, {
sessionId, sessionId,
chatId, chatId,
assistantMessageId, assistantMessageId,
toolsUsed: 0, toolsUsed: 0,
recentToolCalls: [], recentToolCalls: [],
mistakeTracker: freshMistakeState(),
signal, signal,
}); });
} }

View File

@@ -117,6 +117,15 @@ export interface Agent {
top_k: number | null; // null means omit from request body top_k: number | null; // null means omit from request body
min_p: number | null; // null means omit from request body min_p: number | null; // null means omit from request body
presence_penalty: number | null; // null means omit from request body presence_penalty: number | null; // null means omit from request body
// v2.6 sampling-streamjson-tokens (#11): llama.cpp sampler extensions.
// null = omit from request body. top_n_sigma + the DRY repetition family
// help the doom-loop-prone local model. All travel via the same
// providerOptions.openaiCompatible extraBody channel as top_k/min_p.
top_n_sigma: number | null;
dry_multiplier: number | null;
dry_base: number | null;
dry_allowed_length: number | null;
dry_penalty_last_n: number | null;
tools: string[]; // whitelist of tool names; empty = no tools allowed tools: string[]; // whitelist of tool names; empty = no tools allowed
model: string | null; // null means "session.model wins" model: string | null; // null means "session.model wins"
source: AgentSource; source: AgentSource;
@@ -198,10 +207,15 @@ export type ErrorReason =
| 'summary_after_cap_failed'; | 'summary_after_cap_failed';
// v1.8.2 / v1.11.6: shapes stored in messages.metadata. Discriminated on `kind`. // v1.8.2 / v1.11.6: shapes stored in messages.metadata. Discriminated on `kind`.
// cap_hit — system sentinel emitted when tool budget is exhausted // cap_hit — system sentinel emitted when tool budget is exhausted
// doom_loop — system sentinel emitted when the model called the same // doom_loop — system sentinel emitted when the model called the same
// tool with the same args DOOM_LOOP_THRESHOLD times in a row // tool with the same args DOOM_LOOP_THRESHOLD times in a row
// error — attached to a failed assistant message so UI can show reason // mistake_recovery — system sentinel emitted when a run of consecutive
// *heterogeneous* tool failures is detected (#12). A nudge
// (escalated:false) injects model-facing recovery guidance
// and continues; an escalate (escalated:true) stops the
// turn after the nudge failed to break the failure run.
// error — attached to a failed assistant message so UI can show reason
export type MessageMetadata = export type MessageMetadata =
| { | {
kind: 'cap_hit'; kind: 'cap_hit';
@@ -216,6 +230,14 @@ export type MessageMetadata =
args: Record<string, unknown>; args: Record<string, unknown>;
threshold: number; threshold: number;
} }
| {
// PINNED CONTRACT (#12) — mirrored byte-for-byte in apps/web/src/api/types.ts.
kind: 'mistake_recovery';
failure_kinds: string[];
count: number;
escalated: boolean;
can_continue?: boolean;
}
| { | {
kind: 'error'; kind: 'error';
error_reason: ErrorReason; error_reason: ErrorReason;

View File

@@ -34,6 +34,12 @@ export interface AgentSessionInfo {
status: string; status: string;
has_session: boolean; has_session: boolean;
last_active_at: string | null; last_active_at: string | null;
// v2.6.8 per-(chat,agent) running token/cost totals (sampling-streamjson-tokens
// #8). input_tokens/output_tokens are BIGINT and may arrive as strings; cost is
// DOUBLE. AgentComposerBar coerces with Number(...) before rendering.
input_tokens: number;
output_tokens: number;
cost: number;
} }
// write-edit-robustness #4: a pre-turn worktree snapshot anchored to an // write-edit-robustness #4: a pre-turn worktree snapshot anchored to an

View File

@@ -155,6 +155,9 @@ export type ErrorReason =
// budget + agent name + whether Continue is still allowed. // budget + agent name + whether Continue is still allowed.
// doom_loop — sentinel emitted when the model called the same tool with // doom_loop — sentinel emitted when the model called the same tool with
// the same arguments threshold times in a row. // the same arguments threshold times in a row.
// mistake_recovery — sentinel emitted when the model hit repeated *different*
// errors; non-escalated means recovery guidance was injected and
// the turn continues, escalated means the turn was stopped.
// error — attached to a failed assistant message so the bubble can show // error — attached to a failed assistant message so the bubble can show
// a specific reason on reload (WS error frame is one-shot). // a specific reason on reload (WS error frame is one-shot).
export type MessageMetadata = export type MessageMetadata =
@@ -171,6 +174,13 @@ export type MessageMetadata =
args: Record<string, unknown>; args: Record<string, unknown>;
threshold: number; threshold: number;
} }
| {
kind: 'mistake_recovery';
failure_kinds: string[];
count: number;
escalated: boolean;
can_continue?: boolean;
}
| { | {
kind: 'error'; kind: 'error';
error_reason: ErrorReason; error_reason: ErrorReason;

View File

@@ -185,6 +185,14 @@ interface Props {
hasPriorTurn?: boolean; hasPriorTurn?: boolean;
} }
// Condensed token count: 950 → "950", 12_400 → "12.4K", 3_200_000 → "3.2M".
// Sub-1000 stays exact; thousands/millions get one decimal, trailing .0 trimmed.
function abbrevTokens(n: number): string {
if (!Number.isFinite(n) || n < 1000) return String(Math.max(0, Math.round(n)));
if (n < 1_000_000) return `${(n / 1000).toFixed(1).replace(/\.0$/, '')}K`;
return `${(n / 1_000_000).toFixed(1).replace(/\.0$/, '')}M`;
}
// Relative-time formatter for the resumed-chip title (e.g. "3m ago"). // Relative-time formatter for the resumed-chip title (e.g. "3m ago").
function relativeTime(iso: string | null): string { function relativeTime(iso: string | null): string {
if (!iso) return 'unknown'; if (!iso) return 'unknown';
@@ -353,6 +361,21 @@ export function AgentComposerBar({ projectPath, value, onChange, onProviderComma
: { label: 'new session', title: `${value.provider} starts a fresh session this turn` } : { label: 'new session', title: `${value.provider} starts a fresh session this turn` }
: null; : null;
// sampling-streamjson-tokens #8: condensed per-(chat,agent) token/cost readout
// beside the session chip. Coerce — input/output are BIGINT (string over wire).
// Hidden when no session row or all totals are zero (e.g. native boocode, which
// holds no agent_sessions row, or a provider that hasn't run yet).
const usageReadout = (() => {
if (!sessionChip || !sessionRow) return null;
const inTok = Number(sessionRow.input_tokens) || 0;
const outTok = Number(sessionRow.output_tokens) || 0;
const cost = Number(sessionRow.cost) || 0;
if (inTok <= 0 && outTok <= 0 && cost <= 0) return null;
const parts = [`${abbrevTokens(inTok)} in`, `${abbrevTokens(outTok)} out`];
if (cost > 0) parts.push(`$${cost.toFixed(2)}`);
return parts.join(' · ');
})();
return ( return (
<div className="flex flex-wrap items-center gap-1 px-2 py-1 border-b border-border bg-muted/20 shrink-0"> <div className="flex flex-wrap items-center gap-1 px-2 py-1 border-b border-border bg-muted/20 shrink-0">
<CompactPicker <CompactPicker
@@ -374,6 +397,14 @@ export function AgentComposerBar({ projectPath, value, onChange, onProviderComma
{sessionChip.label} {sessionChip.label}
</span> </span>
)} )}
{usageReadout && (
<span
className="text-[10px] text-muted-foreground tabular-nums whitespace-nowrap shrink-0"
title="Tokens in · out · cost for this agent session"
>
{usageReadout}
</span>
)}
<CompactPicker <CompactPicker
label="Mode" label="Mode"
value={value.modeId ?? ''} value={value.modeId ?? ''}

View File

@@ -1,6 +1,6 @@
import { useEffect, useState } from 'react'; import { useEffect, useState } from 'react';
import type { ReactNode } from 'react'; import type { ReactNode } from 'react';
import { ChevronDown, ChevronRight, Copy, RefreshCw, Check, Share2, RotateCw, GitFork, Trash2, Brain, History } from 'lucide-react'; import { ChevronDown, ChevronRight, Copy, RefreshCw, Check, Share2, RotateCw, GitFork, Trash2, Brain, History, AlertCircle } from 'lucide-react';
import { toast } from 'sonner'; import { toast } from 'sonner';
import type { Chat, ErrorReason, Message } from '@/api/types'; import type { Chat, ErrorReason, Message } from '@/api/types';
import { api } from '@/api/client'; import { api } from '@/api/client';
@@ -637,6 +637,76 @@ function ReasoningBlock({ text, streaming }: { text: string; streaming: boolean
); );
} }
// feature #12: mistake-recovery sentinel. Inserted by the backend as a
// role='system', metadata.kind='mistake_recovery' row when the model hit
// repeated *different* errors (distinct from doom_loop, which is the same
// call repeated). Visual treatment mirrors CapHitSentinel / DoomLoopSentinel
// (amber card + alert icon). Non-escalated → recovery guidance was injected
// and the turn continues. Escalated → the turn was stopped; if can_continue
// is set, offer the same Continue affordance as the cap-hit sentinel.
// Loose `!= null` guards per the CLAUDE.md coder-message note (coder rows pass
// metadata as undefined, not null).
function MistakeRecoverySentinel({ message }: { message: Message }) {
const meta = message.metadata;
const isMistakeRecovery =
meta != null && typeof meta === 'object' && meta.kind === 'mistake_recovery';
const failureKinds = isMistakeRecovery ? meta.failure_kinds : [];
const escalated = isMistakeRecovery ? meta.escalated : false;
const canContinue = isMistakeRecovery ? meta.can_continue === true : false;
const [continuing, setContinuing] = useState(false);
async function handleContinue() {
if (continuing || !canContinue) return;
setContinuing(true);
try {
await api.chats.continue(message.chat_id, message.id);
} catch (err) {
toast.error(err instanceof Error ? err.message : 'continue failed');
} finally {
setContinuing(false);
}
}
const kindsLabel =
Array.isArray(failureKinds) && failureKinds.length > 0
? failureKinds.join(', ')
: null;
return (
<div className="rounded-md border border-amber-500/40 bg-amber-500/10 text-sm">
<div className="px-3 py-2 flex items-start gap-2">
<AlertCircle className="size-4 text-amber-500 shrink-0 mt-0.5" />
<div className="flex-1 min-w-0 space-y-1">
<div className="text-xs font-medium text-amber-700 dark:text-amber-300">
{escalated ? 'Repeated errors — turn stopped' : 'Recovering from repeated errors'}
</div>
<div className="text-xs text-muted-foreground">
{escalated
? 'Repeated errors persisted — stopped the turn.'
: kindsLabel
? `Hit repeated different errors (${kindsLabel}) — recovery guidance injected, continuing.`
: 'Hit repeated different errors — recovery guidance injected, continuing.'}
</div>
{escalated && canContinue && (
<div className="pt-1">
<Button
type="button"
size="sm"
variant="outline"
onClick={() => void handleContinue()}
disabled={continuing}
>
{continuing ? 'Continuing…' : 'Continue'}
</Button>
</div>
)}
</div>
</div>
</div>
);
}
export function MessageBubble({ export function MessageBubble({
message, message,
sessionChats, sessionChats,
@@ -681,6 +751,13 @@ export function MessageBubble({
return <DoomLoopSentinel message={message} />; return <DoomLoopSentinel message={message} />;
} }
// feature #12: mistake-recovery sentinel. Non-escalated rows narrate that
// recovery guidance was injected mid-turn; escalated rows report the turn
// was stopped and (when can_continue) offer the cap-hit-style Continue.
if (message.role === 'system' && message.metadata?.kind === 'mistake_recovery') {
return <MistakeRecoverySentinel message={message} />;
}
// v1.8.2: tool messages and assistant tool_calls are now rendered by // v1.8.2: tool messages and assistant tool_calls are now rendered by
// MessageList via ToolCallLine / ToolCallGroup. Tool-role messages reach // MessageList via ToolCallLine / ToolCallGroup. Tool-role messages reach
// this point only if MessageList didn't consume them (shouldn't happen, // this point only if MessageList didn't consume them (shouldn't happen,

View File

@@ -6,6 +6,10 @@ Operating rules for every agent in this registry. Full procedures live in the `c
**Worktrees** — Isolate work in a worktree when it is parallel to in-progress work, risky/experimental, a hotfix interrupting other work, or splits into independent units — just create when clear, propose in one line when ambiguous, skip quick/small single-stream work. Branch from a stable base (default branch); worktrees persist (never auto-remove or auto-merge); they isolate code state, not runtime (ports/DBs/services still collide). Full heuristic: invoke `using-worktrees`. **Worktrees** — Isolate work in a worktree when it is parallel to in-progress work, risky/experimental, a hotfix interrupting other work, or splits into independent units — just create when clear, propose in one line when ambiguous, skip quick/small single-stream work. Branch from a stable base (default branch); worktrees persist (never auto-remove or auto-merge); they isolate code state, not runtime (ports/DBs/services still collide). Full heuristic: invoke `using-worktrees`.
**Sampling knobs** — Each `## Name` frontmatter block accepts these per-agent sampler fields, threaded into the llama-swap chat-completion request: `temperature`, `top_p`, `top_k`, `min_p`, `presence_penalty`, and (v2.6) `top_n_sigma`, `dry_multiplier`, `dry_base`, `dry_allowed_length`, `dry_penalty_last_n`. The `top_n_sigma` + `dry_*` repetition family curb the doom-loop-prone local model. Omit a field to leave it at the server default. Example: `top_n_sigma: 1.0`, `dry_multiplier: 0.8`, `dry_base: 1.75`, `dry_allowed_length: 2`, `dry_penalty_last_n: -1` (-1 = whole context).
**Reasoning budget** — To cap a reasoning model's thinking tokens, pass `--reasoning-budget` through `llama_extra_args` (already permitted by the deny-list validator; routes the agent to llama-sidecar). Example frontmatter line: `llama_extra_args: ["--reasoning-budget", "2048"]`. This is a sidecar process flag, not a chat-completion body param — distinct from the sampling knobs above.
## Code Reviewer ## Code Reviewer
--- ---
temperature: 0.6 temperature: 0.6

View File

@@ -0,0 +1,70 @@
# MistakeTracker + file-provenance ledger (#12)
**Status:** in progress (started 2026-06-01)
**Source:** `boocode_code_review_v2.md` §1 #12, §5e (cline — algorithm-reimplemented, not vendored).
Two native-inference (apps/server) hardening features. One cohesive backend change (they share
`TurnArgs` + the tool-phase observation point) + a small frontend sentinel render.
## Part A — MistakeTracker (heterogeneous-failure recovery)
Complements the doom-loop guard (`sentinels.ts:detectDoomLoop`, which only catches *identical*
repeats) by catching a run of consecutive tool **failures** the model isn't recovering from.
- New pure `apps/server/src/services/inference/mistake-tracker.ts` (mirrors `detectDoomLoop`):
- `FailureKind = 'zod_reject' | 'tool_not_found' | 'exec_error' | 'api_error' | 'permission_denied'`
(all already distinguished in `tool-phase.ts:executeToolCall`).
- `MISTAKE_THRESHOLD = 3`.
- State `{ run: FailureKind[]; nudges: number }``run` is the current consecutive-failure streak,
reset on ANY successful tool step; `nudges` counts recovery injections not yet cleared by a success.
- `recordStep(state, outcome)` where outcome is a failure kind or `'success'`.
- `detectMistakePattern(state): 'nudge' | 'escalate' | null``run.length >= 3``'nudge'` the first
time (`nudges === 0`), `'escalate'` if it trips again while `nudges >= 1` (no intervening success).
- Lives in `TurnArgs` (loop-local, reset per `runInference`, like `recentToolCalls`).
- Integration in `turn.ts` loop: after each tool phase, `recordStep` per tool outcome; then
`detectMistakePattern`:
- `'nudge'` (decision: soft + escalate): append a transient **model-facing** recovery-guidance system
message to the NEXT turn's payload (re-read schemas, verify paths exist before acting, try a
different approach — not retry variations), insert a `mistake_recovery` UI sentinel
(`escalated:false`), bump `nudges`, reset `run`. Loop continues.
- `'escalate'`: stop the turn (break), insert a `mistake_recovery` sentinel (`escalated:true`,
`can_continue:true`, cap-hit-style), finalize. Prevents heterogeneous failures from burning the
whole step budget.
## Part B — File-provenance ledger (Read-only)
- Accumulate file paths read by `view_file`/`grep`/`find_files`/`list_dir` into `TurnArgs.filesRead:
Set<string>` (recorded at the tool-phase, like the failure outcomes).
- On compaction (`compaction.ts:buildPrompt`), inject a deterministic, sorted `## Files Read` list into
the summary prompt context so the summarizer merges it into the rolling summary — **no new
table/column**; it propagates as summary text across compactions. `compaction-prompt.ts`'s
`SUMMARY_TEMPLATE` already has a `## Relevant Files` section to extend/merge with.
- BooChat is **read-only** (no write tools on apps/server) → "Files Modified" is N/A here; only
"Files Read". (The apps/coder write side can add "Modified" later.)
## Sentinel contract (pinned — backend + frontend must match)
New sentinel kind on `MessageMetadata` in BOTH `apps/server/src/types/api.ts` AND
`apps/web/src/api/types.ts`:
```
{ kind: 'mistake_recovery'; failure_kinds: string[]; count: number; escalated: boolean; can_continue?: boolean }
```
- `role='system'`, `status='complete'`, stripped from the LLM payload via `isAnySentinel` in
`payload.ts` (UI-only) and `compaction.ts:buildHeadPayload`.
- Frontend render branch in `apps/web/src/components/MessageBubble.tsx`: `escalated:false` →
"Hit repeated different errors — recovery guidance injected, continuing." `escalated:true` →
"Repeated errors persisted — stopped the turn." (mirror the doom-loop/cap-hit branches).
## Decisions (2026-06-01)
- MistakeTracker intervention: **soft nudge + escalate**.
- **UI sentinel** for recovery (`mistake_recovery`).
## Files (backend, one agent) / (frontend, one agent)
- Backend: `mistake-tracker.ts` (new), `turn.ts`, `tool-phase.ts`, `sentinels.ts`,
`sentinel-summaries.ts`, `payload.ts`, `compaction.ts`, `compaction-prompt.ts`, `types/api.ts` +
tests (`mistake-tracker.test.ts`, ledger/compaction assertions).
- Frontend: `apps/web/src/api/types.ts` (MessageMetadata arm) + `MessageBubble.tsx` (render branch).
MUST NOT touch Sam's WIP web files.
## Verify
- `pnpm -C apps/server test`; `pnpm -C apps/server build`; `npx tsc -p apps/web/tsconfig.app.json --noEmit`

View File

@@ -0,0 +1,45 @@
# Small wins — sampling knobs + PTY stream-json + token UI
**Status:** in progress (started 2026-06-01)
**Source:** `boocode_code_review_v2.md` §1 #11 / #7 / #8 (config-adopt + qwen-code §5g + opencode §3 #4).
Three independent BooCode improvements, disjoint subsystems (apps/server / apps/coder / apps/web).
## #11 — New sampling knobs (apps/server)
Per-agent `top_n_sigma` + the `dry_*` repetition family help the doom-loop-prone local model.
Today the Agent type threads `temperature/top_p/top_k/min_p/presence_penalty` into the inference
request (`stream-phase.ts:396438`). Add `top_n_sigma`, `dry_multiplier`, `dry_base`,
`dry_allowed_length`, `dry_penalty_last_n` as first-class Agent fields (`types/api.ts`), parse them in
`agents.ts:parseFrontmatter` (same bounded per-field numeric pattern + out-of-range warn), and thread
them into the request body **via the same mechanism `top_k`/`min_p` already use** (the agent must
confirm whether that's an AI-SDK `providerOptions`/`extraBody` passthrough — these are llama.cpp
extensions, not standard OpenAI fields — and ride it; surface it if `top_k`/`min_p` turn out to be
silently dropped today). `--reasoning-budget` is a llama-server CLI flag already permitted by the
deny-list validator, so it works via `llama_extra_args: ["--reasoning-budget","N"]` now — document it
in `data/AGENTS.md`. apps/server only.
## #7 — Live PTY stream-json NDJSON parsing (apps/coder)
qwen/claude PTY dispatch slices stdout opaque (`dispatcher.ts` PTY path; qwen already runs
`--output-format stream-json`). Add a parser for the Claude-Code-compatible NDJSON
(`system`/`assistant`/`result`/`stream_event``content_block_delta` text/thinking/tool deltas +
`usage` + `session_id`) that maps to the existing `AgentEvent` union (`agent-backend.ts`). **Live
incremental** (decision 2026-06-01): line-buffer the PTY stdout `data` events, parse each complete
NDJSON line as it arrives, and emit broker frames live (text/reasoning/tool) like the ACP/opencode
paths — plus accumulate for `persistExternalAgentTurn`. claude gets `--output-format stream-json` too.
One parser serves both (same schema). apps/coder only (`pty-dispatch.ts`, `dispatcher.ts`, new
`stream-json-parser.ts` + test).
## #8 — Surface opencode token usage (apps/coder route + apps/web)
`agent_sessions.input_tokens/output_tokens/cost` are accumulated (v2.6.8) but the
`GET /api/sessions/:id/agent-sessions` SELECT + the `AgentSessionInfo` type drop them. Add the 3
columns to both, render condensed beside the existing session chip in `AgentComposerBar`
(ChatThroughput styling: `tabular-nums`, muted, e.g. "12.4K in / 3.2K out / $0.25"). MUST NOT touch
Sam's uncommitted WIP (`ChatTabBar`, `SessionLandingPage`, `Workspace`, `useWorkspacePanes`,
`PaneHeaderActions`).
## Decisions (2026-06-01)
- #7 surfacing: **live incremental** streaming (not parse-at-end).
## Verify
- `pnpm -C apps/server test` (+ new agent-parse tests); `pnpm -C apps/coder test` (+ new parser tests)
- `pnpm -C apps/server build && pnpm -C apps/coder build`; `npx tsc -p apps/web/tsconfig.app.json --noEmit`