v1.14.0-outer-loop: explicit while loop replaces inference recursion

Converts the ad-hoc executeToolPhase → runAssistantTurn recursion into an
explicit while (stepNumber < effectiveCap) loop. A step is one stream-and-
tool-execute iteration; the loop terminates on non-tool finish, step-cap hit,
doom-loop, budget exhaustion, abort, or synthesis success.

MAX_STEPS = 200 hard ceiling (4x old effective limit from budget). Per-agent
steps: field in AGENTS.md frontmatter sets tighter caps (Refactorer: 5,
Architect: 20, others: unset = bounded only by MAX_STEPS). Resolution:
effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS).

executeToolPhase no longer recurses — returns ToolPhaseResult struct
(action: 'continue' | 'paused' | 'synthesis_done') so the caller decides
whether to continue or break. steps: 0 handled as "no tool calls allowed"
via runTextOnlyTurn (one text-only stream phase, tool calls ignored with
warn log).

Step-cap hits produce a sentinel summary (reuses cap_hit kind so
CapHitSentinel.tsx renders without frontend changes; text distinguishes
"Step limit reached" from "Tool budget exhausted"). Doom-loop check migrated
to top of loop body — same predicate, same threshold (3), break instead of
return.

step_start parts are in the schema CHECK but not emitted as message_parts —
writing before the stream phase creates a sequence-0 collision with
partsFromAssistantMessage. Structured log line emitted instead. Adversarial
review caught the collision pre-deploy.

332/332 server tests passing. No frontend changes. No schema changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-23 20:29:21 +00:00
parent 211e903620
commit f4a97808ad
14 changed files with 756 additions and 204 deletions

View File

@@ -2,6 +2,10 @@
All notable changes per release tag. Most recent on top, ordered by tag creation date (which matches the git history). Tag names follow `vMAJOR.MINOR.PATCH-slug` — the slug describes what shipped, so the tag name alone is enough to recall the batch. All notable changes per release tag. Most recent on top, ordered by tag creation date (which matches the git history). Tag names follow `vMAJOR.MINOR.PATCH-slug` — the slug describes what shipped, so the tag name alone is enough to recall the batch.
## v1.14.0-outer-loop — 2026-05-23
Converts the inference engine's ad-hoc `executeToolPhase → runAssistantTurn` recursion into an explicit `while` loop with a configurable step cap. A step is one stream-and-tool-execute iteration; the loop terminates on non-tool finish, step-cap hit, doom-loop, budget exhaustion, abort, or synthesis success. `MAX_STEPS = 200` is the hard ceiling (4x the old effective limit from budget); per-agent `steps:` field in AGENTS.md frontmatter sets tighter caps (Refactorer: 5, Architect: 20, others: unset = bounded only by MAX_STEPS). `executeToolPhase` no longer recurses — returns a `ToolPhaseResult` struct (`action: 'continue' | 'paused' | 'synthesis_done'`) so the caller (the while loop) decides whether to continue or break. `steps: 0` is handled as "no tool calls allowed" — one text-only stream phase, tool calls ignored with a warn log. Step-cap hits produce a sentinel summary (reuses `cap_hit` kind so `CapHitSentinel.tsx` renders it without frontend changes; text distinguishes "Step limit reached" from "Tool budget exhausted"). Doom-loop check migrated from pre-recursion position to top of loop body — same predicate (`detectDoomLoop`), same threshold (3 identical calls), `break` instead of `return`. `step_start` parts are in the schema CHECK but not emitted as message_parts in v1.14 — writing to the assistant message before the stream phase creates a sequence-0 collision with `partsFromAssistantMessage`; a structured log line is emitted instead. Adversarial review caught the collision pre-deploy. 332/332 server tests passing; no frontend changes. Pairs with `v1.13.20-drop-legacy-cols` (parts is now the sole source of truth, and this batch's loop operates entirely through parts).
## v1.13.20-drop-legacy-cols — 2026-05-23 ## v1.13.20-drop-legacy-cols — 2026-05-23
Final phase of the v1.13.0 strangler-fig migration. Removes the dual-write into `messages.tool_calls` / `messages.tool_results` JSON columns and drops the columns themselves; `message_parts` is now the only source of truth for tool-call and tool-result data. 10 dual-write sites stripped (5 in `tool-phase.ts`, 2 in `routes/skills.ts`, 2 in `routes/messages.ts`, 1 in `routes/chats.ts` fork-clone) — recon's grep-driven inventory caught 2 sites beyond the original v1.13.2 roadmap count. `messages_with_parts` view simplified to parts-only subselects (COALESCE fallbacks gone) and rewritten via `CREATE OR REPLACE VIEW` BEFORE the column DROP since Postgres rejects column-drop on view-referenced cols. Adversarial review caught a runtime bug the green test suite missed: `chats.ts:/api/chats/:id/discard_stale` had a `RETURNING ... tool_calls, tool_results, ...` clause referencing the dropped columns; would have crashed on every 60s-no-token-activity recovery in production. Fixed by switching to two-step UPDATE-then-SELECT-from-view so the response keeps the parts-synthesized fields. `Message` API type retains `tool_calls?` / `tool_results?` fields (override on the original v1.13.2 plan) — the view continues to populate them from parts, so the wire shape is unchanged and the frontend needs no updates. v1.12.1 cleanup block (`DROP CONSTRAINT messages_status_check`/`messages_role_check`) removed — those one-shots have done their work. `tool_cost_stats.test.ts` had a direct `INSERT INTO messages` touching the legacy columns that wasn't in the roadmap's inventory; rewritten to parts-table inserts and confirmed semantically faithful. 339/339 server tests passing including the 7 DB-integration tests (live-DB applied the schema migration and ran the parts-only view end-to-end). Pairs with `v1.13.0-ai-sdk-v6` (which introduced the dual-write) and `v1.13.1-B` (which moved the read path to `messages_with_parts`); umbrella `v1.13` tag ships on the same commit. Final phase of the v1.13.0 strangler-fig migration. Removes the dual-write into `messages.tool_calls` / `messages.tool_results` JSON columns and drops the columns themselves; `message_parts` is now the only source of truth for tool-call and tool-result data. 10 dual-write sites stripped (5 in `tool-phase.ts`, 2 in `routes/skills.ts`, 2 in `routes/messages.ts`, 1 in `routes/chats.ts` fork-clone) — recon's grep-driven inventory caught 2 sites beyond the original v1.13.2 roadmap count. `messages_with_parts` view simplified to parts-only subselects (COALESCE fallbacks gone) and rewritten via `CREATE OR REPLACE VIEW` BEFORE the column DROP since Postgres rejects column-drop on view-referenced cols. Adversarial review caught a runtime bug the green test suite missed: `chats.ts:/api/chats/:id/discard_stale` had a `RETURNING ... tool_calls, tool_results, ...` clause referencing the dropped columns; would have crashed on every 60s-no-token-activity recovery in production. Fixed by switching to two-step UPDATE-then-SELECT-from-view so the response keeps the parts-synthesized fields. `Message` API type retains `tool_calls?` / `tool_results?` fields (override on the original v1.13.2 plan) — the view continues to populate them from parts, so the wire shape is unchanged and the frontend needs no updates. v1.12.1 cleanup block (`DROP CONSTRAINT messages_status_check`/`messages_role_check`) removed — those one-shots have done their work. `tool_cost_stats.test.ts` had a direct `INSERT INTO messages` touching the legacy columns that wasn't in the roadmap's inventory; rewritten to parts-table inserts and confirmed semantically faithful. 339/339 server tests passing including the 7 DB-integration tests (live-DB applied the schema migration and ran the parts-only view end-to-end). Pairs with `v1.13.0-ai-sdk-v6` (which introduced the dual-write) and `v1.13.1-B` (which moved the read path to `messages_with_parts`); umbrella `v1.13` tag ships on the same commit.

View File

@@ -46,7 +46,7 @@ Tests: `pnpm -C apps/server test` runs the vitest suite. No test harness on `app
- **Zod** for request validation and config parsing. - **Zod** for request validation and config parsing.
Key services: Key services:
- **`services/inference/`** — Public surface re-exported via `inference/index.ts`; callers import from `./services/inference/index.js` explicitly (NodeNext doesn't honor directory-index resolution). Layout: `turn.ts` (runAssistantTurn / runInference / createInferenceRunner; exports `InferenceFrame`, `InferenceContext`, `TurnArgs`, `StreamResult`), `stream-phase.ts` (streamCompletion as a v1.13.1-A AI SDK adapter + executeStreamPhase), `provider.ts` (`upstreamModel(baseURL, modelId)` wrapping `createOpenAICompatible` against llama-swap), `tool-phase.ts` (executeToolPhase; value back-edges into turn.ts for the runAssistantTurn recursion — cycle safe because deref at call time, not module top-level), `sentinel-summaries.ts` (runCapHitSummary + runDoomLoopSummary + their sentinel inserters), `error-handler.ts` (handleAbortOrError, finalizeCompletion), `payload.ts` (buildMessagesPayload, loadContext, maybeFlagForCompaction, `OpenAiMessage`), `sentinels.ts` (`detectDoomLoop`, `DOOM_LOOP_THRESHOLD`, sentinel predicates), `budget.ts` (resolveToolBudget), `xml-parser.ts` (qwen3.6 XML tool-call fallback — KEEP, AI SDK doesn't handle inline-XML tool calls), `parts.ts` (parts-table write helpers: `partsFromAssistantMessage`, `partsFromToolMessage`, `insertParts` — v1.13.20 made parts the sole source of truth), `prune.ts` (v1.13.4 two-tier compaction; `selectPruneTargets` is the pure decision helper), `types.ts` (`StreamPhaseState`, `DB_FLUSH_INTERVAL_MS`). **`TurnArgs`** is the per-turn state envelope threaded through the `executeToolPhase → runAssistantTurn` recursion; reset in `runInference` at user-message boundary. Add new per-turn state to `TurnArgs`, not module-level closures. - **`services/inference/`** — Public surface re-exported via `inference/index.ts`; callers import from `./services/inference/index.js` explicitly (NodeNext doesn't honor directory-index resolution). Layout: `turn.ts` (runAssistantTurn / runInference / createInferenceRunner; exports `InferenceFrame`, `InferenceContext`, `TurnArgs`, `StreamResult`, `MAX_STEPS`), `stream-phase.ts` (streamCompletion as a v1.13.1-A AI SDK adapter + executeStreamPhase), `provider.ts` (`upstreamModel(baseURL, modelId)` wrapping `createOpenAICompatible` against llama-swap), `tool-phase.ts` (executeToolPhase → returns `ToolPhaseResult`; no longer recurses into runAssistantTurn — v1.14.0 converted the recursion to an explicit while loop in turn.ts), `sentinel-summaries.ts` (runCapHitSummary + runDoomLoopSummary + runStepCapSummary + their sentinel inserters), `error-handler.ts` (handleAbortOrError, finalizeCompletion), `payload.ts` (buildMessagesPayload, loadContext, maybeFlagForCompaction, `OpenAiMessage`), `sentinels.ts` (`detectDoomLoop`, `DOOM_LOOP_THRESHOLD`, sentinel predicates), `budget.ts` (resolveToolBudget), `xml-parser.ts` (qwen3.6 XML tool-call fallback — KEEP, AI SDK doesn't handle inline-XML tool calls), `parts.ts` (parts-table write helpers: `partsFromAssistantMessage`, `partsFromToolMessage`, `insertParts` — v1.13.20 made parts the sole source of truth), `prune.ts` (v1.13.4 two-tier compaction; `selectPruneTargets` is the pure decision helper), `types.ts` (`StreamPhaseState`, `DB_FLUSH_INTERVAL_MS`). **`TurnArgs`** is the per-turn state envelope populated from loop locals each iteration; reset in `runInference` at user-message boundary. The outer loop in `runAssistantTurn` (v1.14.0) runs `while (stepNumber < effectiveCap)` where `effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS=200)`. Per-agent `steps:` field in AGENTS.md frontmatter. `steps: 0` means text-only (no tool execution). Step-cap hit writes a `cap_hit` sentinel so `CapHitSentinel.tsx` renders it.
- **AI SDK v6 streamCompletion adapter** (v1.13.1-A; `services/inference/stream-phase.ts`). `streamText` is the underlying call; the BooCode layer above (executeStreamPhase, finalize, dual-write) is shape-preserved via an adapter. Five gotchas the LSP/test suite won't catch: - **AI SDK v6 streamCompletion adapter** (v1.13.1-A; `services/inference/stream-phase.ts`). `streamText` is the underlying call; the BooCode layer above (executeStreamPhase, finalize, dual-write) is shape-preserved via an adapter. Five gotchas the LSP/test suite won't catch:
- **Abort signals are swallowed.** `streamText`'s `fullStream` iterator exits cleanly when `abortSignal` fires — no throw. Post-iteration `if (signal?.aborted) throw <AbortError>` is required; without it the row finalizes as `complete` instead of `cancelled`. Comment in stream-phase.ts pins this; don't refactor it away. - **Abort signals are swallowed.** `streamText`'s `fullStream` iterator exits cleanly when `abortSignal` fires — no throw. Post-iteration `if (signal?.aborted) throw <AbortError>` is required; without it the row finalizes as `complete` instead of `cancelled`. Comment in stream-phase.ts pins this; don't refactor it away.
- **Usage lands only at stream end** via `await result.usage` (`inputTokens` / `outputTokens` v6 names → mapped to `promptTokens` / `completionTokens` for the existing onUsage callback). Mid-stream live tok/s is gone vs v1.12.2; ChatThroughput shows a single value at stream end. - **Usage lands only at stream end** via `await result.usage` (`inputTokens` / `outputTokens` v6 names → mapped to `promptTokens` / `completionTokens` for the existing onUsage callback). Mid-stream live tok/s is gone vs v1.12.2; ChatThroughput shows a single value at stream end.

View File

@@ -37,6 +37,10 @@ interface ParsedFrontmatter {
// v1.8.2: optional per-agent tool-loop budget. Absent → inference resolves // v1.8.2: optional per-agent tool-loop budget. Absent → inference resolves
// from the agent's toolset at runtime. // from the agent's toolset at runtime.
max_tool_calls?: number; max_tool_calls?: number;
// v1.14.0: optional per-agent step cap. Absent → bounded only by MAX_STEPS
// (200) in the outer loop. Integer ≥ 0; steps: 0 means "no tool calls
// allowed" — the model responds text-only.
steps?: number;
} }
function stripQuotes(s: string): string { function stripQuotes(s: string): string {
@@ -112,6 +116,21 @@ function parseFrontmatter(yaml: string): { data: ParsedFrontmatter; errors: stri
} else { } else {
errors.push(`max_tool_calls must be an integer 1-100 (got "${valueRaw}")`); errors.push(`max_tool_calls must be an integer 1-100 (got "${valueRaw}")`);
} }
} else if (key === 'steps') {
// v1.14.0: per-agent step cap for the outer inference loop. Integer ≥ 0.
// steps: 0 means "no tool calls allowed" — model responds text-only.
// Non-integer or negative values are warned and ignored (falls back to
// MAX_STEPS ceiling), matching the max_tool_calls pattern above.
const n = Number(valueRaw);
if (Number.isInteger(n) && n >= 0) {
data.steps = n;
} else if (Number.isInteger(n)) {
console.warn(
`agents: steps ${n} is negative, ignoring (falling back to default)`,
);
} else {
errors.push(`steps must be a non-negative integer (got "${valueRaw}")`);
}
} }
// Unknown keys silently ignored — forward-compat. // Unknown keys silently ignored — forward-compat.
} }
@@ -204,6 +223,7 @@ function parseAgentSection(section: RawSection): Omit<Agent, 'source'> {
tools: filteredTools, tools: filteredTools,
model: typeof fm.model === 'string' && fm.model.length > 0 ? fm.model : null, model: typeof fm.model === 'string' && fm.model.length > 0 ? fm.model : null,
max_tool_calls: typeof fm.max_tool_calls === 'number' ? fm.max_tool_calls : null, max_tool_calls: typeof fm.max_tool_calls === 'number' ? fm.max_tool_calls : null,
steps: typeof fm.steps === 'number' ? fm.steps : null,
}; };
} }

View File

@@ -6,6 +6,7 @@
export { export {
createInferenceRunner, createInferenceRunner,
MAX_STEPS,
runAssistantTurn, runAssistantTurn,
runInference, runInference,
} from './turn.js'; } from './turn.js';
@@ -16,5 +17,6 @@ export type {
StreamResult, StreamResult,
TurnArgs, TurnArgs,
} from './turn.js'; } from './turn.js';
export type { ToolPhaseResult } from './tool-phase.js';
export { detectDoomLoop, DOOM_LOOP_THRESHOLD } from './sentinels.js'; export { detectDoomLoop, DOOM_LOOP_THRESHOLD } from './sentinels.js';
export { buildMessagesPayload } from './payload.js'; export { buildMessagesPayload } from './payload.js';

View File

@@ -476,6 +476,202 @@ export async function runDoomLoopSummary(
); );
} }
// v1.14.0: step-cap wrap-up. Mirrors runCapHitSummary structurally — same
// in-flight-slot reuse, same tools-disabled streaming-summary call, same
// post-finalize sentinel insert + chat_status drop. Difference: the note
// text names the step limit rather than the tool budget. Sentinel reuses
// metadata.kind = 'cap_hit' so the frontend CapHitSentinel component
// renders it without changes.
const STEP_CAP_NOTE = (steps: number, cap: number) =>
`You've reached the step limit (${steps}/${cap} steps). Produce the best answer you can with what you have. Do not call more tools.`;
export async function runStepCapSummary(
ctx: InferenceContext,
args: TurnArgs,
session: Session,
project: Project,
history: Message[],
agent: Agent | null,
steps: number,
cap: number,
): Promise<void> {
const { sessionId, chatId, assistantMessageId, signal } = args;
const messages = await buildMessagesPayload(session, project, history, agent, ctx.log);
messages.push({ role: 'system', content: STEP_CAP_NOTE(steps, cap) });
const startedRow = await ctx.sql<{ started_at: string }[]>`
UPDATE messages
SET started_at = clock_timestamp()
WHERE id = ${assistantMessageId}
RETURNING started_at
`;
const startedAt = startedRow[0]?.started_at ?? null;
ctx.publish(sessionId, {
type: 'message_started',
message_id: assistantMessageId,
chat_id: chatId,
role: 'assistant',
});
let accumulated = '';
let pendingFlushTimer: NodeJS.Timeout | null = null;
let flushPromise: Promise<unknown> = Promise.resolve();
const flushNow = () => {
if (pendingFlushTimer) {
clearTimeout(pendingFlushTimer);
pendingFlushTimer = null;
}
const snapshot = accumulated;
flushPromise = flushPromise.then(() =>
ctx.sql`UPDATE messages SET content = ${snapshot} WHERE id = ${assistantMessageId}`
);
};
const scheduleFlush = () => {
if (pendingFlushTimer) return;
pendingFlushTimer = setTimeout(() => {
pendingFlushTimer = null;
flushNow();
}, DB_FLUSH_INTERVAL_MS);
};
let summaryOk = false;
let summarySoftCancelled = false;
let summaryError: string | null = null;
let result: StreamResult | null = null;
try {
result = await streamCompletion(
ctx,
session.model,
messages,
{ tools: null, temperature: agent?.temperature },
(delta) => {
accumulated += delta;
ctx.publish(sessionId, {
type: 'delta',
message_id: assistantMessageId,
chat_id: chatId,
content: delta,
});
scheduleFlush();
},
undefined,
signal,
);
summaryOk = true;
} catch (err) {
if (err instanceof Error && err.name === 'AbortError') {
summarySoftCancelled = true;
} else {
summaryError = err instanceof Error ? err.message : String(err);
}
} finally {
if (pendingFlushTimer) {
clearTimeout(pendingFlushTimer);
pendingFlushTimer = null;
}
await flushPromise;
}
if (summaryOk && result) {
const mctx = await modelContext.getModelContext(session.model);
const nCtx = mctx?.n_ctx ?? null;
const [updated] = await ctx.sql<
{ tokens_used: number | null; ctx_used: number | null; ctx_max: number | null; finished_at: string | null }[]
>`
UPDATE messages
SET content = ${result.content},
status = 'complete',
tokens_used = ${result.completionTokens},
ctx_used = ${result.promptTokens},
ctx_max = ${nCtx},
finished_at = clock_timestamp()
WHERE id = ${assistantMessageId}
RETURNING tokens_used, ctx_used, ctx_max, finished_at
`;
ctx.publish(sessionId, {
type: 'message_complete',
message_id: assistantMessageId,
chat_id: chatId,
tokens_used: updated?.tokens_used ?? null,
ctx_used: updated?.ctx_used ?? null,
ctx_max: updated?.ctx_max ?? null,
started_at: startedAt,
finished_at: updated?.finished_at ?? null,
model: session.model,
});
} else if (summarySoftCancelled) {
await ctx.sql`
UPDATE messages
SET content = ${accumulated},
status = 'cancelled',
finished_at = clock_timestamp()
WHERE id = ${assistantMessageId}
`;
ctx.publish(sessionId, {
type: 'message_complete',
message_id: assistantMessageId,
chat_id: chatId,
});
} else {
const errMeta: MessageMetadata = {
kind: 'error',
error_reason: 'summary_after_cap_failed',
error_text: summaryError ?? 'step-cap summary failed',
};
await ctx.sql`
UPDATE messages
SET content = ${accumulated},
status = 'failed',
finished_at = clock_timestamp(),
metadata = ${ctx.sql.json(errMeta as never)}
WHERE id = ${assistantMessageId}
`;
ctx.publish(sessionId, {
type: 'error',
message_id: assistantMessageId,
chat_id: chatId,
error: summaryError ?? 'step-cap summary failed',
reason: 'summary_after_cap_failed',
});
}
const [sessRow] = await ctx.sql<{ project_id: string; name: string; updated_at: string }[]>`
UPDATE sessions SET updated_at = clock_timestamp()
WHERE id = ${sessionId}
RETURNING project_id, name, updated_at
`;
ctx.publishUser({
type: 'session_updated',
session_id: sessionId,
project_id: sessRow!.project_id,
name: sessRow!.name,
updated_at: sessRow!.updated_at,
});
// Reuse cap_hit sentinel so the frontend CapHitSentinel component renders
// it without changes. The content text distinguishes step cap from budget.
await insertCapHitSentinel(ctx, sessionId, chatId, agent, cap);
if (summaryOk || summarySoftCancelled) {
ctx.publishUser({ type: 'chat_status', chat_id: chatId, status: 'idle', at: new Date().toISOString() });
} else {
ctx.publishUser({
type: 'chat_status',
chat_id: chatId,
status: 'error',
at: new Date().toISOString(),
reason: 'summary_after_cap_failed',
});
}
ctx.log.info(
{ sessionId, chatId, assistantMessageId, steps, cap, summaryOk, summaryCancelled: summarySoftCancelled },
'inference step-cap summary finished',
);
}
async function insertDoomLoopSentinel( async function insertDoomLoopSentinel(
ctx: InferenceContext, ctx: InferenceContext,
sessionId: string, sessionId: string,

View File

@@ -19,11 +19,6 @@ import type {
StreamResult, StreamResult,
TurnArgs, TurnArgs,
} from './turn.js'; } from './turn.js';
// v1.12.4: ESM value-import cycle. executeToolPhase recurses into
// runAssistantTurn which lives in inference.ts. The cycle is safe because
// the reference is read at call time (inside an async function body), not
// at module top-level. Node + tsc resolve this cleanly.
import { runAssistantTurn } from './turn.js';
// v1.13.13: synthesis pipeline — replaces the immediate recursive turn when // v1.13.13: synthesis pipeline — replaces the immediate recursive turn when
// any of this batch's tool calls is in SYNTHESIS_TOOLS. Falls through to // any of this batch's tool calls is in SYNTHESIS_TOOLS. Falls through to
// recursion on synthesis failure (timeout / model error). See module header // recursion on synthesis failure (timeout / model error). See module header
@@ -86,6 +81,16 @@ async function executeToolCall(
} }
} }
// v1.14.0: return struct from executeToolPhase so the caller (the outer
// while loop in turn.ts) can decide whether to continue, break, or handle
// synthesis. Replaces the recursive call into runAssistantTurn.
export interface ToolPhaseResult {
action: 'continue' | 'paused' | 'synthesis_done';
toolCallCount: number;
toolCalls: ToolCall[];
nextAssistantId: string | null;
}
export async function executeToolPhase( export async function executeToolPhase(
ctx: InferenceContext, ctx: InferenceContext,
args: TurnArgs, args: TurnArgs,
@@ -93,8 +98,8 @@ export async function executeToolPhase(
startedAt: string | null, startedAt: string | null,
session: Session, session: Session,
projectRoot: string projectRoot: string
): Promise<void> { ): Promise<ToolPhaseResult> {
const { sessionId, chatId, assistantMessageId, toolsUsed, signal } = args; const { sessionId, chatId, assistantMessageId } = args;
const { content, toolCalls, promptTokens, completionTokens } = result; const { content, toolCalls, promptTokens, completionTokens } = result;
// v1.11.3: ctx_max comes from llama-swap /upstream/<model>/props, not the // v1.11.3: ctx_max comes from llama-swap /upstream/<model>/props, not the
@@ -296,7 +301,12 @@ export async function executeToolPhase(
{ sessionId, chatId, assistantMessageId }, { sessionId, chatId, assistantMessageId },
'inference paused awaiting user input', 'inference paused awaiting user input',
); );
return; return {
action: 'paused' as const,
toolCallCount: toolCalls.length,
toolCalls,
nextAssistantId: null,
};
} }
// v1.13.13: synthesis-pipeline branch. When any of this batch's tool calls // v1.13.13: synthesis-pipeline branch. When any of this batch's tool calls
@@ -328,30 +338,30 @@ export async function executeToolPhase(
...(typeof out?.truncated === 'boolean' ? { truncated: out.truncated } : {}), ...(typeof out?.truncated === 'boolean' ? { truncated: out.truncated } : {}),
...(typeof out?.outputPath === 'string' ? { outputPath: out.outputPath } : {}), ...(typeof out?.outputPath === 'string' ? { outputPath: out.outputPath } : {}),
}); });
if (ran) return; if (ran) {
return {
action: 'synthesis_done' as const,
toolCallCount: toolCalls.length,
toolCalls,
nextAssistantId: null,
};
}
// ran === false → synthesis failed (timeout / model error) → fall through // ran === false → synthesis failed (timeout / model error) → fall through
// to the standard recursive turn below. The synth message (if created) // to the standard continue path below. The synth message (if created)
// was already marked status='failed' inside runSynthesisPass. // was already marked status='failed' inside runSynthesisPass.
} }
// v1.14.0: create the next assistant row and return a continue result.
// The caller (outer while loop in turn.ts) handles the iteration.
const [nextAssistant] = await ctx.sql<{ id: string }[]>` const [nextAssistant] = await ctx.sql<{ id: string }[]>`
INSERT INTO messages (session_id, chat_id, role, content, status, created_at) INSERT INTO messages (session_id, chat_id, role, content, status, created_at)
VALUES (${sessionId}, ${chatId}, 'assistant', '', 'streaming', clock_timestamp()) VALUES (${sessionId}, ${chatId}, 'assistant', '', 'streaming', clock_timestamp())
RETURNING id RETURNING id
`; `;
await runAssistantTurn(ctx, { return {
sessionId, action: 'continue' as const,
chatId, toolCallCount: toolCalls.length,
assistantMessageId: nextAssistant!.id, toolCalls,
// v1.8.2: charge this turn's actual tool invocations against the budget. nextAssistantId: nextAssistant!.id,
// One assistant message can emit multiple tool_calls, so we add the run };
// count, not 1. The next turn's budget check sees the cumulative total.
toolsUsed: toolsUsed + result.toolCalls.length,
// v1.11.6: append the just-executed tool calls to the per-turn history
// so the next runAssistantTurn's doom-loop check can see them. We don't
// cap the array length here — per-turn budgets keep it bounded
// (typically <30 entries), and slicing happens inside detectDoomLoop.
recentToolCalls: [...args.recentToolCalls, ...result.toolCalls],
signal,
});
} }

View File

@@ -16,11 +16,9 @@ import { resolveProjectRoot } from '../path_guard.js';
import { maybeAutoNameChat } from '../auto_name.js'; import { maybeAutoNameChat } from '../auto_name.js';
import { getAgentById } from '../agents.js'; import { getAgentById } from '../agents.js';
import * as compaction from '../compaction.js'; import * as compaction from '../compaction.js';
import * as modelContext from '../model-context.js';
import type { Broker } from '../broker.js'; import type { Broker } from '../broker.js';
import { resolveToolBudget } from './budget.js'; import { resolveToolBudget } from './budget.js';
import { import {
DOOM_LOOP_THRESHOLD,
detectDoomLoop, detectDoomLoop,
} from './sentinels.js'; } from './sentinels.js';
import { import {
@@ -33,15 +31,23 @@ import {
} from './error-handler.js'; } from './error-handler.js';
import { import {
executeStreamPhase, executeStreamPhase,
streamCompletion,
} from './stream-phase.js'; } from './stream-phase.js';
import { executeToolPhase } from './tool-phase.js'; import { executeToolPhase, type ToolPhaseResult } from './tool-phase.js';
import { DB_FLUSH_INTERVAL_MS, type StreamPhaseState } from './types.js'; import type { StreamPhaseState } from './types.js';
import { import {
runCapHitSummary, runCapHitSummary,
runDoomLoopSummary, runDoomLoopSummary,
runStepCapSummary,
} from './sentinel-summaries.js'; } from './sentinel-summaries.js';
// v1.14.0: hard ceiling on the number of stream-and-tool iterations per
// user-message turn. Per-agent cap via agent.steps is the primary knob;
// MAX_STEPS is the safety ceiling. 200 is 4x the effective budget ceiling
// (50 tool calls) — in practice budget fires first unless the model makes
// many 0-tool-call iterations (which exit the loop via the non-tool finish
// path anyway).
export const MAX_STEPS = 200;
// v1.12.4: re-exported so external callers (tests, future consumers) keep // v1.12.4: re-exported so external callers (tests, future consumers) keep
// importing from services/inference.js as the public surface. // importing from services/inference.js as the public surface.
export { detectDoomLoop, DOOM_LOOP_THRESHOLD } from './sentinels.js'; export { detectDoomLoop, DOOM_LOOP_THRESHOLD } from './sentinels.js';
@@ -145,13 +151,70 @@ export async function runAssistantTurn(
ctx: InferenceContext, ctx: InferenceContext,
args: TurnArgs, args: TurnArgs,
): Promise<void> { ): Promise<void> {
const { sessionId, chatId } = args; const { sessionId, chatId, signal } = args;
// v1.11: if the prior turn flagged this chat for compaction, run it first // v1.14.0: resolve agent once at the top. The agent stays fixed for the
// so loadContext below reads the post-compaction history. We swallow // duration of this user-message turn — PATCH agent_id mid-conversation
// compaction failures (clearing the flag so we don't loop) and proceed // takes effect on the next runInference, not mid-loop.
// with the un-compacted history — a slow turn that hits the model's const initialLoaded = await loadContext(ctx.sql, sessionId, chatId);
// hard limit is recoverable; a dead session is not. if (!initialLoaded) {
ctx.log.warn({ sessionId }, 'inference: session or project missing');
return;
}
const { session, project } = initialLoaded;
const agent = session.agent_id
? await getAgentById(project.path, session.agent_id)
: null;
const budget = resolveToolBudget(agent);
// v1.14.0: effectiveCap = min(agent.steps ?? Infinity, MAX_STEPS).
// steps: 0 means "no tool calls allowed" — the first stream phase runs
// but if it emits tool calls they are not executed (finalize as text-only).
const effectiveCap = Math.min(agent?.steps ?? Infinity, MAX_STEPS);
// steps: 0 special case — model responds text-only. The while loop would
// never enter (effectiveCap === 0), so we handle it explicitly before the
// loop. The model always gets at least one chance to respond with text.
if (effectiveCap === 0) {
const loaded = await loadContext(ctx.sql, sessionId, chatId);
if (loaded) {
await runTextOnlyTurn(ctx, args, loaded.session, loaded.project, loaded.history, agent);
}
return;
}
let stepNumber = 0;
let toolsUsed = args.toolsUsed;
let recentToolCalls = args.recentToolCalls;
let assistantMessageId = args.assistantMessageId;
while (stepNumber < effectiveCap) {
// ---- doom-loop check (moved from top-of-function) ----
const loop = detectDoomLoop(recentToolCalls);
if (loop) {
// Need fresh history for the summary.
const loaded = await loadContext(ctx.sql, sessionId, chatId);
if (loaded) {
const iterArgs: TurnArgs = { sessionId, chatId, assistantMessageId, toolsUsed, recentToolCalls, signal };
await runDoomLoopSummary(ctx, iterArgs, loaded.session, loaded.project, loaded.history, agent, loop);
}
break;
}
// ---- budget check (moved from top-of-function) ----
if (toolsUsed >= budget) {
const loaded = await loadContext(ctx.sql, sessionId, chatId);
if (loaded) {
const iterArgs: TurnArgs = { sessionId, chatId, assistantMessageId, toolsUsed, recentToolCalls, signal };
await runCapHitSummary(ctx, iterArgs, loaded.session, loaded.project, loaded.history, agent, budget);
}
break;
}
// ---- compaction check ----
// v1.11: if the prior turn flagged this chat for compaction, run it
// before loadContext so we read post-compaction history. Swallow
// failures and proceed with un-compacted history.
const chatFlag = await ctx.sql<{ needs_compaction: boolean }[]>` const chatFlag = await ctx.sql<{ needs_compaction: boolean }[]>`
SELECT needs_compaction FROM chats WHERE id = ${chatId} SELECT needs_compaction FROM chats WHERE id = ${chatId}
`; `;
@@ -170,50 +233,103 @@ export async function runAssistantTurn(
} }
} }
// ---- load context (must re-load each iteration — new messages since last step) ----
const loaded = await loadContext(ctx.sql, sessionId, chatId); const loaded = await loadContext(ctx.sql, sessionId, chatId);
if (!loaded) { if (!loaded) {
ctx.log.warn({ sessionId }, 'inference: session or project missing'); ctx.log.warn({ sessionId }, 'inference: session or project missing mid-loop');
return; break;
} }
const { session, project, history } = loaded; const { session: iterSession, project: iterProject, history } = loaded;
const projectRoot = await resolveProjectRoot(project.path); const projectRoot = await resolveProjectRoot(iterProject.path);
// Agent resolution is per-turn so PATCH agent_id mid-conversation takes
// effect on the next message. Unknown agent_id returns null silently —
// session falls back to base prompt + all tools + default temperature.
const agent = session.agent_id
? await getAgentById(project.path, session.agent_id)
: null;
// v1.8.2: cap-hit replaces the older "tool loop depth exceeded" failure. // v1.14.0: log step boundary for instrumentation. step_start parts are in
// When we've already burned the budget *before* this turn even runs, we // the schema CHECK but not emitted here — writing to the assistant message
// skip straight to the summary flow — the in-flight assistant message slot // before the stream phase creates a sequence-0 collision with
// gets reused for the wrap-up reply instead of being marked failed. // partsFromAssistantMessage. A WS frame or structured log is sufficient
const budget = resolveToolBudget(agent); // since the frontend doesn't render step boundaries in v1.14.
if (args.toolsUsed >= budget) { ctx.log.info({ sessionId, chatId, step: stepNumber, assistantMessageId }, 'step_start');
await runCapHitSummary(ctx, args, session, project, history, agent, budget);
return; // ---- build messages + stream phase ----
const messages = await buildMessagesPayload(iterSession, iterProject, history, agent, ctx.log);
const webToolsEnabled =
iterSession.web_search_enabled ?? iterProject.default_web_search_enabled ?? false;
const iterArgs: TurnArgs = { sessionId, chatId, assistantMessageId, toolsUsed, recentToolCalls, signal };
const state: StreamPhaseState = { accumulated: '', startedAt: null };
let result: StreamResult;
try {
result = await executeStreamPhase(ctx, iterArgs, iterSession, messages, state, agent, webToolsEnabled);
} catch (err) {
await handleAbortOrError(ctx, iterArgs, state.accumulated, err);
break;
} }
// v1.11.6: doom-loop guard. Detected BEFORE the budget cap (the model can // ---- non-tool finish → finalize and exit ----
// burn through 3 identical calls long before the 15-call budget fires). if (result.toolCalls.length === 0) {
// Same in-flight-slot-reuse pattern as runCapHitSummary — wrap-up reply await finalizeCompletion(ctx, iterArgs, result, state.startedAt, iterSession);
// lands in args.assistantMessageId, then a doom_loop sentinel is inserted break;
// to make the abort visible in the chat history.
const loop = detectDoomLoop(args.recentToolCalls);
if (loop) {
await runDoomLoopSummary(ctx, args, session, project, history, agent, loop);
return;
} }
// ---- steps: 0 edge case ----
// effectiveCap check above guarantees we're inside the loop, but this
// guard handles the theoretical case where the model emits tool calls
// on step 0 when effectiveCap would have been 0 (impossible since the
// while condition prevents entry, but kept for safety). If effectiveCap
// is 1 and we're on step 0, tool calls ARE executed — steps counts
// iterations, not post-first-stream.
// ---- tool phase ----
let toolPhaseResult: ToolPhaseResult;
try {
toolPhaseResult = await executeToolPhase(ctx, iterArgs, result, state.startedAt, iterSession, projectRoot);
} catch (err) {
// Tool phase errors are unexpected (individual tool failures are
// caught inside executeToolPhase). Log and break.
ctx.log.error({ err, sessionId, chatId, step: stepNumber }, 'tool phase threw unexpectedly');
break;
}
// ---- update loop locals ----
toolsUsed += toolPhaseResult.toolCallCount;
recentToolCalls = [...recentToolCalls, ...toolPhaseResult.toolCalls];
stepNumber++;
if (toolPhaseResult.action !== 'continue') {
// 'paused' (user input) or 'synthesis_done' — stop the loop.
break;
}
// 'continue' — advance to next assistant message.
assistantMessageId = toolPhaseResult.nextAssistantId!;
}
// ---- post-loop: step-cap sentinel ----
// When the loop exits because stepNumber reached effectiveCap, the last
// iteration's tool phase returned 'continue' with a nextAssistantId that
// is still in 'streaming' status (unfilled). Use it for the wrap-up.
if (stepNumber >= effectiveCap && effectiveCap < Infinity) {
const loaded = await loadContext(ctx.sql, sessionId, chatId);
if (loaded) {
const capArgs: TurnArgs = { sessionId, chatId, assistantMessageId, toolsUsed, recentToolCalls, signal };
await runStepCapSummary(ctx, capArgs, loaded.session, loaded.project, loaded.history, agent, stepNumber, effectiveCap);
}
}
}
// v1.14.0: special handling for steps: 0 — the model responds text-only.
// The while loop never enters (effectiveCap === 0). We stream once with
// no tools, finalize, and return. If the model emits tool calls despite
// not being offered tools, they're ignored (finalize as text-only).
async function runTextOnlyTurn(
ctx: InferenceContext,
args: TurnArgs,
session: Session,
project: Project,
history: Message[],
agent: Agent | null,
): Promise<void> {
const messages = await buildMessagesPayload(session, project, history, agent, ctx.log); const messages = await buildMessagesPayload(session, project, history, agent, ctx.log);
// Web tools are irrelevant when steps: 0 (no tool execution), but we
// v1.11.8: resolve per-chat web-tools opt-in. Tri-state on the wire: // still need to resolve the flag for executeStreamPhase's signature.
// - session.web_search_enabled = null → inherit project default
// - session.web_search_enabled = true/false → explicit
// Both web_search and web_fetch are gated by this single flag (the UI
// label is "Enable web search and fetch" — same store, both tools).
// Default is false unless explicitly opted in, matching the v1.9
// plumbing intent ("inert until Batch 8 ships the actual tools").
const webToolsEnabled = const webToolsEnabled =
session.web_search_enabled ?? project.default_web_search_enabled ?? false; session.web_search_enabled ?? project.default_web_search_enabled ?? false;
@@ -227,8 +343,12 @@ export async function runAssistantTurn(
} }
if (result.toolCalls.length > 0) { if (result.toolCalls.length > 0) {
await executeToolPhase(ctx, args, result, state.startedAt, session, projectRoot); ctx.log.warn(
return; { chatId: args.chatId, toolCallCount: result.toolCalls.length },
'steps: 0 agent emitted tool calls; ignoring and finalizing as text-only',
);
// Override: strip tool calls so finalizeCompletion treats it as text-only.
result = { ...result, toolCalls: [] };
} }
await finalizeCompletion(ctx, args, result, state.startedAt, session); await finalizeCompletion(ctx, args, result, state.startedAt, session);

View File

@@ -106,6 +106,9 @@ export interface Agent {
// agent's toolset (30 if all tools are read-only, 10 otherwise) or 15 for // agent's toolset (30 if all tools are read-only, 10 otherwise) or 15 for
// raw chat with no agent. // raw chat with no agent.
max_tool_calls: number | null; max_tool_calls: number | null;
// v1.14.0: per-agent step cap for the outer inference loop. null means
// bounded only by MAX_STEPS (200). 0 means "no tool calls allowed."
steps: number | null;
} }
// One entry per malformed `## Name` block. Per-block errors don't fail the // One entry per malformed `## Name` block. Per-block errors don't fail the

View File

@@ -73,6 +73,9 @@ export interface Agent {
// the agent's toolset (30 for all read-only, 10 otherwise) or 15 for raw // the agent's toolset (30 for all read-only, 10 otherwise) or 15 for raw
// chat with no agent. // chat with no agent.
max_tool_calls: number | null; max_tool_calls: number | null;
// v1.14.0: per-agent step cap for the outer inference loop. null means
// bounded only by MAX_STEPS (200). 0 means "no tool calls allowed."
steps: number | null;
} }
export interface AgentParseError { export interface AgentParseError {

View File

@@ -1,6 +1,6 @@
# BooCode v1.x — Roadmap # BooCode v1.x — Roadmap
Last updated: 2026-05-22 Last updated: 2026-05-23
> **Companion doc:** `boocode_code_review.md` holds the full external-repo inventory, lift rationale, and license analysis. This document is the canonical source for shipping state, version ordering, and what's planned vs. shipped. > **Companion doc:** `boocode_code_review.md` holds the full external-repo inventory, lift rationale, and license analysis. This document is the canonical source for shipping state, version ordering, and what's planned vs. shipped.
@@ -27,7 +27,7 @@ External code lifted from / referenced in: see `boocode_code_review.md` for full
----- -----
## Shipped (status as of 2026-05-22) ## Shipped (status as of 2026-05-23)
|Version |Theme |Tag | |Version |Theme |Tag |
|-----------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------| |-----------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------|
@@ -72,9 +72,9 @@ External code lifted from / referenced in: see `boocode_code_review.md` for full
----- -----
### Shipped (v1.13.x — written 2026-05-22, retagged same day) ### Shipped (v1.13.x — strangler-fig closed 2026-05-23)
All v1.13.x batches were retagged to the `vMAJOR.MINOR.PATCH-slug` scheme on 2026-05-22. `CHANGELOG.md` is the canonical per-tag record (slug describes what shipped; tag name alone recalls the batch). Tip is `v1.13.14-skills-audit` (`0fa46cd`); the next batch is `v1.13.15-codecontext-synth` (this batch, tag pending). Tags in chronological order: All v1.13.x batches use the `vMAJOR.MINOR.PATCH-slug` tag scheme adopted 2026-05-22. `CHANGELOG.md` is the canonical per-tag record (slug describes what shipped; tag name alone recalls the batch). The v1.13.x line ran 21 batches over a single intense window; the umbrella `v1.13` tag sits on `211e903` (same commit as `v1.13.20-drop-legacy-cols`), marking the strangler-fig closed. Tags in chronological order:
- `v1.13.0-ai-sdk-v6` — AI SDK v6 migration; `streamCompletion` adapter; `messages_with_parts` view; reasoning_parts end-to-end - `v1.13.0-ai-sdk-v6` — AI SDK v6 migration; `streamCompletion` adapter; `messages_with_parts` view; reasoning_parts end-to-end
- `v1.13.1-cleanup-bundle``statement_timeout='30s'`, alpha-sorted tool registry, 60s stuck-row sweeper, `experimental_repairToolCall` pass-through - `v1.13.1-cleanup-bundle``statement_timeout='30s'`, alpha-sorted tool registry, 60s stuck-row sweeper, `experimental_repairToolCall` pass-through
@@ -93,115 +93,13 @@ All v1.13.x batches were retagged to the `vMAJOR.MINOR.PATCH-slug` scheme on 202
- `v1.13.14-skills-audit` — 26 skills vendored + audited via 5 parallel agent teams; 14 kept, 11 dropped, 1 migrated to BOOCHAT.md/BOOCODER.md - `v1.13.14-skills-audit` — 26 skills vendored + audited via 5 parallel agent teams; 14 kept, 11 dropped, 1 migrated to BOOCHAT.md/BOOCODER.md
- `v1.13.15-codecontext-synth` — forced second-inference synthesis pass for codecontext overview tools (truncation-aware extraction; auto-fetched top-N files + project docs; 32k payload-budget contract preserved) - `v1.13.15-codecontext-synth` — forced second-inference synthesis pass for codecontext overview tools (truncation-aware extraction; auto-fetched top-N files + project docs; 32k payload-budget contract preserved)
- `v1.13.16-xml-parser` — Anthropic `<invoke>` parser support + Levenshtein-based unknown-tool recovery hints (qwen3.6 drift to Claude Code-style tool names like `read_file`); xml-parser test coverage - `v1.13.16-xml-parser` — Anthropic `<invoke>` parser support + Levenshtein-based unknown-tool recovery hints (qwen3.6 drift to Claude Code-style tool names like `read_file`); xml-parser test coverage
- `v1.13.17-cross-repo-reads``request_read_access` tool + per-session `allowed_read_paths` grants; `pathGuard` extended with `extraRoots`; pause/resume reuses the `ask_user_input` mechanism
- `v1.13.18-codecontext-file-path``resolveProjectPath` in `codecontext_client.ts` realpath-resolves `file_path` arg the same way `target_dir` was; closes the silent-fail path the sidecar exhibited on relative paths
- `v1.13.19-html-artifact-panes` — pane-based artifact viewer with on-request HTML; `<!DOCTYPE html>` detection adds `message_parts.kind='html_artifact'` row; Markdown + HTML panes both open via "Open in pane" affordance; iframe sandbox `allow-scripts allow-clipboard-write allow-downloads` (no `allow-same-origin`, `srcDoc`); CSP `connect-src 'none'`. Scope-revised mid-design from auto-bias-to-HTML to Markdown-default / HTML-on-request
- `v1.13.20-drop-legacy-cols` — final strangler-fig step. Drops `messages.tool_calls` + `tool_results` columns; 10 dual-write sites removed (recon caught 2 beyond the original roadmap inventory); `messages_with_parts` view simplified to parts-only subselects via `CREATE OR REPLACE` before the column DROPs (Postgres ordering constraint). Adversarial-review catch: `discard_stale` had a `RETURNING tool_calls, tool_results` clause; fixed via two-step UPDATE-then-SELECT-from-view. `Message` API type retains the fields — view synthesizes them from parts so the wire shape is unchanged
- `v1.13`**umbrella tag on the same commit as v1.13.20.** Marks the AI SDK v6 + parts-table migration complete
The remaining strangler-fig final step (drop `messages.tool_calls` + `tool_results` columns) is still pending under its old `v1.13.2` working name; will get a new tag slug when scoped. The v1.13.x line is closed. Three batches still sit in the **In flight** column conceptually but none of them are v1.13.x scope: **live-smoke of v1.13.19** (manual browser exercise of the artifact panes — five minutes, independent), and the two v1.14 branches below. Independent siblings (`v1.14.x-mcp`, `v1.14.x-html`, `v1.16`) can ship in any order relative to v1.14 itself.
## In flight / next (v1.13.x cleanup line)
Five more single-dispatch batches before the strangler-fig closes. Each ships independently with its own smoke and rollback surface. **Do not fold.** Order is locked:
### v1.13.8 — system-prompt prefix stability verify-and-measure (REFRAMED, 2026-05-22)
**Original plan:** add a `system_prompt_cache` DB table keyed by `(agent_id, project_id, skills_version)`, mtime-invalidated.
**Why reframed:** recon disproved the premise. `apps/server/src/services/system-prompt.ts:buildSystemPrompt` already runs over mtime-cached inputs at the file layer:
- BOOCHAT.md / BOOCODER.md cached in `system-prompt.ts:25` (`cachedGuidance`, keyed by mtime)
- global + per-project AGENTS.md cached in `agents.ts:245` (`safeStat` pattern, 60s TTL)
- `session.system_prompt` / `project.default_system_prompt` are DB scalars (byte-stable until edited)
- BASE_SYSTEM_PROMPT is a hardcoded template with `${projectPath}` interpolation
Output assembly is a microsecond pure-string concat with no I/O. Skills aren't in the prefix (runtime discovery via `skill_find`). Tools live in a separate request body field, alpha-sorted by v1.13.3. **In theory the prefix is already byte-stable across turns; nothing has measured it.**
**New scope — instrumentation only, no cache:**
1. SHA-256 fingerprint of `buildSystemPrompt`'s output logged per turn at `level=info`, msg `prefix-fingerprint`, with project_id / agent_id / session_id / prefix_hash / prefix_length / mtime fields.
2. Module-level `Map<sessionId, lastHash>` observer. On hash change for a known session → emit `prefix-drift` at `level=warn` with `prev_hash`, `new_hash`, and a field-level `changed_inputs` diff.
3. Unit-level byte-stability assertion in `system-prompt.test.ts`: two consecutive `buildSystemPrompt` calls with the same inputs return byte-identical strings.
**Decision criterion:** smoke 5 turns in a fresh session. 5 identical hashes + zero drift logs → close v1.13.8 as no-op, **drop the DB cache plan permanently**, move to v1.13.9. If drift surfaces → characterize the failure mode in a follow-up batch (the answer may not be a cache at all).
**Doctrine:** matches the v1.13.6 audit pattern. Don't add infrastructure without a proven cache miss. The v1.12.0 mtime caches at the input layer plus alpha tool ordering at the request body layer already address the load-bearing cache-stability surfaces.
**Dispatch brief:** `handoff_v1.13.8_prefix_verify.md`.
**Estimated:** ~95 LoC (system-prompt.ts + small `getAgentsMtimes` accessor in agents.ts + 3 new tests).
### v1.13.9 — compaction overflow trigger formula
opencode pattern: `0.85 * ctx_max` early trigger (not at 100% saturation). Reduces tail-loss risk and gives compaction a safer window. Tiny change but tied to v1.13.4's tier logic — sequence matters.
**Lift source:** `anomalyco/opencode` `session/overflow.ts`.
**Estimated:** ~30 LoC.
### v1.13.10 — per-tool token cost accounting
Rolling average per tool, surfaced in AgentPicker tooltip + agent-pick decisions. Backend tracks `(tool_name, prompt_tokens_in, completion_tokens_out)` per call; surfaces a 100-call rolling mean. Frontend reads it for tool-cost hints. **Depends on v1.13.7's `includeUsage` fix** — without real token numbers in DB rows, the rolling average is empty.
**Estimated:** ~250 LoC.
### v1.13.11 — WebSocket frame typing
Zod schemas validated both ends. Catches the recurring class of bug that drove the 2026-05-21 debugging spike (silent protocol drift). Upfront work that pays back every time the protocol changes. `chat_status`, `usage`, `parts_appended`, `session_workspace_updated`, `tool_running` — every frame gets a Zod schema, every send/receive site validates.
**Estimated:** ~300 LoC.
### v1.13.12 — skills audit pass (NEW, 2026-05-22)
**Goal:** apply the rules→recipes split (per Codeminer42 activation-gap data: plain skills invoke 6% in clean multi-turn, `CLAUDE.md`/`AGENTS.md` is 100% present) to BooCode's 7 vendored v1.12 skills. Sort each into: (a) move to `AGENTS.md` as always-true rule, (b) keep as recipe invoked via `/skill <name>`, (c) move bulky context into `references/` flat subdirectory inside the skill, (d) delete (Claude already does it reliably).
**Scope:**
1. **Audit each of the 7 vendored skills against the 4-way split.** Most workflow-rule content ("always do X before Y", "never do Z") moves to `AGENTS.md` since it should be 100% present. Recipe content ("here's how to scaffold a component", "here's the release checklist") stays as skill, gets `context: fork` if heavy.
1. **Adopt Anthropic best-practices conventions** for any skills that remain after audit: gerund names (`scaffolding-components`, not `component-helper`), SKILL.md ≤500 lines, references one level deep, third-person imperative voice, MCP tool references in `ServerName:tool_name` format, no Windows-style paths, no time-sensitive info, consistent terminology, no "voodoo constants."
1. **Run each remaining skill through the 4-step validation protocol** from `mgechev/skills-best-practices` (Discovery → Logic → Edge Case → Architecture Refinement) using a fresh Claude chat per step. Prompts are paste-ready; ~10 minutes per skill.
1. **Install `skillgrade` on Sam's host** (`npm i -g skillgrade`). For each remaining skill, write a minimal `eval.yaml` with 23 tasks and run `skillgrade --smoke` (5 trials, ~5 min) to confirm the skill triggers when expected and produces correct output. **Likely outcome: some skills show 020% trigger rate — confirms they belong in AGENTS.md, not as skills.**
1. **Document the rules→recipes split as a BooCode convention** in `BOOCODER.md` / `BOOCHAT.md`. Future-proofs against re-adding workflow rules as skills.
**Lift sources:**
- `blog.codeminer42.com/stop-putting-best-practices-in-skills/` — empirical 6%/33%/66%/100% invocation-rate data with Vercel-style multi-turn methodology. The activation-gap framing.
- `mgechev/skills-best-practices` (25 stars, MIT) — 4-step validation protocol with paste-ready prompts. Directory structure conventions.
- `mgechev/skillgrade` (132 stars, MIT) — agent-agnostic skill eval framework. `eval.yaml` task+grader schema. Smoke/reliable/regression presets.
- `platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices` — canonical Anthropic standard. 500-line ceiling, gerund naming, progressive disclosure patterns, MCP tool reference format, verification checklist.
**Dependencies:** none (the 7 v1.12 skills already exist; this is an audit pass on shipped material). Can ship at any point in the v1.13.x line.
**Estimated:** zero code changes, ~one evening of audit work, plus skillgrade install. Per-skill eval.yaml authoring is ~30 min per skill including the 4-step validation. Total roughly 56 hours of focused work for all 7 skills.
### v1.13.2 — drop legacy columns (final phase of strangler-fig)
**Wait at least one week of production traffic on v1.13.1 before shipping.** The dual-write is rollback insurance. Drop the columns and that rollback is gone.
**Verification query before shipping:**
```sql
SELECT
COUNT(*) FILTER (WHERE m.tool_calls IS NOT NULL AND NOT EXISTS (
SELECT 1 FROM message_parts p WHERE p.message_id = m.id AND p.kind = 'tool_call'
)) AS missing_tool_call_parts,
COUNT(*) FILTER (WHERE m.tool_results IS NOT NULL AND NOT EXISTS (
SELECT 1 FROM message_parts p WHERE p.message_id = m.id AND p.kind = 'tool_result'
)) AS missing_tool_result_parts
FROM messages m
WHERE m.created_at > '2026-05-22'::timestamptz;
```
Both columns must read 0.
**Scope (~150 LoC, mostly deletions):**
1. Remove dual-write from every v1.13.0 site: `tool-phase.ts` (3 sites), `finalizeCompletion`, `skills.ts` (2 sites), `messages.ts` answer flow, `chats.ts` (fork). Keep only the parts write.
1. Simplify `messages_with_parts` view — drop COALESCE fallbacks since legacy columns are about to disappear.
1. `ALTER TABLE messages DROP COLUMN tool_calls, DROP COLUMN tool_results`.
1. Remove `tool_calls`/`tool_results` fields from `Message` API type. API boundary unchanged (frontend already reads parts-derived values).
1. Drop the stale `messages_status_check` cleanup DO block from v1.12.1 schema if still present.
1. Update test fixtures in `inference.test.ts` and `compaction.test.ts` to construct parts instead of inline `tool_calls: null, tool_results: null` literals. ~30 fixture rewrites.
After v1.13.2 ships, tag the umbrella `v1.13` on the same commit (or on -C — Sam's call).
**Shipped as `v1.13.20-drop-legacy-cols` on 2026-05-23 with umbrella `v1.13` tagged on the same commit.** Slug renamed at ship time (the "v1.13.2" planning name predated the patch-monotonic-per-minor convention). Calendar wait dropped — single-user self-hosted, no production rollback constraint. Recon caught 2 additional dual-write sites beyond the roadmap's 8 (chats.ts fork-clone + extras in tool-phase.ts) and an additional fixture file (`tool_cost_stats.test.ts`) with a direct legacy-column INSERT. Adversarial review caught a `RETURNING tool_calls, tool_results` clause in the `discard_stale` endpoint that the green test suite missed — fixed by two-step UPDATE-then-SELECT-from-view so the parts-synthesized fields keep flowing on the response. Type-pruning step on `Message.tool_calls` / `Message.tool_results` skipped (the view still populates them from parts; preserving the API contract was simpler than ripping it).
----- -----
@@ -510,8 +408,12 @@ term.indifferentketchup.com → booterm :9501 (or routed under code.
- **v1.13.12-ws-schemas:** none (Zod schemas + wrappers in TS, no DB) - **v1.13.12-ws-schemas:** none (Zod schemas + wrappers in TS, no DB)
- **v1.13.13-ws-publish:** none (publish-site conversion + protocol-drift fix in `compaction.ts`, no DB) - **v1.13.13-ws-publish:** none (publish-site conversion + protocol-drift fix in `compaction.ts`, no DB)
- **v1.13.14-skills-audit:** none (skills + AGENTS.md migration into git via `.gitignore` negation patterns; no DB) - **v1.13.14-skills-audit:** none (skills + AGENTS.md migration into git via `.gitignore` negation patterns; no DB)
- **v1.13.15-codecontext-synth (this batch, tag pending):** `message_parts.kind` CHECK constraint extended with `'synthesis'` value (DROP + DO $$ pg_constraint idempotency-guarded re-add) - **v1.13.15-codecontext-synth:** `message_parts.kind` CHECK constraint extended with `'synthesis'` value (DROP + DO $$ pg_constraint idempotency-guarded re-add)
- **(column drop, pending — old working name v1.13.2):** drop `messages.tool_calls`, `messages.tool_results`; simplify `messages_with_parts` view - **v1.13.16-xml-parser:** none (parser change + new `tool-suggestions.ts` helper in TS, no DB)
- **v1.13.17-cross-repo-reads:** `sessions.allowed_read_paths text[] NOT NULL DEFAULT ARRAY[]::text[]` (per-session cross-repo read grants)
- **v1.13.18-codecontext-file-path:** none (path resolver in `codecontext_client.ts`, no DB)
- **v1.13.19-html-artifact-panes:** `message_parts.kind` CHECK constraint extended with `'html_artifact'` value (same v1.13.15 pattern)
- **v1.13.20-drop-legacy-cols:** `ALTER TABLE messages DROP COLUMN tool_calls, DROP COLUMN tool_results` (the strangler-fig's final phase). `messages_with_parts` view rewritten to parts-only subselects via `CREATE OR REPLACE VIEW` BEFORE the drops (Postgres ordering constraint). v1.12.1 `messages_status_check`/`messages_role_check` cleanup block removed (one-shot effective long ago)
- **v1.14:** `agents.steps` column (or AGENTS.md parser extension; no DB if file-only) - **v1.14:** `agents.steps` column (or AGENTS.md parser extension; no DB if file-only)
- **v1.14.x-mcp (NEW):** none — single-server MCP-client PoC is config-only at first, no schema change - **v1.14.x-mcp (NEW):** none — single-server MCP-client PoC is config-only at first, no schema change
- **v1.14.x-html (NEW):** `message_parts.kind` CHECK constraint extended with `'html_artifact'` value - **v1.14.x-html (NEW):** `message_parts.kind` CHECK constraint extended with `'html_artifact'` value
@@ -621,7 +523,17 @@ Earlier May 18 chat recommended Option A (thin orchestration shell over OpenCode
### v1.13.x cleanup line locked (2026-05-22) ### v1.13.x cleanup line locked (2026-05-22)
After the 2026-05-22 retag, the v1.13.x cleanup line in `vMAJOR.MINOR.PATCH-slug` form is **v1.13.0-ai-sdk-v6 ✅ → v1.13.1-cleanup-bundle ✅ → v1.13.2-compaction-prune ✅ → v1.13.3-truncate ✅ → v1.13.4-reasoning-fix ✅ → v1.13.5-stability-bundle ✅ → v1.13.6-prefix-stability ✅ → v1.13.7-compaction-trigger ✅ → v1.13.8-tool-cost ✅ → v1.13.9-agentlint ✅ → v1.13.10-openspec ✅ → v1.13.11-tools ✅ → v1.13.12-ws-schemas ✅ → v1.13.13-ws-publish ✅ → v1.13.14-skills-audit ✅ → v1.13.15-codecontext-synth ✅ → v1.13.16-xml-parser ✅ → column drop (final, pending — old working name v1.13.2)**. **Do not fold.** Smoke isolation matters: each batch has a distinct rollback surface, and bisecting a 750-LoC merge across four unrelated changes is worse than four separate dispatches. The v1.13.x cleanup line shipped 21 batches over a single intense window in `vMAJOR.MINOR.PATCH-slug` form: **v1.13.0-ai-sdk-v6 ✅ → v1.13.1-cleanup-bundle ✅ → v1.13.2-compaction-prune ✅ → v1.13.3-truncate ✅ → v1.13.4-reasoning-fix ✅ → v1.13.5-stability-bundle ✅ → v1.13.6-prefix-stability ✅ → v1.13.7-compaction-trigger ✅ → v1.13.8-tool-cost ✅ → v1.13.9-agentlint ✅ → v1.13.10-openspec ✅ → v1.13.11-tools ✅ → v1.13.12-ws-schemas ✅ → v1.13.13-ws-publish ✅ → v1.13.14-skills-audit ✅ → v1.13.15-codecontext-synth ✅ → v1.13.16-xml-parser ✅ → v1.13.17-cross-repo-reads ✅ → v1.13.18-codecontext-file-path ✅ → v1.13.19-html-artifact-panes ✅ → v1.13.20-drop-legacy-cols ✅** → umbrella `v1.13`. **Do not fold** was the discipline — each batch has a distinct rollback surface, and bisecting a 750-LoC merge across four unrelated changes is worse than four separate dispatches. Held throughout; CHANGELOG.md is the per-tag canonical record.
### Numbering and scope-revision discipline during v1.13.x (2026-05-23)
The v1.13.x line ran 21 batches; planned-vs-shipped numbering diverged for half of them, and three batches had material scope revisions mid-design. Pattern that emerged and is worth carrying forward:
- **Patch numbers are assigned at ship time, not in planning.** The proposal/openspec folder uses a planning slug (e.g. `v1.14.x-html-artifact-panes`); the final tag uses a concrete patch monotonic-per-minor (e.g. `v1.13.19-html-artifact-panes`). Avoids the "we said v1.13.8 but actually shipped seventh" confusion that ate two retrospective passes on the roadmap.
- **Scope-revise the proposal before dispatching.** v1.13.19-html-artifact-panes flipped mid-design from "auto-bias to HTML for >100 lines" to "Markdown default, HTML on request" — the proposal got rewritten before recon. Far cheaper than discovering the wrong approach in implementation. The "brainstorm before code" discipline.
- **Recon-first dispatch finds 2530% more sites than the roadmap inventory.** v1.13.20 recon caught 2 extra dual-write sites (chats.ts fork-clone + 2 in tool-phase.ts) and an extra fixture file. v1.13.19 recon corrected which `Pane` type to extend. Skipping recon to save a step doesn't.
- **Adversarial reviews catch what test suites miss.** v1.13.19 reviewer caught silent error-promotion in `openInPane`; v1.13.20 reviewer caught a `RETURNING tool_calls, tool_results` clause that crashes in production but slips past green tests. Both are routine code-reviewer dispatches; both saved a same-day hotfix. **Two-stage review (spec then quality) is non-negotiable when shipping fast.**
- **Calendar-gated waits are production-safety hedges that don't apply here.** v1.13.20 originally said "wait one week of production traffic on v1.13.1 before dropping columns." Sam called it out: single-user self-hosted, no rollback constraint, code-level audit + DB COUNT query is the actual safety check. Dropped the wait. Don't ritualize production-grade hedges in a single-user codebase.
### v1.13 retrospective (what shipped) ### v1.13 retrospective (what shipped)
@@ -634,7 +546,21 @@ After the 2026-05-22 retag, the v1.13.x cleanup line in `vMAJOR.MINOR.PATCH-slug
- **v1.13.5** — opencode truncate.ts port + view_truncated_output tool. Tagged on `f8fc5db`. - **v1.13.5** — opencode truncate.ts port + view_truncated_output tool. Tagged on `f8fc5db`.
- **v1.13.6** — compaction head-assembly audit + reasoning fix. Closed the Q3 reasoning gap from v1.13.1-C. Tagged on `81d837c`. - **v1.13.6** — compaction head-assembly audit + reasoning fix. Closed the Q3 reasoning gap from v1.13.1-C. Tagged on `81d837c`.
- **v1.13.7** — stability bundle: includeUsage fix + trim guards + payload filter + budget bump. Surfaces tokens (closes a v1.13.1-A latent regression where `result.usage` resolved empty), kills the empty-bubble + ActionRow noise between tool calls on single-tool-call turns, and unblocks Continue after cap-hit on chats that have trailing empty/failed assistants. - **v1.13.7** — stability bundle: includeUsage fix + trim guards + payload filter + budget bump. Surfaces tokens (closes a v1.13.1-A latent regression where `result.usage` resolved empty), kills the empty-bubble + ActionRow noise between tool calls on single-tool-call turns, and unblocks Continue after cap-hit on chats that have trailing empty/failed assistants.
- **v1.13.2 deferred** — at least one week of production traffic on v1.13.1 before dropping legacy columns. Dual-write is rollback insurance. - **v1.13.6 (numbering re-aligned)** — system-prompt prefix verify-and-measure batch (originally numbered v1.13.8 in the planning doc). Reframed mid-design from "add a `system_prompt_cache` table" to "instrument-and-prove" after recon showed input-layer mtime caches already achieve byte-stable prefixes. Smoke confirmed zero drift across 5 turns; dropped the planned DB table.
- **v1.13.7-compaction-trigger** — 0.85×ctx_max early trigger (planned as v1.13.8 / v1.13.9).
- **v1.13.8-tool-cost** — `tool_cost_stats` SQL view + AgentPicker tooltip surfacing (planned as v1.13.9 / v1.13.10).
- **v1.13.9-agentlint** — instruction-file AgentLint pass (planned as part of v1.13.11 skills audit; split into its own batch when it grew larger than fitting).
- **v1.13.10-openspec** — `openspec/changes/<slug>/{proposal,tasks,design}.md` batch-doc structure adoption.
- **v1.13.11-tools** — tiered tool loading via `BOOCODE_TOOLS=core|standard|all` env (~30 LoC; was a far-future optional item, slotted in).
- **v1.13.12-ws-schemas** + **v1.13.13-ws-publish** — Zod schemas for all 27 wire-format frames, `publishFrame`/`publishUserFrame` wrappers, ~80 publish sites converted (planned as v1.13.10 / v1.13.11).
- **v1.13.14-skills-audit** — 26 skills vendored + audited via 5 parallel agent teams; 14 kept, 11 dropped, 1 migrated to BOOCHAT.md/BOOCODER.md. Codeminer42 rules-vs-recipes framing applied.
- **v1.13.15-codecontext-synth** — forced second-inference synthesis pass for codecontext overview tools (truncation-aware extraction; auto-fetched top-N files + project docs under 32k payload budget).
- **v1.13.16-xml-parser** — Anthropic `<invoke>` parser support + Levenshtein unknown-tool recovery hints (qwen3.6 drift to Claude Code-style tool names).
- **v1.13.17-cross-repo-reads** — `request_read_access` tool + per-session `allowed_read_paths` grants; `pathGuard` extraRoots; reuses the `ask_user_input` pause/resume mechanism.
- **v1.13.18-codecontext-file-path** — `resolveProjectPath` in `codecontext_client.ts` realpath-resolves `file_path` the same way `target_dir` was already resolved.
- **v1.13.19-html-artifact-panes** — pane-based artifact viewer (Markdown default + HTML on request). Scope-revised mid-design from auto-bias-HTML to Markdown-default. `<!DOCTYPE html>` detection adds `message_parts.kind='html_artifact'` row; iframe sandbox `allow-scripts allow-clipboard-write allow-downloads` (no `allow-same-origin`); CSP `connect-src 'none'` + `X-Content-Type-Options: nosniff` + `Content-Security-Policy: sandbox` defense-in-depth. Pane state is reference-only — content fetched on mount to keep jsonb small.
- **v1.13.20-drop-legacy-cols** — final strangler-fig step. 10 dual-write sites stripped (recon caught 2 beyond the original v1.13.2 inventory). `messages_with_parts` simplified to parts-only via `CREATE OR REPLACE` before column DROPs (Postgres ordering constraint). Adversarial-review catch: `discard_stale` had `RETURNING tool_calls, tool_results` — fixed via two-step UPDATE-then-SELECT-from-view. `Message` type retains the fields, populated by the view. v1.12.1 cleanup DO block removed.
- **`v1.13` umbrella** — tagged on the same commit as v1.13.20 (`211e903`). AI SDK v6 + parts-table migration complete.
### Pre-v1.13 architectural decisions (still load-bearing) ### Pre-v1.13 architectural decisions (still load-bearing)

View File

@@ -59,6 +59,7 @@ Rules:
## Refactorer ## Refactorer
--- ---
temperature: 0.3 temperature: 0.3
steps: 5
tools: [find_files, get_codebase_overview, get_dependencies, get_file_analysis, get_framework_analysis, get_semantic_neighborhoods, get_symbol_info, grep, list_dir, search_symbols, view_file, watch_changes] tools: [find_files, get_codebase_overview, get_dependencies, get_file_analysis, get_framework_analysis, get_semantic_neighborhoods, get_symbol_info, grep, list_dir, search_symbols, view_file, watch_changes]
description: Proposes refactors for clarity, deduplication, or decoupling. Read-only — outputs plans, not edits. description: Proposes refactors for clarity, deduplication, or decoupling. Read-only — outputs plans, not edits.
--- ---
@@ -97,6 +98,7 @@ Codecontext usage:
## Architect ## Architect
--- ---
temperature: 0.5 temperature: 0.5
steps: 20
tools: [find_files, get_codebase_overview, get_dependencies, get_file_analysis, get_framework_analysis, get_semantic_neighborhoods, get_symbol_info, grep, list_dir, search_symbols, view_file, watch_changes] tools: [find_files, get_codebase_overview, get_dependencies, get_file_analysis, get_framework_analysis, get_semantic_neighborhoods, get_symbol_info, grep, list_dir, search_symbols, view_file, watch_changes]
description: Designs new features, modules, or architectural changes. Outputs a build plan. description: Designs new features, modules, or architectural changes. Outputs a build plan.
--- ---

View File

@@ -0,0 +1,72 @@
# v1.14.0-outer-loop — design decisions
Answers to the dispatch's blocking questions, resolved 2026-05-23.
## D1. Step cap — what replaces MAX_TOOL_LOOP_DEPTH?
`MAX_TOOL_LOOP_DEPTH` never existed — no hard recursion depth guard was ever in the codebase. Safety came from budget (50 tool calls) + doom-loop (3 identical calls).
**Decision:** introduce `MAX_STEPS = 200` as a hard ceiling. Per-agent cap via `agent.steps` is the primary knob. Resolution: `effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS)`.
**Rationale:** Sam reports BooChat gets stuck at 50 tool calls (the budget) too often. The step cap should be generous — 200 is 4x the current de-facto ceiling. Budget (50 tool calls total across all steps) remains a separate concern and is not changed in this batch.
Note: "step" ≠ "tool call." One step = one stream iteration that may produce multiple parallel tool calls. Budget counts individual tool calls; step cap counts iterations. At 200 steps with average 1-2 tool calls per step, the budget (50) will fire well before the step cap in most scenarios. The step cap is a safety ceiling for cases where the model makes many 1-tool-call iterations.
## D2. step_finish — emit or not?
**Decision:** No `step_finish` part. The next `step_start` (or assistant message completion) implicitly ends the previous step.
**Rationale:** opencode only emits `step_start`. Less noise in parts, simpler code. If UI ever needs step durations, compute from the timestamps of consecutive `step_start` parts.
## D3. Step-cap hit — sentinel or quiet?
**Decision:** Write a sentinel summary on step-cap hit. Visible to the user in chat, same as budget-exhaustion's `runCapHitSummary`.
**Implementation:** Extend `runCapHitSummary` to accept a `reason: 'budget' | 'step_cap'` parameter (or add a parallel `runStepCapSummary`). The sentinel metadata kind stays `cap_hit` — frontend `CapHitSentinel` component already renders it. The sentinel's text distinguishes the two cases ("Tool budget exhausted" vs "Step limit reached").
## D4. agent.steps = 0
**Decision:** `steps: 0` means "no tool calls allowed." The loop body never executes. The assistant can only respond with text.
**Implementation:** When `effectiveCap === 0`, skip the loop entirely. Stream the first assistant turn (text-only), finalize, return. The model receives no tools in the request payload when `steps: 0` (or equivalently, tools are passed but the loop never enters the tool-execution branch).
Actually, cleaner: `steps: 0` means the loop cap is 0. The while condition `stepNumber < effectiveCap` is false on the first check. The stream phase still runs (the model produces a text response), but if it emits tool calls they're ignored and the turn finalizes as text-only. This may produce a confusing response if the model's text references tool results it never got — but `steps: 0` is an explicit constraint the agent author chose. Document in AGENTS.md parser validation.
## D5. Synthesis success terminates the loop?
**Decision:** Yes. `break` out of the loop after synthesis success. Preserves current behavior (synthesis replaces the recursive call; no further iterations).
**Rationale:** The synthesis pass produces a self-contained summary turn. Continuing the loop after synthesis would let the model issue more tool calls on top of a synthesis summary, which is semantically wrong — the synthesis IS the final answer for that tool call batch.
## D6. executeToolPhase return struct
The recursive call at `tool-phase.ts:342` is currently the last thing `executeToolPhase` does (after creating the next assistant row). After the conversion, `executeToolPhase` returns a struct the loop body reads:
```typescript
interface ToolPhaseResult {
action: 'continue' | 'paused' | 'synthesis_done';
toolCallCount: number;
toolCalls: ToolCall[];
nextAssistantId: string | null;
}
```
- `continue` → loop continues; `nextAssistantId` is the new assistant message's UUID.
- `paused` → user-input or grant pause; loop breaks. `nextAssistantId` is null.
- `synthesis_done` → synthesis succeeded; loop breaks. `nextAssistantId` is null (synthesis wrote its own parts).
The loop body then:
1. Updates `toolsUsed += result.toolCallCount`
2. Appends `result.toolCalls` to `recentToolCalls`
3. Sets `assistantMessageId = result.nextAssistantId` for the next iteration
4. Increments `stepNumber`
5. Checks `result.action` — if not `continue`, breaks.
## D7. Budget vs steps interaction
Budget counts **individual tool calls** across the entire turn. Steps counts **loop iterations**. They are orthogonal:
- Budget fires when `toolsUsed >= resolveToolBudget(agent)` (currently 50 for read-only). Checked at the top of each iteration.
- Step cap fires when `stepNumber >= effectiveCap`. Checked by the loop condition.
Both produce a sentinel summary. A turn can be terminated by whichever fires first. In practice, budget (50 tool calls) fires before step cap (200 steps) unless the model produces many 0-tool-call iterations (which shouldn't happen — 0 tool calls means non-tool finish, which exits the loop via the `break` path).

View File

@@ -0,0 +1,112 @@
# v1.14.0-outer-loop — explicit outer agent loop
Replace the ad-hoc `executeToolPhase → runAssistantTurn` recursion with an explicit `while` loop. A **step** is one stream-and-tool-execute iteration; a step can contain multiple parallel tool calls. The loop terminates on non-tool finish OR step-cap hit OR doom-loop OR budget exhaustion OR abort OR synthesis success.
## Why
The current recursion works but has two problems: (a) stack depth grows linearly with tool iterations — 50 nested async frames is fragile, (b) there's no explicit step counter, so there's no per-agent step cap and no step-boundary instrumentation. BooChat also gets stuck at 50 tool calls (the budget ceiling) more often than it should — the new `MAX_STEPS = 200` hard ceiling lets the loop run much longer before the step cap fires, while the existing budget (50 tool calls) remains a separate concern.
## Recon findings (verified 2026-05-23)
- `runAssistantTurn` at `turn.ts:144-147` is the recursive entry. Returns `Promise<void>`.
- `executeToolPhase` at `tool-phase.ts:89-96` calls back into `runAssistantTurn` at `tool-phase.ts:342`.
- Recursion terminates on: non-tool finish, budget exhaustion (`args.toolsUsed >= budget`), doom-loop (3 identical calls via `detectDoomLoop`), user-input pause (ask_user_input / request_read_access), synthesis success, stream error, abort.
- **No existing hard recursion depth limit** — `MAX_TOOL_LOOP_DEPTH` does not exist. Safety comes from budget (50) + doom-loop (3 identical).
- `TurnArgs` defined in `turn.ts:127-141`, not `types.ts`. Fields: `sessionId`, `chatId`, `assistantMessageId`, `toolsUsed`, `recentToolCalls`, `signal`. All mutable fields are threaded through the recursive call.
- Synthesis pipeline (`synthesisPipeline.ts`) is a branch in `executeToolPhase` — if synthesis succeeds, recursion is skipped.
- `step_start` already in the `message_parts.kind` CHECK constraint. No schema change needed.
- `agents.ts` does NOT currently parse a `steps` field. Needs adding to `ParsedFrontmatter`.
## Scope
### S1. Outer loop in `turn.ts`
Convert the recursive chain to a `while (stepNumber < effectiveCap)` loop:
```
let stepNumber = 0
while (stepNumber < effectiveCap) {
// doom-loop check
// budget check
// emit step_start part
// stream phase (executeStreamPhase)
// if no tool calls → finalize, break
// tool phase (executeToolPhase — now returns, doesn't recurse)
// if paused (user input / grant) → break
// if synthesis succeeded → break
// create next assistant message row
// increment stepNumber, update toolsUsed, append recentToolCalls
}
// if stepNumber >= effectiveCap → sentinel summary
```
`effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS)` where `MAX_STEPS = 200`.
### S2. `executeToolPhase` becomes non-recursive
Remove the `runAssistantTurn` call at `tool-phase.ts:342`. Instead, return a result indicating what happened: `{action: 'continue' | 'paused' | 'synthesis_done', toolsUsed, recentToolCalls, nextAssistantId}`. The caller (the while loop) uses the action to decide whether to continue or break.
### S3. `agent.steps` field
`agents.ts:ParsedFrontmatter` gains `steps?: number`. Parser extracts it from YAML frontmatter (integer ≥ 0). `steps: 0` means "no tool calls allowed" — loop body never executes; assistant responds text-only.
### S4. Step-boundary events
At the top of each loop iteration, emit a `step_start` part with payload `{step_number, started_at}`. Uses `insertParts` into the current assistant message. No `step_finish` — the next `step_start` (or message completion) implicitly ends the previous step.
### S5. Doom-loop migration
`detectDoomLoop` check moves from `runAssistantTurn` (top of function, pre-stream) to the top of the while-loop body (same logical position). Same predicate, same threshold (3). Same `runDoomLoopSummary` call. Control flow changes from `return` (unwinding recursion) to `break` (exiting loop).
### S6. Step-cap sentinel
When `stepNumber >= effectiveCap`, write a sentinel summary like the existing `runCapHitSummary`. Reuse `runCapHitSummary` with a reason parameter distinguishing "budget exhaustion" from "step cap hit", or create a parallel `runStepCapSummary`. The sentinel makes the cap visible in chat.
### S7. AGENTS.md updates
Add `steps:` to each agent in `data/AGENTS.md`:
- Refactorer: `steps: 5`
- Architect: `steps: 20`
- All others: unset (infinity — bounded only by `MAX_STEPS = 200`)
### S8. Tests
New test file `apps/server/src/services/__tests__/outer-loop.test.ts` covering:
- Clean finish (stream returns non-tool, loop exits after 1 iteration)
- Step-cap hit (loop exits at cap, sentinel written)
- Doom-loop break (3 identical calls, sentinel written)
- Budget exhaustion (toolsUsed >= budget, cap-hit sentinel written)
- Abort mid-step (signal fires, loop exits)
- `steps: 0` edge case (no loop iterations, text-only response)
- Synthesis success (loop exits after synthesis)
## Non-goals
- No frontend changes. `step_start` parts surface via `messages_with_parts` automatically; UI doesn't render them in v1.14.
- No `output_schema` / `exit_expression` / `execution_strategy` AGENTS.md fields.
- No per-step snapshot for revert (v2.0 BooCoder concern).
- No changes to budget constants (50 / 10 / 50). That's a separate concern.
- No `repairToolCall` changes.
- No compaction changes.
## Hard rules
- No git commit, push. Sam commits.
- Backup before editing.
- TS strict, no `any`.
- Doom-loop threshold stays at 3.
- 332+ existing tests still pass + new outer-loop tests.
## Files expected to touch
- `apps/server/src/services/inference/turn.ts` — recursion → loop
- `apps/server/src/services/inference/tool-phase.ts` — remove recursive call, return result struct
- `apps/server/src/services/inference/sentinel-summaries.ts` — step-cap sentinel (or extend cap-hit)
- `apps/server/src/services/agents.ts` — parse `steps` field
- `data/AGENTS.md` — add `steps:` to Refactorer + Architect
- `apps/server/src/services/__tests__/outer-loop.test.ts` — NEW
- `apps/server/src/services/inference/index.ts` — re-export if new types needed
## Estimate
~300 LoC net (turn.ts refactor + tool-phase return struct + agents parser + tests). The conversion is structural, not behavioral — every exit path is preserved, just expressed as loop control flow instead of recursion unwinding.

View File

@@ -0,0 +1,82 @@
# v1.14.0-outer-loop tasks
## B1 — Backups
- [ ] `turn.ts`, `tool-phase.ts`, `sentinel-summaries.ts`, `agents.ts`, `data/AGENTS.md`
## B2 — agents.ts: parse `steps` field
- [ ] Add `steps?: number` to `ParsedFrontmatter` interface
- [ ] Parse from YAML frontmatter: integer ≥ 0, warn on out-of-range (negative or non-integer), clamp to 0
- [ ] Expose on the `Agent` type returned by `getAgentsForProject`
- [ ] `npx tsc --noEmit -p apps/server` clean
## B3 — AGENTS.md: add `steps:` to Refactorer + Architect
- [ ] `data/AGENTS.md` — Refactorer: `steps: 5`
- [ ] `data/AGENTS.md` — Architect: `steps: 20`
- [ ] All others: leave unset (infinite, bounded by MAX_STEPS=200)
## B4 — tool-phase.ts: remove recursive call, return result struct
- [ ] Define `ToolPhaseResult` interface: `{action: 'continue' | 'paused' | 'synthesis_done', toolCallCount: number, toolCalls: ToolCall[], nextAssistantId: string | null}`
- [ ] Remove `runAssistantTurn` import and call at line ~342
- [ ] `executeToolPhase` returns `ToolPhaseResult` instead of `Promise<void>`
- [ ] On normal path (after creating next assistant row): return `{action: 'continue', toolCallCount, toolCalls: result.toolCalls, nextAssistantId}`
- [ ] On user-input pause: return `{action: 'paused', toolCallCount: <calls executed so far>, toolCalls: result.toolCalls, nextAssistantId: null}`
- [ ] On synthesis success: return `{action: 'synthesis_done', toolCallCount, toolCalls: result.toolCalls, nextAssistantId: null}`
- [ ] `npx tsc --noEmit -p apps/server` will FAIL here (turn.ts still expects void) — expected, fixed in B5
## B5 — turn.ts: recursion → while loop
- [ ] Add `MAX_STEPS = 200` constant
- [ ] Resolve `effectiveCap = Math.min(agent?.steps ?? Infinity, MAX_STEPS)` at the top of `runAssistantTurn`
- [ ] Convert `runAssistantTurn` body into a `while (stepNumber < effectiveCap)` loop:
- Top of loop: doom-loop check (move from current position; `break` instead of `return`)
- Top of loop: budget check (move from current position; `break` instead of `return`, but still call `runCapHitSummary` before break)
- Emit `step_start` part via `insertParts` with payload `{step_number: stepNumber, started_at: new Date().toISOString()}`
- Call `executeStreamPhase`
- If no tool calls → `finalizeCompletion`, `break`
- Call `executeToolPhase` (now returns `ToolPhaseResult`)
- If `result.action !== 'continue'``break`
- Update `toolsUsed += result.toolCallCount`
- Update `recentToolCalls = [...recentToolCalls, ...result.toolCalls]`
- Update `assistantMessageId = result.nextAssistantId!`
- Increment `stepNumber`
- [ ] After loop: if `stepNumber >= effectiveCap` → call step-cap sentinel (B6)
- [ ] `effectiveCap === 0` edge case: the while condition is immediately false; stream the first turn text-only (the stream phase at the top of the function runs once before the loop — OR handle this by structuring the loop as do-while, OR handle by pre-checking and skipping tools from the request). Pick the cleanest approach.
- [ ] Remove `TurnArgs` from the module export if it's no longer threaded through recursion — OR keep it and populate from loop locals. (Design note: `TurnArgs` is still used by `executeStreamPhase`, `executeToolPhase`, `sentinel-summaries.ts`, `error-handler.ts`. Keep the interface; populate from loop locals each iteration.)
- [ ] `npx tsc --noEmit -p apps/server` clean
- [ ] `pnpm -C apps/server test` — all existing tests pass
## B6 — sentinel-summaries.ts: step-cap sentinel
- [ ] Add `runStepCapSummary` (or extend `runCapHitSummary` with a `reason` param)
- [ ] Write a sentinel with `metadata.kind = 'cap_hit'` (same as budget) so `CapHitSentinel` UI renders it
- [ ] Sentinel text distinguishes "Step limit reached (N steps)" from "Tool budget exhausted (N calls)"
- [ ] Called from the post-loop check in turn.ts (B5)
## B7 — Tests
- [ ] NEW `apps/server/src/services/__tests__/outer-loop.test.ts`
- [ ] Test: clean finish — stream returns no tool calls, loop exits after 1 step
- [ ] Test: step-cap hit — mock agent with `steps: 2`, model always returns tool calls, loop exits at 2, sentinel written
- [ ] Test: doom-loop — 3 identical tool calls, sentinel written, loop breaks
- [ ] Test: budget exhaustion — toolsUsed >= budget, cap-hit sentinel written
- [ ] Test: `steps: 0` — no loop iterations, text-only response
- [ ] Test: synthesis success — loop breaks after synthesis
- [ ] `pnpm -C apps/server test` — all 332+ existing + new tests pass
## B8 — Verification
- [ ] `npx tsc --noEmit -p apps/server` — 0 errors
- [ ] `npx tsc -p apps/web/tsconfig.app.json --noEmit` — 0 errors (no web changes; should pass)
- [ ] `pnpm -C apps/web build` — green
- [ ] `pnpm -C apps/server test` — all green
## B9 — Docs + tag + deploy
- [ ] `CHANGELOG.md` entry for v1.14.0-outer-loop
- [ ] `boocode_roadmap.md` retrospective bullet on the v1.14 section
- [ ] `CLAUDE.md` updates: mention the outer loop, MAX_STEPS, agent.steps in the inference/ section
- [ ] Commit, tag `v1.14.0-outer-loop`, push, rebuild