
indifferentketchup f4a97808ad v1.14.0-outer-loop: explicit while loop replaces inference recursion

Converts the ad-hoc executeToolPhase → runAssistantTurn recursion into an
explicit while (stepNumber < effectiveCap) loop. A step is one stream-and-
tool-execute iteration; the loop terminates on non-tool finish, step-cap hit,
doom-loop, budget exhaustion, abort, or synthesis success.

MAX_STEPS = 200 hard ceiling (4x old effective limit from budget). Per-agent
steps: field in AGENTS.md frontmatter sets tighter caps (Refactorer: 5,
Architect: 20, others: unset = bounded only by MAX_STEPS). Resolution:
effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS).

executeToolPhase no longer recurses — returns ToolPhaseResult struct
(action: 'continue' | 'paused' | 'synthesis_done') so the caller decides
whether to continue or break. steps: 0 handled as "no tool calls allowed"
via runTextOnlyTurn (one text-only stream phase, tool calls ignored with
warn log).

Step-cap hits produce a sentinel summary (reuses cap_hit kind so
CapHitSentinel.tsx renders without frontend changes; text distinguishes
"Step limit reached" from "Tool budget exhausted"). Doom-loop check migrated
to top of loop body — same predicate, same threshold (3), break instead of
return.

step_start parts are in the schema CHECK but not emitted as message_parts —
writing before the stream phase creates a sequence-0 collision with
partsFromAssistantMessage. Structured log line emitted instead. Adversarial
review caught the collision pre-deploy.

332/332 server tests passing. No frontend changes. No schema changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-23 20:29:21 +00:00


v1.14.0-outer-loop — design decisions

Answers to the dispatch's blocking questions, resolved 2026-05-23.

D1. Step cap — what replaces MAX_TOOL_LOOP_DEPTH?

MAX_TOOL_LOOP_DEPTH never existed — no hard recursion depth guard was ever in the codebase. Safety came from budget (50 tool calls) + doom-loop (3 identical calls).

Decision: introduce MAX_STEPS = 200 as a hard ceiling. Per-agent cap via agent.steps is the primary knob. Resolution: effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS).
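A minimal sketch of the cap resolution, assuming the parsed agent config exposes an optional numeric steps field. The AgentConfig shape and the resolveEffectiveCap name are hypothetical; only MAX_STEPS and the Math.min formula come from the decision above.

```typescript
// Hard ceiling from the decision above.
const MAX_STEPS = 200;

// Hypothetical shape for a parsed AGENTS.md entry; only `steps` matters here.
interface AgentConfig {
  name: string;
  steps?: number; // unset means "bounded only by MAX_STEPS"
}

// `??` (not `||`) is deliberate: an explicit `steps: 0` must survive as 0
// instead of falling through to Infinity.
function resolveEffectiveCap(agent: AgentConfig): number {
  return Math.min(agent.steps ?? Infinity, MAX_STEPS);
}

resolveEffectiveCap({ name: "Refactorer", steps: 5 }); // 5
resolveEffectiveCap({ name: "Unset" });                // 200
resolveEffectiveCap({ name: "TooHigh", steps: 500 });  // 200 (clamped)
resolveEffectiveCap({ name: "TextOnly", steps: 0 });   // 0
```

An agent can only tighten the ceiling: any steps value above MAX_STEPS is clamped down to 200.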

Rationale: Sam reports BooChat gets stuck at 50 tool calls (the budget) too often. The step cap should be generous — 200 is 4x the current de-facto ceiling. Budget (50 tool calls total across all steps) remains a separate concern and is not changed in this batch.

Note: "step" ≠ "tool call." One step = one stream iteration that may produce multiple parallel tool calls. Budget counts individual tool calls; step cap counts iterations. At 200 steps with average 1-2 tool calls per step, the budget (50) will fire well before the step cap in most scenarios. The step cap is a safety ceiling for cases where the model makes many 1-tool-call iterations.

D2. step_finish — emit or not?

Decision: No step_finish part. The next step_start (or assistant message completion) implicitly ends the previous step.

Rationale: opencode only emits step_start. Less noise in parts, simpler code. If UI ever needs step durations, compute from the timestamps of consecutive step_start parts.
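If step durations are ever needed, the computation described above could look roughly like this. It assumes each step_start record carries a millisecond timestamp and that the message completion time is known; the StepStartRecord shape and the stepDurationsMs name are hypothetical, not taken from the codebase.

```typescript
// Hypothetical record shape; the real step_start payload is not shown here.
interface StepStartRecord {
  stepNumber: number;
  createdAtMs: number;
}

// Each step ends where the next step_start begins; the last step ends
// when the assistant message completes.
function stepDurationsMs(
  starts: StepStartRecord[],
  messageCompletedAtMs: number,
): number[] {
  const ordered = [...starts].sort((a, b) => a.createdAtMs - b.createdAtMs);
  return ordered.map((record, i) => {
    const next = ordered[i + 1];
    const endMs = next !== undefined ? next.createdAtMs : messageCompletedAtMs;
    return endMs - record.createdAtMs;
  });
}

const durations = stepDurationsMs(
  [
    { stepNumber: 0, createdAtMs: 0 },
    { stepNumber: 1, createdAtMs: 1500 },
    { stepNumber: 2, createdAtMs: 4000 },
  ],
  4600,
);
// durations is [1500, 2500, 600]
```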

D3. Step-cap hit — sentinel or quiet?

Decision: Write a sentinel summary on step-cap hit. Visible to the user in chat, same as budget-exhaustion's runCapHitSummary.

Implementation: Extend runCapHitSummary to accept a reason: 'budget' | 'step_cap' parameter (or add a parallel runStepCapSummary). The sentinel metadata kind stays cap_hit — frontend CapHitSentinel component already renders it. The sentinel's text distinguishes the two cases ("Tool budget exhausted" vs "Step limit reached").
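One way to picture the reason parameter is sketched below. The real runCapHitSummary signature is not shown in this document, so buildCapHitSentinel and the CapHitSentinelData shape are hypothetical stand-ins; only the cap_hit kind, the two reason values, and the two text strings come from the text above.

```typescript
type CapHitReason = "budget" | "step_cap";

// Hypothetical sentinel payload: the kind stays "cap_hit" for both reasons,
// so a renderer keyed on kind needs no changes.
interface CapHitSentinelData {
  kind: "cap_hit";
  text: string;
}

function buildCapHitSentinel(reason: CapHitReason): CapHitSentinelData {
  const text =
    reason === "budget" ? "Tool budget exhausted" : "Step limit reached";
  return { kind: "cap_hit", text };
}

buildCapHitSentinel("step_cap"); // { kind: "cap_hit", text: "Step limit reached" }
```

Keeping the distinction in the text rather than in the kind is what lets the existing frontend component render both cases unchanged.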

D4. agent.steps = 0

Decision: steps: 0 means "no tool calls allowed." The loop body never executes. The assistant can only respond with text.

Implementation: steps: 0 sets the loop cap to 0, so the while condition stepNumber < effectiveCap is false on the first check and the loop body never runs. The stream phase still runs once so the model can produce a text response; if it emits tool calls, they are ignored and the turn finalizes as text-only. (The alternative considered was to omit tools from the request payload entirely when steps: 0; ignoring the calls keeps the request path unchanged.)

This may produce a confusing response if the model's text references tool results it never received, but steps: 0 is an explicit constraint the agent author chose. Document this in the AGENTS.md parser validation.
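A rough sketch of the steps: 0 finalization, assuming the stream phase yields the response text plus any tool calls the model attempted. StreamPhaseOutput, FinalizedTextTurn, and finalizeTextOnlyTurn are hypothetical names for illustration; the actual handling lives in code not shown here.

```typescript
// Hypothetical output of one stream phase.
interface StreamPhaseOutput {
  text: string;
  toolCalls: { name: string }[];
}

interface FinalizedTextTurn {
  text: string;
  ignoredToolCalls: number;
}

// With steps: 0 the tool-execution branch is never entered. Any tool calls
// the model emitted are dropped with a warning and the text is kept as-is.
function finalizeTextOnlyTurn(
  output: StreamPhaseOutput,
  warn: (message: string) => void = (message) => console.warn(message),
): FinalizedTextTurn {
  const ignored = output.toolCalls.length;
  if (ignored > 0) {
    warn(`steps: 0, ignoring ${ignored} tool call(s)`);
  }
  return { text: output.text, ignoredToolCalls: ignored };
}
```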

D5. Synthesis success terminates the loop?

Decision: Yes. break out of the loop after synthesis success. Preserves current behavior (synthesis replaces the recursive call; no further iterations).

Rationale: The synthesis pass produces a self-contained summary turn. Continuing the loop after synthesis would let the model issue more tool calls on top of a synthesis summary, which is semantically wrong — the synthesis IS the final answer for that tool call batch.

D6. executeToolPhase return struct

The recursive call at tool-phase.ts:342 is currently the last thing executeToolPhase does (after creating the next assistant row). After the conversion, executeToolPhase returns a struct the loop body reads:

interface ToolPhaseResult {
  action: 'continue' | 'paused' | 'synthesis_done';
  toolCallCount: number;
  toolCalls: ToolCall[];
  nextAssistantId: string | null;
}

continue → loop continues; nextAssistantId is the new assistant message's UUID.
paused → user-input or grant pause; loop breaks. nextAssistantId is null.
synthesis_done → synthesis succeeded; loop breaks. nextAssistantId is null (synthesis wrote its own parts).

The loop body then:

1. Updates toolsUsed += result.toolCallCount
2. Appends result.toolCalls to recentToolCalls
3. Sets assistantMessageId = result.nextAssistantId for the next iteration
4. Increments stepNumber
5. Checks result.action — if not continue, breaks.
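The loop body above can be sketched as follows. This is a simplified, synchronous illustration: the tool phase is injected as a plain function (the real one is presumably async), the ToolCall shape is a guess, and the non-tool-finish, doom-loop, and budget checks are omitted. runOuterLoop and LoopOutcome are hypothetical names. The sketch also checks result.action before adopting nextAssistantId, a small reordering that avoids carrying a null id into the next iteration.

```typescript
interface ToolCall { name: string; args: string } // hypothetical shape

interface ToolPhaseResult {
  action: "continue" | "paused" | "synthesis_done";
  toolCallCount: number;
  toolCalls: ToolCall[];
  nextAssistantId: string | null;
}

type ToolPhase = (assistantMessageId: string, stepNumber: number) => ToolPhaseResult;

interface LoopOutcome {
  exit: "paused" | "synthesis_done" | "step_cap";
  stepNumber: number;
  toolsUsed: number;
  recentToolCalls: ToolCall[];
}

function runOuterLoop(
  firstAssistantId: string,
  effectiveCap: number,
  executeToolPhase: ToolPhase,
): LoopOutcome {
  let assistantMessageId = firstAssistantId;
  let stepNumber = 0;
  let toolsUsed = 0;
  const recentToolCalls: ToolCall[] = [];

  while (stepNumber < effectiveCap) {
    const result = executeToolPhase(assistantMessageId, stepNumber);
    toolsUsed += result.toolCallCount;
    recentToolCalls.push(...result.toolCalls);
    stepNumber += 1;

    // Anything other than "continue" ends the loop.
    if (result.action !== "continue") {
      return { exit: result.action, stepNumber, toolsUsed, recentToolCalls };
    }
    if (result.nextAssistantId === null) {
      throw new Error("'continue' must carry the next assistant message id");
    }
    assistantMessageId = result.nextAssistantId;
  }

  // The while condition failed: the step cap was reached.
  return { exit: "step_cap", stepNumber, toolsUsed, recentToolCalls };
}
```

Returning a struct instead of recursing keeps every exit decision in one place, which is the point of the conversion.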

D7. Budget vs steps interaction

Budget counts individual tool calls across the entire turn. Steps counts loop iterations. They are orthogonal:

Budget fires when toolsUsed >= resolveToolBudget(agent) (currently 50 for read-only). Checked at the top of each iteration.
Step cap fires when stepNumber >= effectiveCap. Checked by the loop condition.

Both produce a sentinel summary. A turn can be terminated by whichever fires first. In practice, budget (50 tool calls) fires before step cap (200 steps) unless the model produces many 0-tool-call iterations (which shouldn't happen — 0 tool calls means non-tool finish, which exits the loop via the break path).
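To make the interaction concrete, here is a small simulation under a simplifying assumption: every step issues the same number of tool calls. firstLimitHit is a hypothetical helper written for this illustration; it mirrors the two checks described above (budget at the top of each iteration, step cap in the loop condition).

```typescript
type LimitHit = "budget" | "step_cap";

interface LimitResult {
  limit: LimitHit;
  stepNumber: number;
  toolsUsed: number;
}

function firstLimitHit(
  callsPerStep: number,
  budget: number,
  effectiveCap: number,
): LimitResult {
  let stepNumber = 0;
  let toolsUsed = 0;
  while (stepNumber < effectiveCap) {
    // Budget check at the top of each iteration.
    if (toolsUsed >= budget) {
      return { limit: "budget", stepNumber, toolsUsed };
    }
    toolsUsed += callsPerStep;
    stepNumber += 1;
  }
  // Loop condition failed: the step cap fired.
  return { limit: "step_cap", stepNumber, toolsUsed };
}

firstLimitHit(1, 50, 200); // { limit: "budget", stepNumber: 50, toolsUsed: 50 }
firstLimitHit(2, 50, 200); // { limit: "budget", stepNumber: 25, toolsUsed: 50 }
firstLimitHit(1, 50, 5);   // { limit: "step_cap", stepNumber: 5, toolsUsed: 5 }
```

With the default ceiling of 200 the budget fires first in both cases; a tight per-agent cap such as 5 flips the order, so the step cap fires long before the budget.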
