v1.14.0-outer-loop: explicit while loop replaces inference recursion

Converts the ad-hoc executeToolPhase → runAssistantTurn recursion into an explicit while (stepNumber < effectiveCap) loop. A step is one stream-and-tool-execute iteration; the loop terminates on non-tool finish, step-cap hit, doom-loop, budget exhaustion, abort, or synthesis success.

MAX_STEPS = 200 hard ceiling (4x old effective limit from budget). Per-agent steps: field in AGENTS.md frontmatter sets tighter caps (Refactorer: 5, Architect: 20, others: unset = bounded only by MAX_STEPS). Resolution: effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS).

executeToolPhase no longer recurses; it returns a ToolPhaseResult struct (action: 'continue' | 'paused' | 'synthesis_done') so the caller decides whether to continue or break.

steps: 0 handled as "no tool calls allowed" via runTextOnlyTurn (one text-only stream phase, tool calls ignored with warn log).

Step-cap hits produce a sentinel summary (reuses cap_hit kind so CapHitSentinel.tsx renders without frontend changes; text distinguishes "Step limit reached" from "Tool budget exhausted").

Doom-loop check migrated to top of loop body: same predicate, same threshold (3), break instead of return.

step_start parts are in the schema CHECK but not emitted as message_parts: writing before the stream phase creates a sequence-0 collision with partsFromAssistantMessage. Structured log line emitted instead. Adversarial review caught the collision pre-deploy.

332/332 server tests passing. No frontend changes. No schema changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

openspec/changes/v1.14-outer-loop/design.md
@@ -0,0 +1,72 @@
# v1.14.0-outer-loop — design decisions
Answers to the dispatch's blocking questions, resolved 2026-05-23.
## D1. Step cap — what replaces MAX_TOOL_LOOP_DEPTH?

`MAX_TOOL_LOOP_DEPTH` never existed: no hard recursion depth guard was ever in the codebase. Safety came from budget (50 tool calls) + doom-loop (3 identical calls).

**Decision:** introduce `MAX_STEPS = 200` as a hard ceiling. Per-agent cap via `agent.steps` is the primary knob. Resolution: `effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS)`.

**Rationale:** Sam reports BooChat gets stuck at 50 tool calls (the budget) too often. The step cap should be generous: 200 is 4x the current de-facto ceiling. Budget (50 tool calls total across all steps) remains a separate concern and is not changed in this batch.

Note: a "step" is not a "tool call." One step is one stream iteration, which may produce multiple parallel tool calls. Budget counts individual tool calls; the step cap counts iterations. With the 50-call budget and at least one tool call per continuing step, the budget fires within 50 steps, well before `MAX_STEPS = 200`. The step cap is a safety ceiling for long runs of 1-tool-call iterations; it becomes the binding limit only where the budget exceeds the cap or a per-agent `steps` value is tighter.

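
The resolution rule can be sketched as below. This is a minimal illustration, not the actual implementation: the `AgentConfig` shape, the `resolveEffectiveCap` helper, and the example agents other than Refactorer are assumptions made for the sketch.

```typescript
// Hard ceiling from the decision above; applies to every agent.
const MAX_STEPS = 200;

// Hypothetical minimal agent shape: only the `steps` field matters here.
interface AgentConfig {
  name: string;
  steps?: number; // per-agent cap from AGENTS.md frontmatter; unset = no tighter cap
}

// effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS)
// `??` only falls back on null/undefined, so an explicit `steps: 0` is preserved.
function resolveEffectiveCap(agent: AgentConfig): number {
  return Math.min(agent.steps ?? Infinity, MAX_STEPS);
}

const refactorer: AgentConfig = { name: "Refactorer", steps: 5 };
const unsetAgent: AgentConfig = { name: "Unset" };             // hypothetical agent
const greedyAgent: AgentConfig = { name: "Greedy", steps: 1000 }; // hypothetical agent
const textOnlyAgent: AgentConfig = { name: "TextOnly", steps: 0 }; // hypothetical agent

const caps = [refactorer, unsetAgent, greedyAgent, textOnlyAgent].map(resolveEffectiveCap);
```

Using `??` rather than `||` is what keeps `steps: 0` distinct from an unset field; with `||`, a zero cap would silently become `MAX_STEPS`.
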
## D2. step_finish — emit or not?

**Decision:** No `step_finish` part. The next `step_start` (or assistant message completion) implicitly ends the previous step.

**Rationale:** opencode only emits `step_start`. Following it means less noise in parts and simpler code. If the UI ever needs step durations, they can be computed from the timestamps of consecutive `step_start` parts.

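
A rough sketch of that duration computation follows. It assumes each `step_start` record carries a step number and a millisecond timestamp, whether it is stored as a part or recovered from a log line; the `StepStartPart` shape and the `stepDurationsMs` helper are hypothetical and not part of the schema described here.

```typescript
// Hypothetical record shape; the real message_parts schema may differ.
interface StepStartPart {
  kind: "step_start";
  stepNumber: number;
  createdAtMs: number;
}

// Each step ends where the next step_start begins; the last step ends
// when the assistant message completes (turnEndMs).
function stepDurationsMs(parts: StepStartPart[], turnEndMs: number): number[] {
  const sorted = [...parts].sort((a, b) => a.stepNumber - b.stepNumber);
  return sorted.map((part, i) => {
    const next = sorted[i + 1];
    const end = next !== undefined ? next.createdAtMs : turnEndMs;
    return end - part.createdAtMs;
  });
}
```
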
## D3. Step-cap hit — sentinel or quiet?

**Decision:** Write a sentinel summary on step-cap hit. It is visible to the user in chat, same as budget-exhaustion's `runCapHitSummary`.

**Implementation:** Extend `runCapHitSummary` to accept a `reason: 'budget' | 'step_cap'` parameter (or add a parallel `runStepCapSummary`). The sentinel metadata kind stays `cap_hit`, which the frontend `CapHitSentinel` component already renders. The sentinel's text distinguishes the two cases ("Tool budget exhausted" vs "Step limit reached").

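
One way the `reason` parameter could select the sentinel text is sketched below. The `SentinelSummary` shape, the `buildCapHitSentinel` name, and the counts appended to the text are assumptions for illustration; only the `cap_hit` kind and the two leading phrases come from this document.

```typescript
type CapHitReason = "budget" | "step_cap";

// Hypothetical sentinel shape. The kind stays "cap_hit" for both reasons so
// the existing frontend component keeps rendering it.
interface SentinelSummary {
  kind: "cap_hit";
  reason: CapHitReason;
  text: string;
}

function buildCapHitSentinel(reason: CapHitReason, used: number, limit: number): SentinelSummary {
  const text =
    reason === "budget"
      ? `Tool budget exhausted (${used}/${limit} tool calls).`
      : `Step limit reached (${used}/${limit} steps).`;
  return { kind: "cap_hit", reason, text };
}
```

Keeping a single `kind` and varying only the text is what lets the change ship without frontend work.
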
## D4. agent.steps = 0

**Decision:** `steps: 0` means "no tool calls allowed." The loop body never executes. The assistant can only respond with text.

**Implementation:** `steps: 0` means the loop cap is 0, so the while condition `stepNumber < effectiveCap` is false on the first check and the loop is skipped entirely. The stream phase still runs once (the model produces a text response), but if it emits tool calls they are ignored and the turn finalizes as text-only. Omitting tools from the request payload was considered as an alternative; letting the stream run and ignoring tool calls is the cleaner option.

This may produce a confusing response if the model's text references tool results it never got, but `steps: 0` is an explicit constraint the agent author chose. Document in AGENTS.md parser validation.

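
The "ignore tool calls and finalize as text-only" behavior might look roughly like this. The `StreamResult` and `FinalizedTurn` shapes and the `finalizeTextOnlyTurn` helper are hypothetical stand-ins, and the warning text is an assumption.

```typescript
// Hypothetical result of one stream phase.
interface StreamResult {
  text: string;
  toolCalls: { name: string }[];
}

interface FinalizedTurn {
  text: string;
  ignoredToolCalls: number;
}

// With steps: 0 the loop never runs, so any tool calls the model emitted
// are dropped (with a warning) and only the text is kept.
function finalizeTextOnlyTurn(stream: StreamResult, warn: (msg: string) => void): FinalizedTurn {
  if (stream.toolCalls.length > 0) {
    warn(`steps: 0, ignoring ${stream.toolCalls.length} tool call(s)`);
  }
  return { text: stream.text, ignoredToolCalls: stream.toolCalls.length };
}
```
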
## D5. Synthesis success terminates the loop?

**Decision:** Yes. `break` out of the loop after synthesis success. This preserves current behavior (synthesis replaces the recursive call; no further iterations).

**Rationale:** The synthesis pass produces a self-contained summary turn. Continuing the loop after synthesis would let the model issue more tool calls on top of a synthesis summary, which is semantically wrong: the synthesis IS the final answer for that tool call batch.

## D6. executeToolPhase return struct

The recursive call at `tool-phase.ts:342` is currently the last thing `executeToolPhase` does (after creating the next assistant row). After the conversion, `executeToolPhase` returns a struct the loop body reads:

```typescript
interface ToolPhaseResult {
  action: 'continue' | 'paused' | 'synthesis_done';
  toolCallCount: number;
  toolCalls: ToolCall[];
  nextAssistantId: string | null;
}
```

- `continue` → loop continues; `nextAssistantId` is the new assistant message's UUID.
- `paused` → user-input or grant pause; loop breaks. `nextAssistantId` is null.
- `synthesis_done` → synthesis succeeded; loop breaks. `nextAssistantId` is null (synthesis wrote its own parts).

The loop body then:

1. Updates `toolsUsed += result.toolCallCount`
2. Appends `result.toolCalls` to `recentToolCalls`
3. Sets `assistantMessageId = result.nextAssistantId` for the next iteration
4. Increments `stepNumber`
5. Checks `result.action` and breaks if it is not `continue`.

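
The five steps above can be sketched as a loop. This is a simplified illustration under stated assumptions: it is synchronous, `executeToolPhase` is passed in as a stub, and the stream phase plus the budget, doom-loop, and abort checks are omitted. The `runOuterLoop`, `LoopOutcome`, and `ExitReason` names and the `ToolCall` shape are hypothetical.

```typescript
interface ToolCall { name: string; args: string } // hypothetical shape

interface ToolPhaseResult {
  action: "continue" | "paused" | "synthesis_done";
  toolCallCount: number;
  toolCalls: ToolCall[];
  nextAssistantId: string | null;
}

type ExitReason = "paused" | "synthesis_done" | "step_cap";

interface LoopOutcome {
  exit: ExitReason;
  stepNumber: number;
  toolsUsed: number;
  assistantMessageId: string | null;
  recentToolCalls: ToolCall[];
}

function runOuterLoop(
  firstAssistantId: string,
  effectiveCap: number,
  executeToolPhase: (assistantId: string, step: number) => ToolPhaseResult,
): LoopOutcome {
  let stepNumber = 0;
  let toolsUsed = 0;
  let assistantMessageId: string | null = firstAssistantId;
  const recentToolCalls: ToolCall[] = [];

  while (stepNumber < effectiveCap) {
    if (assistantMessageId === null) {
      throw new Error("'continue' result did not provide a next assistant id");
    }
    const result = executeToolPhase(assistantMessageId, stepNumber);
    toolsUsed += result.toolCallCount;           // 1. update the running tool count
    recentToolCalls.push(...result.toolCalls);   // 2. feed the doom-loop window
    assistantMessageId = result.nextAssistantId; // 3. row for the next iteration
    stepNumber += 1;                             // 4. count the completed step
    if (result.action !== "continue") {          // 5. paused or synthesis_done: stop
      return { exit: result.action, stepNumber, toolsUsed, assistantMessageId, recentToolCalls };
    }
  }
  // The while condition failed: the step cap was reached.
  return { exit: "step_cap", stepNumber, toolsUsed, assistantMessageId, recentToolCalls };
}
```

Returning a struct instead of recursing keeps all loop state (`stepNumber`, `toolsUsed`, `recentToolCalls`) in one place, so each termination condition can be checked by the caller.
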
## D7. Budget vs steps interaction

Budget counts **individual tool calls** across the entire turn. The step cap counts **loop iterations**. The two limits are independent:

- Budget fires when `toolsUsed >= resolveToolBudget(agent)` (currently 50 for read-only). Checked at the top of each iteration.
- Step cap fires when `stepNumber >= effectiveCap`. Checked by the loop condition.

Both produce a sentinel summary, and a turn is terminated by whichever fires first. In practice, budget (50 tool calls) fires before the global step cap (200 steps): every iteration that continues the loop makes at least one tool call, so 50 tool calls are reached within 50 steps. An iteration with 0 tool calls is a non-tool finish, which exits the loop via the `break` path instead of consuming further steps. A tighter per-agent cap (for example `steps: 5`) can still fire before the budget.

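
The interaction can be checked with a small simulation. It assumes every iteration makes the same number of tool calls, which real turns will not; the `firstLimitHit` helper is hypothetical and only models the two checks described above.

```typescript
type Termination = "budget" | "step_cap";

interface LimitHit {
  reason: Termination;
  stepNumber: number;
  toolsUsed: number;
}

// Simulates a turn in which every step makes `callsPerStep` tool calls.
// Budget is checked at the top of each iteration; the step cap is the loop condition.
function firstLimitHit(callsPerStep: number, budget: number, effectiveCap: number): LimitHit {
  let stepNumber = 0;
  let toolsUsed = 0;
  while (stepNumber < effectiveCap) {
    if (toolsUsed >= budget) {
      return { reason: "budget", stepNumber, toolsUsed };
    }
    toolsUsed += callsPerStep;
    stepNumber += 1;
  }
  return { reason: "step_cap", stepNumber, toolsUsed };
}

const oneCallPerStep = firstLimitHit(1, 50, 200); // budget fires at step 50
const twoCallsPerStep = firstLimitHit(2, 50, 200); // budget fires at step 25
const tightAgentCap = firstLimitHit(1, 50, 5);     // per-agent cap of 5 fires first
```
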