v1.14.0-outer-loop: explicit while loop replaces inference recursion

Converts the ad-hoc executeToolPhase → runAssistantTurn recursion into an explicit while (stepNumber < effectiveCap) loop. A step is one stream-and- tool-execute iteration; the loop terminates on non-tool finish, step-cap hit, doom-loop, budget exhaustion, abort, or synthesis success. MAX_STEPS = 200 hard ceiling (4x old effective limit from budget). Per-agent steps: field in AGENTS.md frontmatter sets tighter caps (Refactorer: 5, Architect: 20, others: unset = bounded only by MAX_STEPS). Resolution: effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS). executeToolPhase no longer recurses — returns ToolPhaseResult struct (action: 'continue' | 'paused' | 'synthesis_done') so the caller decides whether to continue or break. steps: 0 handled as "no tool calls allowed" via runTextOnlyTurn (one text-only stream phase, tool calls ignored with warn log). Step-cap hits produce a sentinel summary (reuses cap_hit kind so CapHitSentinel.tsx renders without frontend changes; text distinguishes "Step limit reached" from "Tool budget exhausted"). Doom-loop check migrated to top of loop body — same predicate, same threshold (3), break instead of return. step_start parts are in the schema CHECK but not emitted as message_parts — writing before the stream phase creates a sequence-0 collision with partsFromAssistantMessage. Structured log line emitted instead. Adversarial review caught the collision pre-deploy. 332/332 server tests passing. No frontend changes. No schema changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 20:29:21 +00:00
parent 211e903620
commit f4a97808ad
14 changed files with 756 additions and 204 deletions
--- a/openspec/changes/v1.14-outer-loop/design.md
+++ b/openspec/changes/v1.14-outer-loop/design.md
@@ -0,0 +1,72 @@
+# v1.14.0-outer-loop — design decisions
+
+Answers to the dispatch's blocking questions, resolved 2026-05-23.
+
+## D1. Step cap — what replaces MAX_TOOL_LOOP_DEPTH?
+
+`MAX_TOOL_LOOP_DEPTH` never existed — no hard recursion depth guard was ever in the codebase. Safety came from budget (50 tool calls) + doom-loop (3 identical calls).
+
+**Decision:** introduce `MAX_STEPS = 200` as a hard ceiling. Per-agent cap via `agent.steps` is the primary knob. Resolution: `effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS)`.
+
+**Rationale:** Sam reports BooChat gets stuck at 50 tool calls (the budget) too often. The step cap should be generous — 200 is 4x the current de-facto ceiling. Budget (50 tool calls total across all steps) remains a separate concern and is not changed in this batch.
+
+Note: "step" ≠ "tool call." One step = one stream iteration that may produce multiple parallel tool calls. Budget counts individual tool calls; step cap counts iterations. At 200 steps with average 1-2 tool calls per step, the budget (50) will fire well before the step cap in most scenarios. The step cap is a safety ceiling for cases where the model makes many 1-tool-call iterations.
+
+## D2. step_finish — emit or not?
+
+**Decision:** No `step_finish` part. The next `step_start` (or assistant message completion) implicitly ends the previous step.
+
+**Rationale:** opencode only emits `step_start`. Less noise in parts, simpler code. If UI ever needs step durations, compute from the timestamps of consecutive `step_start` parts.
+
+## D3. Step-cap hit — sentinel or quiet?
+
+**Decision:** Write a sentinel summary on step-cap hit. Visible to the user in chat, same as budget-exhaustion's `runCapHitSummary`.
+
+**Implementation:** Extend `runCapHitSummary` to accept a `reason: 'budget' | 'step_cap'` parameter (or add a parallel `runStepCapSummary`). The sentinel metadata kind stays `cap_hit` — frontend `CapHitSentinel` component already renders it. The sentinel's text distinguishes the two cases ("Tool budget exhausted" vs "Step limit reached").
+
+## D4. agent.steps = 0
+
+**Decision:** `steps: 0` means "no tool calls allowed." The loop body never executes. The assistant can only respond with text.
+
+**Implementation:** When `effectiveCap === 0`, skip the loop entirely. Stream the first assistant turn (text-only), finalize, return. The model receives no tools in the request payload when `steps: 0` (or equivalently, tools are passed but the loop never enters the tool-execution branch).
+
+Actually, cleaner: `steps: 0` means the loop cap is 0. The while condition `stepNumber < effectiveCap` is false on the first check. The stream phase still runs (the model produces a text response), but if it emits tool calls they're ignored and the turn finalizes as text-only. This may produce a confusing response if the model's text references tool results it never got — but `steps: 0` is an explicit constraint the agent author chose. Document in AGENTS.md parser validation.
+
+## D5. Synthesis success terminates the loop?
+
+**Decision:** Yes. `break` out of the loop after synthesis success. Preserves current behavior (synthesis replaces the recursive call; no further iterations).
+
+**Rationale:** The synthesis pass produces a self-contained summary turn. Continuing the loop after synthesis would let the model issue more tool calls on top of a synthesis summary, which is semantically wrong — the synthesis IS the final answer for that tool call batch.
+
+## D6. executeToolPhase return struct
+
+The recursive call at `tool-phase.ts:342` is currently the last thing `executeToolPhase` does (after creating the next assistant row). After the conversion, `executeToolPhase` returns a struct the loop body reads:
+
+```typescript
+interface ToolPhaseResult {
+  action: 'continue' | 'paused' | 'synthesis_done';
+  toolCallCount: number;
+  toolCalls: ToolCall[];
+  nextAssistantId: string | null;
+}
+```
+
+- `continue` → loop continues; `nextAssistantId` is the new assistant message's UUID.
+- `paused` → user-input or grant pause; loop breaks. `nextAssistantId` is null.
+- `synthesis_done` → synthesis succeeded; loop breaks. `nextAssistantId` is null (synthesis wrote its own parts).
+
+The loop body then:
+1. Updates `toolsUsed += result.toolCallCount`
+2. Appends `result.toolCalls` to `recentToolCalls`
+3. Sets `assistantMessageId = result.nextAssistantId` for the next iteration
+4. Increments `stepNumber`
+5. Checks `result.action` — if not `continue`, breaks.
+
+## D7. Budget vs steps interaction
+
+Budget counts **individual tool calls** across the entire turn. Steps counts **loop iterations**. They are orthogonal:
+
+- Budget fires when `toolsUsed >= resolveToolBudget(agent)` (currently 50 for read-only). Checked at the top of each iteration.
+- Step cap fires when `stepNumber >= effectiveCap`. Checked by the loop condition.
+
+Both produce a sentinel summary. A turn can be terminated by whichever fires first. In practice, budget (50 tool calls) fires before step cap (200 steps) unless the model produces many 0-tool-call iterations (which shouldn't happen — 0 tool calls means non-tool finish, which exits the loop via the `break` path).
--- a/openspec/changes/v1.14-outer-loop/proposal.md
+++ b/openspec/changes/v1.14-outer-loop/proposal.md
@@ -0,0 +1,112 @@
+# v1.14.0-outer-loop — explicit outer agent loop
+
+Replace the ad-hoc `executeToolPhase → runAssistantTurn` recursion with an explicit `while` loop. A **step** is one stream-and-tool-execute iteration; a step can contain multiple parallel tool calls. The loop terminates on non-tool finish OR step-cap hit OR doom-loop OR budget exhaustion OR abort OR synthesis success.
+
+## Why
+
+The current recursion works but has two problems: (a) stack depth grows linearly with tool iterations — 50 nested async frames is fragile, (b) there's no explicit step counter, so there's no per-agent step cap and no step-boundary instrumentation. BooChat also gets stuck at 50 tool calls (the budget ceiling) more often than it should — the new `MAX_STEPS = 200` hard ceiling lets the loop run much longer before the step cap fires, while the existing budget (50 tool calls) remains a separate concern.
+
+## Recon findings (verified 2026-05-23)
+
+- `runAssistantTurn` at `turn.ts:144-147` is the recursive entry. Returns `Promise<void>`.
+- `executeToolPhase` at `tool-phase.ts:89-96` calls back into `runAssistantTurn` at `tool-phase.ts:342`.
+- Recursion terminates on: non-tool finish, budget exhaustion (`args.toolsUsed >= budget`), doom-loop (3 identical calls via `detectDoomLoop`), user-input pause (ask_user_input / request_read_access), synthesis success, stream error, abort.
+- **No existing hard recursion depth limit** — `MAX_TOOL_LOOP_DEPTH` does not exist. Safety comes from budget (50) + doom-loop (3 identical).
+- `TurnArgs` defined in `turn.ts:127-141`, not `types.ts`. Fields: `sessionId`, `chatId`, `assistantMessageId`, `toolsUsed`, `recentToolCalls`, `signal`. All mutable fields are threaded through the recursive call.
+- Synthesis pipeline (`synthesisPipeline.ts`) is a branch in `executeToolPhase` — if synthesis succeeds, recursion is skipped.
+- `step_start` already in the `message_parts.kind` CHECK constraint. No schema change needed.
+- `agents.ts` does NOT currently parse a `steps` field. Needs adding to `ParsedFrontmatter`.
+
+## Scope
+
+### S1. Outer loop in `turn.ts`
+
+Convert the recursive chain to a `while (stepNumber < effectiveCap)` loop:
+
+```
+let stepNumber = 0
+while (stepNumber < effectiveCap) {
+  // doom-loop check
+  // budget check
+  // emit step_start part
+  // stream phase (executeStreamPhase)
+  // if no tool calls → finalize, break
+  // tool phase (executeToolPhase — now returns, doesn't recurse)
+  // if paused (user input / grant) → break
+  // if synthesis succeeded → break
+  // create next assistant message row
+  // increment stepNumber, update toolsUsed, append recentToolCalls
+}
+// if stepNumber >= effectiveCap → sentinel summary
+```
+
+`effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS)` where `MAX_STEPS = 200`.
+
+### S2. `executeToolPhase` becomes non-recursive
+
+Remove the `runAssistantTurn` call at `tool-phase.ts:342`. Instead, return a result indicating what happened: `{action: 'continue' | 'paused' | 'synthesis_done', toolsUsed, recentToolCalls, nextAssistantId}`. The caller (the while loop) uses the action to decide whether to continue or break.
+
+### S3. `agent.steps` field
+
+`agents.ts:ParsedFrontmatter` gains `steps?: number`. Parser extracts it from YAML frontmatter (integer ≥ 0). `steps: 0` means "no tool calls allowed" — loop body never executes; assistant responds text-only.
+
+### S4. Step-boundary events
+
+At the top of each loop iteration, emit a `step_start` part with payload `{step_number, started_at}`. Uses `insertParts` into the current assistant message. No `step_finish` — the next `step_start` (or message completion) implicitly ends the previous step.
+
+### S5. Doom-loop migration
+
+`detectDoomLoop` check moves from `runAssistantTurn` (top of function, pre-stream) to the top of the while-loop body (same logical position). Same predicate, same threshold (3). Same `runDoomLoopSummary` call. Control flow changes from `return` (unwinding recursion) to `break` (exiting loop).
+
+### S6. Step-cap sentinel
+
+When `stepNumber >= effectiveCap`, write a sentinel summary like the existing `runCapHitSummary`. Reuse `runCapHitSummary` with a reason parameter distinguishing "budget exhaustion" from "step cap hit", or create a parallel `runStepCapSummary`. The sentinel makes the cap visible in chat.
+
+### S7. AGENTS.md updates
+
+Add `steps:` to each agent in `data/AGENTS.md`:
+- Refactorer: `steps: 5`
+- Architect: `steps: 20`
+- All others: unset (infinity — bounded only by `MAX_STEPS = 200`)
+
+### S8. Tests
+
+New test file `apps/server/src/services/__tests__/outer-loop.test.ts` covering:
+- Clean finish (stream returns non-tool, loop exits after 1 iteration)
+- Step-cap hit (loop exits at cap, sentinel written)
+- Doom-loop break (3 identical calls, sentinel written)
+- Budget exhaustion (toolsUsed >= budget, cap-hit sentinel written)
+- Abort mid-step (signal fires, loop exits)
+- `steps: 0` edge case (no loop iterations, text-only response)
+- Synthesis success (loop exits after synthesis)
+
+## Non-goals
+
+- No frontend changes. `step_start` parts surface via `messages_with_parts` automatically; UI doesn't render them in v1.14.
+- No `output_schema` / `exit_expression` / `execution_strategy` AGENTS.md fields.
+- No per-step snapshot for revert (v2.0 BooCoder concern).
+- No changes to budget constants (50 / 10 / 50). That's a separate concern.
+- No `repairToolCall` changes.
+- No compaction changes.
+
+## Hard rules
+
+- No git commit, push. Sam commits.
+- Backup before editing.
+- TS strict, no `any`.
+- Doom-loop threshold stays at 3.
+- 332+ existing tests still pass + new outer-loop tests.
+
+## Files expected to touch
+
+- `apps/server/src/services/inference/turn.ts` — recursion → loop
+- `apps/server/src/services/inference/tool-phase.ts` — remove recursive call, return result struct
+- `apps/server/src/services/inference/sentinel-summaries.ts` — step-cap sentinel (or extend cap-hit)
+- `apps/server/src/services/agents.ts` — parse `steps` field
+- `data/AGENTS.md` — add `steps:` to Refactorer + Architect
+- `apps/server/src/services/__tests__/outer-loop.test.ts` — NEW
+- `apps/server/src/services/inference/index.ts` — re-export if new types needed
+
+## Estimate
+
+~300 LoC net (turn.ts refactor + tool-phase return struct + agents parser + tests). The conversion is structural, not behavioral — every exit path is preserved, just expressed as loop control flow instead of recursion unwinding.
--- a/openspec/changes/v1.14-outer-loop/tasks.md
+++ b/openspec/changes/v1.14-outer-loop/tasks.md
@@ -0,0 +1,82 @@
+# v1.14.0-outer-loop tasks
+
+## B1 — Backups
+
+- [ ] `turn.ts`, `tool-phase.ts`, `sentinel-summaries.ts`, `agents.ts`, `data/AGENTS.md`
+
+## B2 — agents.ts: parse `steps` field
+
+- [ ] Add `steps?: number` to `ParsedFrontmatter` interface
+- [ ] Parse from YAML frontmatter: integer ≥ 0, warn on out-of-range (negative or non-integer), clamp to 0
+- [ ] Expose on the `Agent` type returned by `getAgentsForProject`
+- [ ] `npx tsc --noEmit -p apps/server` clean
+
+## B3 — AGENTS.md: add `steps:` to Refactorer + Architect
+
+- [ ] `data/AGENTS.md` — Refactorer: `steps: 5`
+- [ ] `data/AGENTS.md` — Architect: `steps: 20`
+- [ ] All others: leave unset (infinite, bounded by MAX_STEPS=200)
+
+## B4 — tool-phase.ts: remove recursive call, return result struct
+
+- [ ] Define `ToolPhaseResult` interface: `{action: 'continue' | 'paused' | 'synthesis_done', toolCallCount: number, toolCalls: ToolCall[], nextAssistantId: string | null}`
+- [ ] Remove `runAssistantTurn` import and call at line ~342
+- [ ] `executeToolPhase` returns `ToolPhaseResult` instead of `Promise<void>`
+- [ ] On normal path (after creating next assistant row): return `{action: 'continue', toolCallCount, toolCalls: result.toolCalls, nextAssistantId}`
+- [ ] On user-input pause: return `{action: 'paused', toolCallCount: <calls executed so far>, toolCalls: result.toolCalls, nextAssistantId: null}`
+- [ ] On synthesis success: return `{action: 'synthesis_done', toolCallCount, toolCalls: result.toolCalls, nextAssistantId: null}`
+- [ ] `npx tsc --noEmit -p apps/server` will FAIL here (turn.ts still expects void) — expected, fixed in B5
+
+## B5 — turn.ts: recursion → while loop
+
+- [ ] Add `MAX_STEPS = 200` constant
+- [ ] Resolve `effectiveCap = Math.min(agent?.steps ?? Infinity, MAX_STEPS)` at the top of `runAssistantTurn`
+- [ ] Convert `runAssistantTurn` body into a `while (stepNumber < effectiveCap)` loop:
+  - Top of loop: doom-loop check (move from current position; `break` instead of `return`)
+  - Top of loop: budget check (move from current position; `break` instead of `return`, but still call `runCapHitSummary` before break)
+  - Emit `step_start` part via `insertParts` with payload `{step_number: stepNumber, started_at: new Date().toISOString()}`
+  - Call `executeStreamPhase`
+  - If no tool calls → `finalizeCompletion`, `break`
+  - Call `executeToolPhase` (now returns `ToolPhaseResult`)
+  - If `result.action !== 'continue'` → `break`
+  - Update `toolsUsed += result.toolCallCount`
+  - Update `recentToolCalls = [...recentToolCalls, ...result.toolCalls]`
+  - Update `assistantMessageId = result.nextAssistantId!`
+  - Increment `stepNumber`
+- [ ] After loop: if `stepNumber >= effectiveCap` → call step-cap sentinel (B6)
+- [ ] `effectiveCap === 0` edge case: the while condition is immediately false; stream the first turn text-only (the stream phase at the top of the function runs once before the loop — OR handle this by structuring the loop as do-while, OR handle by pre-checking and skipping tools from the request). Pick the cleanest approach.
+- [ ] Remove `TurnArgs` from the module export if it's no longer threaded through recursion — OR keep it and populate from loop locals. (Design note: `TurnArgs` is still used by `executeStreamPhase`, `executeToolPhase`, `sentinel-summaries.ts`, `error-handler.ts`. Keep the interface; populate from loop locals each iteration.)
+- [ ] `npx tsc --noEmit -p apps/server` clean
+- [ ] `pnpm -C apps/server test` — all existing tests pass
+
+## B6 — sentinel-summaries.ts: step-cap sentinel
+
+- [ ] Add `runStepCapSummary` (or extend `runCapHitSummary` with a `reason` param)
+- [ ] Write a sentinel with `metadata.kind = 'cap_hit'` (same as budget) so `CapHitSentinel` UI renders it
+- [ ] Sentinel text distinguishes "Step limit reached (N steps)" from "Tool budget exhausted (N calls)"
+- [ ] Called from the post-loop check in turn.ts (B5)
+
+## B7 — Tests
+
+- [ ] NEW `apps/server/src/services/__tests__/outer-loop.test.ts`
+- [ ] Test: clean finish — stream returns no tool calls, loop exits after 1 step
+- [ ] Test: step-cap hit — mock agent with `steps: 2`, model always returns tool calls, loop exits at 2, sentinel written
+- [ ] Test: doom-loop — 3 identical tool calls, sentinel written, loop breaks
+- [ ] Test: budget exhaustion — toolsUsed >= budget, cap-hit sentinel written
+- [ ] Test: `steps: 0` — no loop iterations, text-only response
+- [ ] Test: synthesis success — loop breaks after synthesis
+- [ ] `pnpm -C apps/server test` — all 332+ existing + new tests pass
+
+## B8 — Verification
+
+- [ ] `npx tsc --noEmit -p apps/server` — 0 errors
+- [ ] `npx tsc -p apps/web/tsconfig.app.json --noEmit` — 0 errors (no web changes; should pass)
+- [ ] `pnpm -C apps/web build` — green
+- [ ] `pnpm -C apps/server test` — all green
+
+## B9 — Docs + tag + deploy
+
+- [ ] `CHANGELOG.md` entry for v1.14.0-outer-loop
+- [ ] `boocode_roadmap.md` retrospective bullet on the v1.14 section
+- [ ] `CLAUDE.md` updates: mention the outer loop, MAX_STEPS, agent.steps in the inference/ section
+- [ ] Commit, tag `v1.14.0-outer-loop`, push, rebuild