v1.14.0-outer-loop: explicit while loop replaces inference recursion
Converts the ad-hoc executeToolPhase → runAssistantTurn recursion into an explicit while (stepNumber < effectiveCap) loop. A step is one stream-and- tool-execute iteration; the loop terminates on non-tool finish, step-cap hit, doom-loop, budget exhaustion, abort, or synthesis success. MAX_STEPS = 200 hard ceiling (4x old effective limit from budget). Per-agent steps: field in AGENTS.md frontmatter sets tighter caps (Refactorer: 5, Architect: 20, others: unset = bounded only by MAX_STEPS). Resolution: effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS). executeToolPhase no longer recurses — returns ToolPhaseResult struct (action: 'continue' | 'paused' | 'synthesis_done') so the caller decides whether to continue or break. steps: 0 handled as "no tool calls allowed" via runTextOnlyTurn (one text-only stream phase, tool calls ignored with warn log). Step-cap hits produce a sentinel summary (reuses cap_hit kind so CapHitSentinel.tsx renders without frontend changes; text distinguishes "Step limit reached" from "Tool budget exhausted"). Doom-loop check migrated to top of loop body — same predicate, same threshold (3), break instead of return. step_start parts are in the schema CHECK but not emitted as message_parts — writing before the stream phase creates a sequence-0 collision with partsFromAssistantMessage. Structured log line emitted instead. Adversarial review caught the collision pre-deploy. 332/332 server tests passing. No frontend changes. No schema changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
72
openspec/changes/v1.14-outer-loop/design.md
Normal file
72
openspec/changes/v1.14-outer-loop/design.md
Normal file
@@ -0,0 +1,72 @@
|
||||
# v1.14.0-outer-loop — design decisions
|
||||
|
||||
Answers to the dispatch's blocking questions, resolved 2026-05-23.
|
||||
|
||||
## D1. Step cap — what replaces MAX_TOOL_LOOP_DEPTH?
|
||||
|
||||
`MAX_TOOL_LOOP_DEPTH` never existed — no hard recursion depth guard was ever in the codebase. Safety came from budget (50 tool calls) + doom-loop (3 identical calls).
|
||||
|
||||
**Decision:** introduce `MAX_STEPS = 200` as a hard ceiling. Per-agent cap via `agent.steps` is the primary knob. Resolution: `effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS)`.
|
||||
|
||||
**Rationale:** Sam reports BooChat gets stuck at 50 tool calls (the budget) too often. The step cap should be generous — 200 is 4x the current de-facto ceiling. Budget (50 tool calls total across all steps) remains a separate concern and is not changed in this batch.
|
||||
|
||||
Note: "step" ≠ "tool call." One step = one stream iteration that may produce multiple parallel tool calls. Budget counts individual tool calls; step cap counts iterations. At 200 steps with average 1-2 tool calls per step, the budget (50) will fire well before the step cap in most scenarios. The step cap is a safety ceiling for cases where the model makes many 1-tool-call iterations.
|
||||
|
||||
## D2. step_finish — emit or not?
|
||||
|
||||
**Decision:** No `step_finish` part. The next `step_start` (or assistant message completion) implicitly ends the previous step.
|
||||
|
||||
**Rationale:** opencode only emits `step_start`. Less noise in parts, simpler code. If UI ever needs step durations, compute from the timestamps of consecutive `step_start` parts.
|
||||
|
||||
## D3. Step-cap hit — sentinel or quiet?
|
||||
|
||||
**Decision:** Write a sentinel summary on step-cap hit. Visible to the user in chat, same as budget-exhaustion's `runCapHitSummary`.
|
||||
|
||||
**Implementation:** Extend `runCapHitSummary` to accept a `reason: 'budget' | 'step_cap'` parameter (or add a parallel `runStepCapSummary`). The sentinel metadata kind stays `cap_hit` — frontend `CapHitSentinel` component already renders it. The sentinel's text distinguishes the two cases ("Tool budget exhausted" vs "Step limit reached").
|
||||
|
||||
## D4. agent.steps = 0
|
||||
|
||||
**Decision:** `steps: 0` means "no tool calls allowed." The loop body never executes. The assistant can only respond with text.
|
||||
|
||||
**Implementation:** When `effectiveCap === 0`, skip the loop entirely. Stream the first assistant turn (text-only), finalize, return. The model receives no tools in the request payload when `steps: 0` (or equivalently, tools are passed but the loop never enters the tool-execution branch).
|
||||
|
||||
Actually, cleaner: `steps: 0` means the loop cap is 0. The while condition `stepNumber < effectiveCap` is false on the first check. The stream phase still runs (the model produces a text response), but if it emits tool calls they're ignored and the turn finalizes as text-only. This may produce a confusing response if the model's text references tool results it never got — but `steps: 0` is an explicit constraint the agent author chose. Document in AGENTS.md parser validation.
|
||||
|
||||
## D5. Synthesis success terminates the loop?
|
||||
|
||||
**Decision:** Yes. `break` out of the loop after synthesis success. Preserves current behavior (synthesis replaces the recursive call; no further iterations).
|
||||
|
||||
**Rationale:** The synthesis pass produces a self-contained summary turn. Continuing the loop after synthesis would let the model issue more tool calls on top of a synthesis summary, which is semantically wrong — the synthesis IS the final answer for that tool call batch.
|
||||
|
||||
## D6. executeToolPhase return struct
|
||||
|
||||
The recursive call at `tool-phase.ts:342` is currently the last thing `executeToolPhase` does (after creating the next assistant row). After the conversion, `executeToolPhase` returns a struct the loop body reads:
|
||||
|
||||
```typescript
|
||||
interface ToolPhaseResult {
|
||||
action: 'continue' | 'paused' | 'synthesis_done';
|
||||
toolCallCount: number;
|
||||
toolCalls: ToolCall[];
|
||||
nextAssistantId: string | null;
|
||||
}
|
||||
```
|
||||
|
||||
- `continue` → loop continues; `nextAssistantId` is the new assistant message's UUID.
|
||||
- `paused` → user-input or grant pause; loop breaks. `nextAssistantId` is null.
|
||||
- `synthesis_done` → synthesis succeeded; loop breaks. `nextAssistantId` is null (synthesis wrote its own parts).
|
||||
|
||||
The loop body then:
|
||||
1. Updates `toolsUsed += result.toolCallCount`
|
||||
2. Appends `result.toolCalls` to `recentToolCalls`
|
||||
3. Sets `assistantMessageId = result.nextAssistantId` for the next iteration
|
||||
4. Increments `stepNumber`
|
||||
5. Checks `result.action` — if not `continue`, breaks.
|
||||
|
||||
## D7. Budget vs steps interaction
|
||||
|
||||
Budget counts **individual tool calls** across the entire turn. Steps counts **loop iterations**. They are orthogonal:
|
||||
|
||||
- Budget fires when `toolsUsed >= resolveToolBudget(agent)` (currently 50 for read-only). Checked at the top of each iteration.
|
||||
- Step cap fires when `stepNumber >= effectiveCap`. Checked by the loop condition.
|
||||
|
||||
Both produce a sentinel summary. A turn can be terminated by whichever fires first. In practice, budget (50 tool calls) fires before step cap (200 steps) unless the model produces many 0-tool-call iterations (which shouldn't happen — 0 tool calls means non-tool finish, which exits the loop via the `break` path).
|
||||
112
openspec/changes/v1.14-outer-loop/proposal.md
Normal file
112
openspec/changes/v1.14-outer-loop/proposal.md
Normal file
@@ -0,0 +1,112 @@
|
||||
# v1.14.0-outer-loop — explicit outer agent loop
|
||||
|
||||
Replace the ad-hoc `executeToolPhase → runAssistantTurn` recursion with an explicit `while` loop. A **step** is one stream-and-tool-execute iteration; a step can contain multiple parallel tool calls. The loop terminates on non-tool finish OR step-cap hit OR doom-loop OR budget exhaustion OR abort OR synthesis success.
|
||||
|
||||
## Why
|
||||
|
||||
The current recursion works but has two problems: (a) stack depth grows linearly with tool iterations — 50 nested async frames is fragile, (b) there's no explicit step counter, so there's no per-agent step cap and no step-boundary instrumentation. BooChat also gets stuck at 50 tool calls (the budget ceiling) more often than it should — the new `MAX_STEPS = 200` hard ceiling lets the loop run much longer before the step cap fires, while the existing budget (50 tool calls) remains a separate concern.
|
||||
|
||||
## Recon findings (verified 2026-05-23)
|
||||
|
||||
- `runAssistantTurn` at `turn.ts:144-147` is the recursive entry. Returns `Promise<void>`.
|
||||
- `executeToolPhase` at `tool-phase.ts:89-96` calls back into `runAssistantTurn` at `tool-phase.ts:342`.
|
||||
- Recursion terminates on: non-tool finish, budget exhaustion (`args.toolsUsed >= budget`), doom-loop (3 identical calls via `detectDoomLoop`), user-input pause (ask_user_input / request_read_access), synthesis success, stream error, abort.
|
||||
- **No existing hard recursion depth limit** — `MAX_TOOL_LOOP_DEPTH` does not exist. Safety comes from budget (50) + doom-loop (3 identical).
|
||||
- `TurnArgs` defined in `turn.ts:127-141`, not `types.ts`. Fields: `sessionId`, `chatId`, `assistantMessageId`, `toolsUsed`, `recentToolCalls`, `signal`. All mutable fields are threaded through the recursive call.
|
||||
- Synthesis pipeline (`synthesisPipeline.ts`) is a branch in `executeToolPhase` — if synthesis succeeds, recursion is skipped.
|
||||
- `step_start` already in the `message_parts.kind` CHECK constraint. No schema change needed.
|
||||
- `agents.ts` does NOT currently parse a `steps` field. Needs adding to `ParsedFrontmatter`.
|
||||
|
||||
## Scope
|
||||
|
||||
### S1. Outer loop in `turn.ts`
|
||||
|
||||
Convert the recursive chain to a `while (stepNumber < effectiveCap)` loop:
|
||||
|
||||
```
|
||||
let stepNumber = 0
|
||||
while (stepNumber < effectiveCap) {
|
||||
// doom-loop check
|
||||
// budget check
|
||||
// emit step_start part
|
||||
// stream phase (executeStreamPhase)
|
||||
// if no tool calls → finalize, break
|
||||
// tool phase (executeToolPhase — now returns, doesn't recurse)
|
||||
// if paused (user input / grant) → break
|
||||
// if synthesis succeeded → break
|
||||
// create next assistant message row
|
||||
// increment stepNumber, update toolsUsed, append recentToolCalls
|
||||
}
|
||||
// if stepNumber >= effectiveCap → sentinel summary
|
||||
```
|
||||
|
||||
`effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS)` where `MAX_STEPS = 200`.
|
||||
|
||||
### S2. `executeToolPhase` becomes non-recursive
|
||||
|
||||
Remove the `runAssistantTurn` call at `tool-phase.ts:342`. Instead, return a result indicating what happened: `{action: 'continue' | 'paused' | 'synthesis_done', toolsUsed, recentToolCalls, nextAssistantId}`. The caller (the while loop) uses the action to decide whether to continue or break.
|
||||
|
||||
### S3. `agent.steps` field
|
||||
|
||||
`agents.ts:ParsedFrontmatter` gains `steps?: number`. Parser extracts it from YAML frontmatter (integer ≥ 0). `steps: 0` means "no tool calls allowed" — loop body never executes; assistant responds text-only.
|
||||
|
||||
### S4. Step-boundary events
|
||||
|
||||
At the top of each loop iteration, emit a `step_start` part with payload `{step_number, started_at}`. Uses `insertParts` into the current assistant message. No `step_finish` — the next `step_start` (or message completion) implicitly ends the previous step.
|
||||
|
||||
### S5. Doom-loop migration
|
||||
|
||||
`detectDoomLoop` check moves from `runAssistantTurn` (top of function, pre-stream) to the top of the while-loop body (same logical position). Same predicate, same threshold (3). Same `runDoomLoopSummary` call. Control flow changes from `return` (unwinding recursion) to `break` (exiting loop).
|
||||
|
||||
### S6. Step-cap sentinel
|
||||
|
||||
When `stepNumber >= effectiveCap`, write a sentinel summary like the existing `runCapHitSummary`. Reuse `runCapHitSummary` with a reason parameter distinguishing "budget exhaustion" from "step cap hit", or create a parallel `runStepCapSummary`. The sentinel makes the cap visible in chat.
|
||||
|
||||
### S7. AGENTS.md updates
|
||||
|
||||
Add `steps:` to each agent in `data/AGENTS.md`:
|
||||
- Refactorer: `steps: 5`
|
||||
- Architect: `steps: 20`
|
||||
- All others: unset (infinity — bounded only by `MAX_STEPS = 200`)
|
||||
|
||||
### S8. Tests
|
||||
|
||||
New test file `apps/server/src/services/__tests__/outer-loop.test.ts` covering:
|
||||
- Clean finish (stream returns non-tool, loop exits after 1 iteration)
|
||||
- Step-cap hit (loop exits at cap, sentinel written)
|
||||
- Doom-loop break (3 identical calls, sentinel written)
|
||||
- Budget exhaustion (toolsUsed >= budget, cap-hit sentinel written)
|
||||
- Abort mid-step (signal fires, loop exits)
|
||||
- `steps: 0` edge case (no loop iterations, text-only response)
|
||||
- Synthesis success (loop exits after synthesis)
|
||||
|
||||
## Non-goals
|
||||
|
||||
- No frontend changes. `step_start` parts surface via `messages_with_parts` automatically; UI doesn't render them in v1.14.
|
||||
- No `output_schema` / `exit_expression` / `execution_strategy` AGENTS.md fields.
|
||||
- No per-step snapshot for revert (v2.0 BooCoder concern).
|
||||
- No changes to budget constants (50 / 10 / 50). That's a separate concern.
|
||||
- No `repairToolCall` changes.
|
||||
- No compaction changes.
|
||||
|
||||
## Hard rules
|
||||
|
||||
- No git commit, push. Sam commits.
|
||||
- Backup before editing.
|
||||
- TS strict, no `any`.
|
||||
- Doom-loop threshold stays at 3.
|
||||
- 332+ existing tests still pass + new outer-loop tests.
|
||||
|
||||
## Files expected to touch
|
||||
|
||||
- `apps/server/src/services/inference/turn.ts` — recursion → loop
|
||||
- `apps/server/src/services/inference/tool-phase.ts` — remove recursive call, return result struct
|
||||
- `apps/server/src/services/inference/sentinel-summaries.ts` — step-cap sentinel (or extend cap-hit)
|
||||
- `apps/server/src/services/agents.ts` — parse `steps` field
|
||||
- `data/AGENTS.md` — add `steps:` to Refactorer + Architect
|
||||
- `apps/server/src/services/__tests__/outer-loop.test.ts` — NEW
|
||||
- `apps/server/src/services/inference/index.ts` — re-export if new types needed
|
||||
|
||||
## Estimate
|
||||
|
||||
~300 LoC net (turn.ts refactor + tool-phase return struct + agents parser + tests). The conversion is structural, not behavioral — every exit path is preserved, just expressed as loop control flow instead of recursion unwinding.
|
||||
82
openspec/changes/v1.14-outer-loop/tasks.md
Normal file
82
openspec/changes/v1.14-outer-loop/tasks.md
Normal file
@@ -0,0 +1,82 @@
|
||||
# v1.14.0-outer-loop tasks
|
||||
|
||||
## B1 — Backups
|
||||
|
||||
- [ ] `turn.ts`, `tool-phase.ts`, `sentinel-summaries.ts`, `agents.ts`, `data/AGENTS.md`
|
||||
|
||||
## B2 — agents.ts: parse `steps` field
|
||||
|
||||
- [ ] Add `steps?: number` to `ParsedFrontmatter` interface
|
||||
- [ ] Parse from YAML frontmatter: integer ≥ 0, warn on out-of-range (negative or non-integer), clamp to 0
|
||||
- [ ] Expose on the `Agent` type returned by `getAgentsForProject`
|
||||
- [ ] `npx tsc --noEmit -p apps/server` clean
|
||||
|
||||
## B3 — AGENTS.md: add `steps:` to Refactorer + Architect
|
||||
|
||||
- [ ] `data/AGENTS.md` — Refactorer: `steps: 5`
|
||||
- [ ] `data/AGENTS.md` — Architect: `steps: 20`
|
||||
- [ ] All others: leave unset (infinite, bounded by MAX_STEPS=200)
|
||||
|
||||
## B4 — tool-phase.ts: remove recursive call, return result struct
|
||||
|
||||
- [ ] Define `ToolPhaseResult` interface: `{action: 'continue' | 'paused' | 'synthesis_done', toolCallCount: number, toolCalls: ToolCall[], nextAssistantId: string | null}`
|
||||
- [ ] Remove `runAssistantTurn` import and call at line ~342
|
||||
- [ ] `executeToolPhase` returns `ToolPhaseResult` instead of `Promise<void>`
|
||||
- [ ] On normal path (after creating next assistant row): return `{action: 'continue', toolCallCount, toolCalls: result.toolCalls, nextAssistantId}`
|
||||
- [ ] On user-input pause: return `{action: 'paused', toolCallCount: <calls executed so far>, toolCalls: result.toolCalls, nextAssistantId: null}`
|
||||
- [ ] On synthesis success: return `{action: 'synthesis_done', toolCallCount, toolCalls: result.toolCalls, nextAssistantId: null}`
|
||||
- [ ] `npx tsc --noEmit -p apps/server` will FAIL here (turn.ts still expects void) — expected, fixed in B5
|
||||
|
||||
## B5 — turn.ts: recursion → while loop
|
||||
|
||||
- [ ] Add `MAX_STEPS = 200` constant
|
||||
- [ ] Resolve `effectiveCap = Math.min(agent?.steps ?? Infinity, MAX_STEPS)` at the top of `runAssistantTurn`
|
||||
- [ ] Convert `runAssistantTurn` body into a `while (stepNumber < effectiveCap)` loop:
|
||||
- Top of loop: doom-loop check (move from current position; `break` instead of `return`)
|
||||
- Top of loop: budget check (move from current position; `break` instead of `return`, but still call `runCapHitSummary` before break)
|
||||
- Emit `step_start` part via `insertParts` with payload `{step_number: stepNumber, started_at: new Date().toISOString()}`
|
||||
- Call `executeStreamPhase`
|
||||
- If no tool calls → `finalizeCompletion`, `break`
|
||||
- Call `executeToolPhase` (now returns `ToolPhaseResult`)
|
||||
- If `result.action !== 'continue'` → `break`
|
||||
- Update `toolsUsed += result.toolCallCount`
|
||||
- Update `recentToolCalls = [...recentToolCalls, ...result.toolCalls]`
|
||||
- Update `assistantMessageId = result.nextAssistantId!`
|
||||
- Increment `stepNumber`
|
||||
- [ ] After loop: if `stepNumber >= effectiveCap` → call step-cap sentinel (B6)
|
||||
- [ ] `effectiveCap === 0` edge case: the while condition is immediately false; stream the first turn text-only (the stream phase at the top of the function runs once before the loop — OR handle this by structuring the loop as do-while, OR handle by pre-checking and skipping tools from the request). Pick the cleanest approach.
|
||||
- [ ] Remove `TurnArgs` from the module export if it's no longer threaded through recursion — OR keep it and populate from loop locals. (Design note: `TurnArgs` is still used by `executeStreamPhase`, `executeToolPhase`, `sentinel-summaries.ts`, `error-handler.ts`. Keep the interface; populate from loop locals each iteration.)
|
||||
- [ ] `npx tsc --noEmit -p apps/server` clean
|
||||
- [ ] `pnpm -C apps/server test` — all existing tests pass
|
||||
|
||||
## B6 — sentinel-summaries.ts: step-cap sentinel
|
||||
|
||||
- [ ] Add `runStepCapSummary` (or extend `runCapHitSummary` with a `reason` param)
|
||||
- [ ] Write a sentinel with `metadata.kind = 'cap_hit'` (same as budget) so `CapHitSentinel` UI renders it
|
||||
- [ ] Sentinel text distinguishes "Step limit reached (N steps)" from "Tool budget exhausted (N calls)"
|
||||
- [ ] Called from the post-loop check in turn.ts (B5)
|
||||
|
||||
## B7 — Tests
|
||||
|
||||
- [ ] NEW `apps/server/src/services/__tests__/outer-loop.test.ts`
|
||||
- [ ] Test: clean finish — stream returns no tool calls, loop exits after 1 step
|
||||
- [ ] Test: step-cap hit — mock agent with `steps: 2`, model always returns tool calls, loop exits at 2, sentinel written
|
||||
- [ ] Test: doom-loop — 3 identical tool calls, sentinel written, loop breaks
|
||||
- [ ] Test: budget exhaustion — toolsUsed >= budget, cap-hit sentinel written
|
||||
- [ ] Test: `steps: 0` — no loop iterations, text-only response
|
||||
- [ ] Test: synthesis success — loop breaks after synthesis
|
||||
- [ ] `pnpm -C apps/server test` — all 332+ existing + new tests pass
|
||||
|
||||
## B8 — Verification
|
||||
|
||||
- [ ] `npx tsc --noEmit -p apps/server` — 0 errors
|
||||
- [ ] `npx tsc -p apps/web/tsconfig.app.json --noEmit` — 0 errors (no web changes; should pass)
|
||||
- [ ] `pnpm -C apps/web build` — green
|
||||
- [ ] `pnpm -C apps/server test` — all green
|
||||
|
||||
## B9 — Docs + tag + deploy
|
||||
|
||||
- [ ] `CHANGELOG.md` entry for v1.14.0-outer-loop
|
||||
- [ ] `boocode_roadmap.md` retrospective bullet on the v1.14 section
|
||||
- [ ] `CLAUDE.md` updates: mention the outer loop, MAX_STEPS, agent.steps in the inference/ section
|
||||
- [ ] Commit, tag `v1.14.0-outer-loop`, push, rebuild
|
||||
Reference in New Issue
Block a user