Converts the ad-hoc executeToolPhase → runAssistantTurn recursion into an explicit while (stepNumber < effectiveCap) loop. A step is one stream-and-tool-execute iteration; the loop terminates on non-tool finish, step-cap hit, doom-loop, budget exhaustion, abort, or synthesis success.

MAX_STEPS = 200 hard ceiling (4x old effective limit from budget). Per-agent steps: field in AGENTS.md frontmatter sets tighter caps (Refactorer: 5, Architect: 20, others: unset = bounded only by MAX_STEPS). Resolution: effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS).

executeToolPhase no longer recurses; it returns a ToolPhaseResult struct (action: 'continue' | 'paused' | 'synthesis_done') so the caller decides whether to continue or break. steps: 0 is handled as "no tool calls allowed" via runTextOnlyTurn (one text-only stream phase, tool calls ignored with warn log).

Step-cap hits produce a sentinel summary (reuses the cap_hit kind so CapHitSentinel.tsx renders without frontend changes; the text distinguishes "Step limit reached" from "Tool budget exhausted"). Doom-loop check migrated to top of loop body: same predicate, same threshold (3), break instead of return.

step_start parts are in the schema CHECK but not emitted as message_parts: writing before the stream phase creates a sequence-0 collision with partsFromAssistantMessage. A structured log line is emitted instead. Adversarial review caught the collision pre-deploy.

332/332 server tests passing. No frontend changes. No schema changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# v1.14.0-outer-loop: explicit outer agent loop

Replace the ad-hoc `executeToolPhase → runAssistantTurn` recursion with an explicit `while` loop. A **step** is one stream-and-tool-execute iteration; a step can contain multiple parallel tool calls. The loop terminates on any of: non-tool finish, step-cap hit, doom-loop, budget exhaustion, user-input pause, abort, or synthesis success.

## Why

The current recursion works but has two problems: (a) stack depth grows linearly with tool iterations, and 50 nested async frames is fragile; (b) there is no explicit step counter, so there is no per-agent step cap and no step-boundary instrumentation. BooChat also gets stuck at 50 tool calls (the budget ceiling) more often than it should. The new `MAX_STEPS = 200` hard ceiling leaves the loop plenty of room before the step cap fires, while the existing budget (50 tool calls) remains a separate concern.

## Recon findings (verified 2026-05-23)

- `runAssistantTurn` at `turn.ts:144-147` is the recursive entry. Returns `Promise<void>`.
- `executeToolPhase` at `tool-phase.ts:89-96` calls back into `runAssistantTurn` at `tool-phase.ts:342`.
- Recursion terminates on: non-tool finish, budget exhaustion (`args.toolsUsed >= budget`), doom-loop (3 identical calls via `detectDoomLoop`), user-input pause (`ask_user_input` / `request_read_access`), synthesis success, stream error, abort.
- **No existing hard recursion depth limit**: `MAX_TOOL_LOOP_DEPTH` does not exist. Safety comes from budget (50) + doom-loop (3 identical).
- `TurnArgs` is defined in `turn.ts:127-141`, not `types.ts`. Fields: `sessionId`, `chatId`, `assistantMessageId`, `toolsUsed`, `recentToolCalls`, `signal`. All mutable fields are threaded through the recursive call.
- Synthesis pipeline (`synthesisPipeline.ts`) is a branch in `executeToolPhase`: if synthesis succeeds, recursion is skipped.
- `step_start` is already in the `message_parts.kind` CHECK constraint. No schema change needed.
- `agents.ts` does NOT currently parse a `steps` field. It needs adding to `ParsedFrontmatter`.

## Scope

### S1. Outer loop in `turn.ts`

Convert the recursive chain to a `while (stepNumber < effectiveCap)` loop:

```
let stepNumber = 0
while (stepNumber < effectiveCap) {
  // doom-loop check
  // budget check
  // emit step_start part
  // stream phase (executeStreamPhase)
  // if no tool calls → finalize, break
  // tool phase (executeToolPhase — now returns, doesn't recurse)
  // if paused (user input / grant) → break
  // if synthesis succeeded → break
  // create next assistant message row
  // increment stepNumber, update toolsUsed, append recentToolCalls
}
// if stepNumber >= effectiveCap → sentinel summary
```

`effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS)` where `MAX_STEPS = 200`.
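
The cap resolution above can be sketched as a small pure function. This is only an illustration: `AgentLike` and `resolveEffectiveCap` are hypothetical names, not identifiers from the codebase; only the formula and `MAX_STEPS = 200` come from this spec.

```typescript
// Hard ceiling taken from this spec.
const MAX_STEPS = 200;

// Hypothetical minimal agent shape; the real parsed type lives in agents.ts.
interface AgentLike {
  steps?: number;
}

// An unset `steps` falls back to Infinity, so MAX_STEPS is the only bound.
// `??` (rather than `||`) matters here: `steps: 0` must stay 0, not become Infinity.
function resolveEffectiveCap(agent: AgentLike): number {
  return Math.min(agent.steps ?? Infinity, MAX_STEPS);
}
```

Under these assumptions an agent with `steps: 5` resolves to 5, an agent with no `steps` resolves to 200, and a value above the ceiling is clamped to 200.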

### S2. `executeToolPhase` becomes non-recursive

Remove the `runAssistantTurn` call at `tool-phase.ts:342`. Instead, return a result describing what happened: `{action: 'continue' | 'paused' | 'synthesis_done', toolsUsed, recentToolCalls, nextAssistantId}`. The caller (the `while` loop) uses `action` to decide whether to continue or break.
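
A minimal sketch of how a caller might branch on the returned action. The field types, the `ExitReason` union, and `driveLoop` are assumptions made for illustration, and the tool phase is injected and synchronous here for brevity (the real phases are async).

```typescript
type ToolPhaseAction = 'continue' | 'paused' | 'synthesis_done';

// Field names follow the spec; the field types are assumptions.
interface ToolPhaseResult {
  action: ToolPhaseAction;
  toolsUsed: number;
  recentToolCalls: string[];
  nextAssistantId: string | null;
}

// Hypothetical summary of why the loop stopped.
type ExitReason = 'paused' | 'synthesis_done' | 'step_cap';

// Runs an injected tool phase until it stops asking to continue
// or the step cap is reached.
function driveLoop(
  effectiveCap: number,
  toolPhase: (stepNumber: number) => ToolPhaseResult,
): { steps: number; exit: ExitReason } {
  let stepNumber = 0;
  while (stepNumber < effectiveCap) {
    const result = toolPhase(stepNumber);
    if (result.action !== 'continue') {
      // 'paused' and 'synthesis_done' both break out of the loop.
      return { steps: stepNumber + 1, exit: result.action };
    }
    stepNumber += 1;
  }
  // Falling out of the while condition means the cap was hit.
  return { steps: stepNumber, exit: 'step_cap' };
}
```

Keeping the decision in the caller means `executeToolPhase` only reports what happened, which is what removes the nested async frames.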

### S3. `agent.steps` field

`agents.ts:ParsedFrontmatter` gains `steps?: number`. The parser extracts it from the YAML frontmatter (integer ≥ 0). `steps: 0` means "no tool calls allowed": the loop body never executes and the assistant responds text-only.
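
One way the "integer ≥ 0" rule could be validated, as a sketch only. `parseSteps` is a hypothetical helper, and throwing on invalid input (rather than ignoring the field or logging a warning) is an assumption; the spec does not say how bad values are handled.

```typescript
// Validates an already-decoded frontmatter value for `steps`.
// Returns undefined when the field is absent, so the cap stays unset.
function parseSteps(raw: unknown): number | undefined {
  if (raw === undefined || raw === null) return undefined;
  if (typeof raw !== 'number' || !Number.isInteger(raw) || raw < 0) {
    // Assumption: invalid values are rejected loudly.
    throw new Error(`steps must be an integer >= 0, got ${String(raw)}`);
  }
  return raw;
}
```

Note that `0` is returned as a valid value, which keeps the "no tool calls allowed" case distinct from "unset".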

### S4. Step-boundary events

At the top of each loop iteration, emit a `step_start` part with payload `{step_number, started_at}`, using `insertParts` to write it into the current assistant message. There is no `step_finish`: the next `step_start` (or message completion) implicitly ends the previous step.
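
A sketch of what building that payload might look like. The `StepStartPart` wrapper shape, the `buildStepStartPart` name, and the ISO-8601 string format for `started_at` are all assumptions; only the `step_start` kind and the two payload field names come from this spec. The `insertParts` call itself is left out because its signature is not given here.

```typescript
// Payload field names are from the spec; the timestamp format is assumed.
interface StepStartPayload {
  step_number: number;
  started_at: string;
}

// Hypothetical part wrapper, for illustration only.
interface StepStartPart {
  kind: 'step_start';
  payload: StepStartPayload;
}

// `now` is injectable so the result is deterministic in tests.
function buildStepStartPart(stepNumber: number, now: Date = new Date()): StepStartPart {
  return {
    kind: 'step_start',
    payload: { step_number: stepNumber, started_at: now.toISOString() },
  };
}
```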

### S5. Doom-loop migration

The `detectDoomLoop` check moves from `runAssistantTurn` (top of function, pre-stream) to the top of the while-loop body (same logical position). Same predicate, same threshold (3), same `runDoomLoopSummary` call. Only the control flow changes: `return` (unwinding recursion) becomes `break` (exiting the loop).
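
For readers unfamiliar with the check, here is an illustrative stand-in for a "3 identical calls" predicate. It is not the real `detectDoomLoop`, whose implementation is not shown in this spec; `ToolCallRecord`, `looksLikeDoomLoop`, and the choice to compare tool name plus serialized arguments are assumptions.

```typescript
// Hypothetical record shape for an entry in recentToolCalls.
interface ToolCallRecord {
  name: string;
  argsJson: string;
}

// Threshold from the spec.
const DOOM_LOOP_THRESHOLD = 3;

// True when the last `threshold` calls are identical in name and arguments.
function looksLikeDoomLoop(
  recent: readonly ToolCallRecord[],
  threshold: number = DOOM_LOOP_THRESHOLD,
): boolean {
  if (threshold < 1 || recent.length < threshold) return false;
  const tail = recent.slice(-threshold);
  const first = tail[0];
  if (first === undefined) return false;
  return tail.every((c) => c.name === first.name && c.argsJson === first.argsJson);
}
```

Inside the loop, a `true` result would be followed by the summary call and a `break`, where the recursive version used `return`.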

### S6. Step-cap sentinel

When `stepNumber >= effectiveCap`, write a sentinel summary like the one `runCapHitSummary` already produces. Either reuse `runCapHitSummary` with a reason parameter distinguishing "budget exhaustion" from "step cap hit", or create a parallel `runStepCapSummary`. The sentinel makes the cap visible in chat.
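
A sketch of the first option, a single builder with a reason parameter. The `CapReason` values, the `buildCapHitSentinel` name, the sentinel object shape, and the exact wording are assumptions; the real `runCapHitSummary` signature is not shown in this spec.

```typescript
// Hypothetical reason discriminator.
type CapReason = 'budget_exhausted' | 'step_cap';

// Hypothetical sentinel shape that shares one kind for both reasons.
interface CapHitSentinel {
  kind: 'cap_hit';
  text: string;
}

// One kind, two messages: only the text tells the reader which cap fired.
function buildCapHitSentinel(reason: CapReason, limit: number): CapHitSentinel {
  const text =
    reason === 'step_cap'
      ? `Step limit reached (${limit} steps).`
      : `Tool budget exhausted (${limit} tool calls).`;
  return { kind: 'cap_hit', text };
}
```

Sharing one kind would let an existing renderer for cap-hit sentinels display both cases without modification.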

### S7. AGENTS.md updates

Add `steps:` to the agents in `data/AGENTS.md` that need a tighter cap:

- Refactorer: `steps: 5`
- Architect: `steps: 20`
- All others: unset (treated as `Infinity`, so bounded only by `MAX_STEPS = 200`)

### S8. Tests

New test file `apps/server/src/services/__tests__/outer-loop.test.ts` covering:

- Clean finish (stream returns non-tool, loop exits after 1 iteration)
- Step-cap hit (loop exits at cap, sentinel written)
- Doom-loop break (3 identical calls, sentinel written)
- Budget exhaustion (`toolsUsed >= budget`, cap-hit sentinel written)
- Abort mid-step (signal fires, loop exits)
- `steps: 0` edge case (no loop iterations, text-only response)
- Synthesis success (loop exits after synthesis)

## Non-goals

- No frontend changes. `step_start` parts surface via `messages_with_parts` automatically; UI doesn't render them in v1.14.
- No `output_schema` / `exit_expression` / `execution_strategy` AGENTS.md fields.
- No per-step snapshot for revert (v2.0 BooCoder concern).
- No changes to budget constants (50 / 10 / 50). That's a separate concern.
- No `repairToolCall` changes.
- No compaction changes.

## Hard rules

- No `git commit` or `git push`. Sam commits.
- Backup before editing.
- TS strict, no `any`.
- Doom-loop threshold stays at 3.
- All 332+ existing tests still pass, plus the new outer-loop tests.

## Files expected to touch

- `apps/server/src/services/inference/turn.ts`: recursion → loop
- `apps/server/src/services/inference/tool-phase.ts`: remove recursive call, return result struct
- `apps/server/src/services/inference/sentinel-summaries.ts`: step-cap sentinel (or extend cap-hit)
- `apps/server/src/services/agents.ts`: parse `steps` field
- `data/AGENTS.md`: add `steps:` to Refactorer + Architect
- `apps/server/src/services/__tests__/outer-loop.test.ts`: NEW
- `apps/server/src/services/inference/index.ts`: re-export if new types needed

## Estimate

~300 LoC net (`turn.ts` refactor + tool-phase return struct + agents parser + tests). The conversion is structural, not behavioral: every exit path is preserved, just expressed as loop control flow instead of recursion unwinding.