Converts the ad-hoc executeToolPhase → runAssistantTurn recursion into an explicit while (stepNumber < effectiveCap) loop. A step is one stream-and-tool-execute iteration; the loop terminates on non-tool finish, step-cap hit, doom-loop, budget exhaustion, abort, or synthesis success.

MAX_STEPS = 200 hard ceiling (4x old effective limit from budget). Per-agent steps: field in AGENTS.md frontmatter sets tighter caps (Refactorer: 5, Architect: 20, others: unset = bounded only by MAX_STEPS). Resolution: effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS).

executeToolPhase no longer recurses; it returns a ToolPhaseResult struct (action: 'continue' | 'paused' | 'synthesis_done') so the caller decides whether to continue or break.

steps: 0 handled as "no tool calls allowed" via runTextOnlyTurn (one text-only stream phase, tool calls ignored with warn log).

Step-cap hits produce a sentinel summary (reuses cap_hit kind so CapHitSentinel.tsx renders without frontend changes; text distinguishes "Step limit reached" from "Tool budget exhausted").

Doom-loop check migrated to top of loop body: same predicate, same threshold (3), break instead of return.

step_start parts are in the schema CHECK but not emitted as message_parts: writing before the stream phase creates a sequence-0 collision with partsFromAssistantMessage. Structured log line emitted instead. Adversarial review caught the collision pre-deploy.

332/332 server tests passing. No frontend changes. No schema changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
83 lines
5.0 KiB
Markdown
# v1.14.0-outer-loop tasks

## B1 — Backups

- [ ] `turn.ts`, `tool-phase.ts`, `sentinel-summaries.ts`, `agents.ts`, `data/AGENTS.md`

## B2 — agents.ts: parse `steps` field

- [ ] Add `steps?: number` to `ParsedFrontmatter` interface
- [ ] Parse from YAML frontmatter: accept an integer ≥ 0; on an out-of-range value (negative or non-integer), log a warning and clamp to 0
- [ ] Expose on the `Agent` type returned by `getAgentsForProject`
- [ ] `npx tsc --noEmit -p apps/server` clean

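The validation rule for `steps` could be sketched roughly as follows. This is a minimal illustration, not the project's parser: the `parseSteps` name, its `warn` callback, the warning text, and the choice to treat non-numeric values the same as other out-of-range values are all assumptions.

```typescript
// Hypothetical helper: name, signature, and warning text are assumptions.
type Warn = (message: string) => void;

function parseSteps(raw: unknown, warn: Warn = console.warn): number | undefined {
  // Field absent: leave unset so the agent is bounded only by MAX_STEPS.
  if (raw === undefined || raw === null) return undefined;

  // Valid: an integer >= 0 (0 is allowed and means "no tool calls").
  if (typeof raw === "number" && Number.isInteger(raw) && raw >= 0) return raw;

  // Out of range (negative, non-integer, or, by assumption, not a number):
  // warn and clamp to 0, as the task list describes.
  warn(`steps: expected an integer >= 0, got ${String(raw)}; clamping to 0`);
  return 0;
}
```

Under these assumptions, `parseSteps(5)` returns `5`, an absent field stays `undefined`, and `parseSteps(-3)` or `parseSteps(2.5)` logs a warning and returns `0`.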
## B3 — AGENTS.md: add `steps:` to Refactorer + Architect

- [ ] `data/AGENTS.md` — Refactorer: `steps: 5`
- [ ] `data/AGENTS.md` — Architect: `steps: 20`
- [ ] All others: leave unset (no per-agent cap; bounded only by `MAX_STEPS = 200`)

## B4 — tool-phase.ts: remove recursive call, return result struct

- [ ] Define `ToolPhaseResult` interface: `{action: 'continue' | 'paused' | 'synthesis_done', toolCallCount: number, toolCalls: ToolCall[], nextAssistantId: string | null}`
- [ ] Remove `runAssistantTurn` import and call at line ~342
- [ ] `executeToolPhase` returns `Promise<ToolPhaseResult>` instead of `Promise<void>`
- [ ] On normal path (after creating next assistant row): return `{action: 'continue', toolCallCount, toolCalls: result.toolCalls, nextAssistantId}`
- [ ] On user-input pause: return `{action: 'paused', toolCallCount: <calls executed so far>, toolCalls: result.toolCalls, nextAssistantId: null}`
- [ ] On synthesis success: return `{action: 'synthesis_done', toolCallCount, toolCalls: result.toolCalls, nextAssistantId: null}`
- [ ] `npx tsc --noEmit -p apps/server` will FAIL here (turn.ts still expects void) — expected, fixed in B5

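The result struct and its three return paths might look something like the sketch below. The `ToolCall` shape is a placeholder, and the three constructor helpers plus `shouldContinue` are hypothetical names introduced only to show one value per path; deriving `toolCallCount` from `toolCalls.length` on the non-paused paths is also an assumption.

```typescript
// Placeholder: the real ToolCall type is defined elsewhere in the codebase.
interface ToolCall {
  name: string;
  args: unknown;
}

interface ToolPhaseResult {
  action: "continue" | "paused" | "synthesis_done";
  toolCallCount: number;
  toolCalls: ToolCall[];
  nextAssistantId: string | null;
}

// Hypothetical constructors, one per return path in the task list.
function continueResult(toolCalls: ToolCall[], nextAssistantId: string): ToolPhaseResult {
  // Assumption: every call in the batch was executed on the normal path.
  return { action: "continue", toolCallCount: toolCalls.length, toolCalls, nextAssistantId };
}

function pausedResult(toolCalls: ToolCall[], executedSoFar: number): ToolPhaseResult {
  // Paused for user input: count only the calls executed before the pause.
  return { action: "paused", toolCallCount: executedSoFar, toolCalls, nextAssistantId: null };
}

function synthesisDoneResult(toolCalls: ToolCall[]): ToolPhaseResult {
  return { action: "synthesis_done", toolCallCount: toolCalls.length, toolCalls, nextAssistantId: null };
}

// Caller side: only 'continue' keeps the outer loop going.
function shouldContinue(result: ToolPhaseResult): boolean {
  return result.action === "continue";
}
```

Returning a value instead of recursing leaves the continue-or-break decision with the caller in `turn.ts`.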
## B5 — turn.ts: recursion → while loop

- [ ] Add `MAX_STEPS = 200` constant
- [ ] Resolve `effectiveCap = Math.min(agent?.steps ?? Infinity, MAX_STEPS)` at the top of `runAssistantTurn`
- [ ] Convert `runAssistantTurn` body into a `while (stepNumber < effectiveCap)` loop:
  - Top of loop: doom-loop check (move from current position; `break` instead of `return`)
  - Top of loop: budget check (move from current position; `break` instead of `return`, but still call `runCapHitSummary` before break)
  - Emit `step_start` part via `insertParts` with payload `{step_number: stepNumber, started_at: new Date().toISOString()}`
  - Call `executeStreamPhase`
  - If no tool calls → `finalizeCompletion`, `break`
  - Call `executeToolPhase` (now returns `ToolPhaseResult`)
  - If `result.action !== 'continue'` → `break`
  - Update `toolsUsed += result.toolCallCount`
  - Update `recentToolCalls = [...recentToolCalls, ...result.toolCalls]`
  - Update `assistantMessageId = result.nextAssistantId!`
  - Increment `stepNumber`
- [ ] After loop: if `stepNumber >= effectiveCap` → call step-cap sentinel (B6)
- [ ] `effectiveCap === 0` edge case: the while condition is immediately false, but the first turn must still stream text-only. Options: (a) run the stream phase once before the loop, (b) restructure the loop as a do-while, or (c) pre-check and omit tools from the request. Pick the cleanest approach.
- [ ] Remove `TurnArgs` from the module export if it's no longer threaded through recursion — OR keep it and populate from loop locals. (Design note: `TurnArgs` is still used by `executeStreamPhase`, `executeToolPhase`, `sentinel-summaries.ts`, `error-handler.ts`. Keep the interface; populate from loop locals each iteration.)
- [ ] `npx tsc --noEmit -p apps/server` clean
- [ ] `pnpm -C apps/server test` — all existing tests pass

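The loop shape described in B5 can be sketched as a small, self-contained simulation. Everything here is illustrative rather than the actual `turn.ts`: the stream and tool phases are synchronous stand-ins (the real ones are presumably async and take more arguments), `runLoop`, `LoopDeps`, `ExitReason`, and `isDoomLoop` are hypothetical names, the doom-loop predicate is an assumed "last three calls identical" check, and the `step_start` emission, sentinel writes, and abort handling are omitted.

```typescript
const MAX_STEPS = 200;
const DOOM_LOOP_THRESHOLD = 3;

interface ToolCall { name: string; args: string }

interface ToolPhaseResult {
  action: "continue" | "paused" | "synthesis_done";
  toolCallCount: number;
  toolCalls: ToolCall[];
  nextAssistantId: string | null;
}

// Synchronous stand-ins for executeStreamPhase / executeToolPhase (assumption).
interface LoopDeps {
  streamPhase: (assistantMessageId: string) => ToolCall[];
  toolPhase: (calls: ToolCall[]) => ToolPhaseResult;
}

type ExitReason =
  | "text_only" | "finished" | "step_cap" | "doom_loop"
  | "budget" | "paused" | "synthesis_done";

interface LoopOutcome { reason: ExitReason; stepNumber: number; toolsUsed: number }

// Assumed predicate: the last N recorded calls are identical.
function isDoomLoop(recent: ToolCall[]): boolean {
  const tail = recent.slice(-DOOM_LOOP_THRESHOLD);
  const first = tail[0];
  if (first === undefined || tail.length < DOOM_LOOP_THRESHOLD) return false;
  return tail.every((c) => c.name === first.name && c.args === first.args);
}

function runLoop(deps: LoopDeps, opts: { steps?: number; budget: number }): LoopOutcome {
  const effectiveCap = Math.min(opts.steps ?? Infinity, MAX_STEPS);
  let stepNumber = 0;
  let toolsUsed = 0;
  let recentToolCalls: ToolCall[] = [];
  let assistantMessageId = "assistant-0";

  // steps: 0 is handled outside the loop as a text-only turn; not modelled here.
  if (effectiveCap === 0) return { reason: "text_only", stepNumber, toolsUsed };

  let reason: ExitReason = "step_cap"; // kept if the loop simply runs out of steps
  while (stepNumber < effectiveCap) {
    if (isDoomLoop(recentToolCalls)) { reason = "doom_loop"; break; }
    // Real code would write the cap-hit summary before breaking.
    if (toolsUsed >= opts.budget) { reason = "budget"; break; }

    const calls = deps.streamPhase(assistantMessageId);
    if (calls.length === 0) { reason = "finished"; break; }

    const result = deps.toolPhase(calls);
    if (result.action !== "continue") { reason = result.action; break; }

    toolsUsed += result.toolCallCount;
    recentToolCalls = [...recentToolCalls, ...result.toolCalls];
    assistantMessageId = result.nextAssistantId ?? assistantMessageId;
    stepNumber += 1;
  }
  return { reason, stepNumber, toolsUsed };
}
```

Because every early exit breaks before the increment, `stepNumber >= effectiveCap` after the loop can only mean the cap was reached, which is what the post-loop sentinel check relies on.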
## B6 — sentinel-summaries.ts: step-cap sentinel

- [ ] Add `runStepCapSummary` (or extend `runCapHitSummary` with a `reason` param)
- [ ] Write a sentinel with `metadata.kind = 'cap_hit'` (same as budget) so `CapHitSentinel` UI renders it
- [ ] Sentinel text distinguishes "Step limit reached (N steps)" from "Tool budget exhausted (N calls)"
- [ ] Called from the post-loop check in turn.ts (B5)

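One way to share the `cap_hit` kind while varying only the text is sketched below. The `buildCapHitSentinel` helper, the `CapReason` type, and the `CapHitSentinel` object shape are assumptions for illustration; the real summary function also has to persist the sentinel, which is not shown.

```typescript
// Hypothetical types and helper; only the 'cap_hit' kind and the two message
// templates come from the task list.
type CapReason = "steps" | "budget";

interface CapHitSentinel {
  metadata: { kind: "cap_hit" };
  text: string;
}

function buildCapHitSentinel(reason: CapReason, count: number): CapHitSentinel {
  const text =
    reason === "steps"
      ? `Step limit reached (${count} steps)`
      : `Tool budget exhausted (${count} calls)`;
  // Same kind for both reasons, so the existing UI component can render either.
  return { metadata: { kind: "cap_hit" }, text };
}
```

Keeping the reason out of `metadata` mirrors the stated goal of needing no frontend or schema changes; only the rendered text differs.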
## B7 — Tests

- [ ] NEW `apps/server/src/services/__tests__/outer-loop.test.ts`
- [ ] Test: clean finish — stream returns no tool calls, loop exits after 1 step
- [ ] Test: step-cap hit — mock agent with `steps: 2`, model always returns tool calls, loop exits at 2, sentinel written
- [ ] Test: doom-loop — 3 identical tool calls, sentinel written, loop breaks
- [ ] Test: budget exhaustion — `toolsUsed >= budget`, cap-hit sentinel written
- [ ] Test: `steps: 0` — no loop iterations, text-only response
- [ ] Test: synthesis success — loop breaks after synthesis
- [ ] `pnpm -C apps/server test` — all 332+ existing + new tests pass

## B8 — Verification

- [ ] `npx tsc --noEmit -p apps/server` — 0 errors
- [ ] `npx tsc -p apps/web/tsconfig.app.json --noEmit` — 0 errors (no web changes; should pass)
- [ ] `pnpm -C apps/web build` — green
- [ ] `pnpm -C apps/server test` — all green

## B9 — Docs + tag + deploy

- [ ] `CHANGELOG.md` entry for v1.14.0-outer-loop
- [ ] `boocode_roadmap.md` retrospective bullet on the v1.14 section
- [ ] `CLAUDE.md` updates: mention the outer loop, `MAX_STEPS`, `agent.steps` in the inference/ section
- [ ] Commit, tag `v1.14.0-outer-loop`, push, rebuild