Files
boocode/openspec/changes/archived/orchestrator/artifacts/implementation-iteration-history.md
indifferentketchup a734615480 docs: archive shipped openspec changes, refresh roadmap + DEFERRED-WORK
Move openspec/changes/{contracts-ssot,orchestrator} → archived/ (both shipped,
v2.7.13 and v2.7.17). Mark the roadmap's "Write/edit robustness" and "Claude
provider SDK" milestones as shipped (fuzzy-match.ts + checkpoints.ts; the
claude-sdk backend is live via CLAUDE_SDK_BACKEND in .env.host) and add a
v2.7.12–v2.7.17 shipped summary. Flag DEFERRED-WORK.md as superseded.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 16:30:01 +00:00

9.9 KiB
Raw Blame History

Implementation Iteration History — Orchestrator (Phase 2)

Source spec: ../proposal.md · design context: design-context.md · discovery: .discovery-notes.md · decision log: implementation-decision-log.md

R1 — Parallel specialist review

Specialists engaged: software-architect, data-engineer, user-experience-designer, junior-developer (all sonnet).

New input: the settled design (12 decisions) + discovery notes. Each produced concrete HOW recommendations.

Claim ledger

# Claim State Spec-maturity Supporting
C1 Re-home pure flow defs (spine/flows/contracts/types/render) into apps/coder/src/conductor/; sever code-review.tsdispatch.ts coupling by injecting a DispatchFn via StepContext. Keep Phase-1 CLI alive. Evidenced (code-review.ts:10,62; conductor/tsconfig.json:6) plan-level architect
C2 DB-driven scheduler apps/coder/src/services/flow-runner.ts + flow_runs/flow_steps; fan-out via an onTaskTerminal callback on the existing dispatcher (no 3rd poll loop). Evidenced (dispatcher.ts:46-179,279-286) plan-level architect
C3 Full step output must persist in flow_steps.output TEXTtasks.output_summary is ≤500 char and can't reconstruct ctx.results for render/resume. Evidenced (schema.sql:26, flow.ts:49,59) plan-level data-engineer, architect
C4 FK direction = flow_steps.task_id → tasks(id) ON DELETE SET NULL (nullable; code steps NULL). Do NOT add a column to tasks. Evidenced (schema.sql:18-34) plan-level data-engineer, architect
C5 flow_runs.project_id no FK (matches tasks.project_id); CHECK-named status constraints; CHECK (input ? 'question'). Evidenced (schema.sql:19,32) plan-level data-engineer
C6 Omit depends_on column (deps derivable from loaded flow def) and skipped-step rows (when() is pure on stored input). Evidenced (flow.ts:28-41, types.ts:27) — YAGNI plan-level data-engineer
C7 Two new WS frames: flow_run_started (step manifest + per-step chat_id) + flow_run_step_updated (status + final report). Content stream REUSES existing delta/tool_call/message_complete by chat_id. Per-step synthetic chat row. Evidenced (broker pipeline; ws-frames.ts) plan-level architect
C8 Orchestrator pane: collapsed roster, expand-one-at-a-time detail well, reuse AgentStatusDot; report at top on completion. Mobile single-column inline expand. Evidenced (AgentComposerBar.tsx:204, CoderPane) plan-level UX
C9 Toolbar fits: actual ChatInput row ≤5 elements; add Workflow icon between SquareSlash and Globe; "Flows" label desktop, icon-only mobile. Resolves junior Q13. Evidenced (ChatInput.tsx:648-732,673) plan-level UX (refutes junior Q13 worry)
C10 Launcher: 5 category tabs (Analysis/Discovery/Planning/Authoring/Review) + filtered flow list + size + focus + fast; defaults Analysis/Small/off. Runs history in NewPaneMenu; export in pane header . Evidenced (flows/index.ts, NewPaneMenu.tsx, lib/events.ts) plan-level UX
C11 Contracts (evidence/yagni) still injected by calling step.run(ctx) in-process in flow-runner before INSERT — closures execute in the coder process; prompts are NOT serialized to DB. Resolves junior Q12. Evidenced (spine.ts:73) plan-level architect (confirms junior Q12)
C12 Dispatch-mechanism tension: A2 says insert pending task → dispatcher picks it up via LISTEN (reuses streaming+worktree). A4 says insert running task + call dispatchViaPty DIRECTLY to avoid worktree creation (decision 5). The two contradict; architect resolved toward A4 (bypass). Disputed (internal to architect: A2 vs A4) plan-level architect (self-flagged)
C13 Read-only enforcement is prompt-level + BOOCODE_TOOLS=core (if the binary honors it) + project-dir-as-cwd (no worktree). Architect + junior both say adversarial-security-analyst must verify it's watertight before decision 5 is safe. Anecdotal (enforcement not proven) plan-level architect (A4), junior (Q8/Q9)
C14 spine.ts renders process.env.CONDUCTOR_MODEL into the report header (spine.ts:122) — must be parameterized to the run's model on re-home. Personas (conductor/agents/*.md) copied into apps/coder. Evidenced (spine.ts:122, dispatch.ts:15) plan-level junior Q5/Q6
C15 Resume semantics underspecified: re-dispatch in-flight steps vs mark-run-failed. With A4 direct-PTY, a restart kills the PTY child → in-flight steps MUST re-dispatch. Decision 4 commits to "resumable." Disputed (architect reconcile-and-advance vs data mark-failed) plan-level architect (A3), data (OQ1), junior (Q3)
C16 queued (decision 11) is hard to observe: with direct PTY the task is running and blocked on the busy model; llama-swap doesn't report queue position. May reduce to pending(dep-wait)/running. Anecdotal plan-level (spec-vs-reality) junior Q11, data DATA-005

Spec-maturity gate

No T# notes exist (gate reduces to spec-level threshold). spec-level findings = 0 by ≥3 specialists: junior's Q15 nominated three "open items" as spec-level, but the architect + data-engineer resolved them in-plan (re-home → copy into apps/coder/src/conductor; step→task → 1:1 flow_steps.task_id; resume → decision 4 already commits to "resumable", direction settled below). Gate does NOT trip. Proceed in-plan.

Open Questions

  • OQ1 (C12): dispatch mechanism — reuse the dispatcher with a no-worktree branch, vs bypass via direct dispatchViaPty. → user escalation (recommendation below).
  • OQ2 (C13): is read-only watertight? → R2: adversarial-security-analyst.
  • OQ3 (C15): resume = re-dispatch in-flight steps on restart (recommended; decision 4 = resumable). → user confirm.
  • OQ4 (C16): keep an explicit queued status or reduce to pending/running. → user confirm (minor).

Next-step recommendation

continue iteratingR2: one targeted specialist (adversarial-security-analyst) on read-only enforcement + the no-worktree safety question (OQ2), since it gates the safety of decision 5. Then a single batched user escalation (OQ1, OQ3, OQ4) and synthesis.

Decisions produced: D-1 (from C1, C11, C14), D-2 (C2, C3), D-5 (C3C6), D-6 (C7), D-7 (C8), D-8 (C9, C10), D-9 (C15), D-10 (C16). Partially produced (resolved in R2 + user escalation): D-3 (C12), D-4 (C13). Changed in plan: C12's A4-leaning bypass was REVERSED — the user chose dispatcher reuse (D-3), and R2's read-only finding made the worktree A4 wanted to avoid harmless. C13's "prompt + BOOCODE_TOOLS" enforcement was REPLACED by mode_id='plan' (D-4). C16's queued status was DROPPED (D-10).

R2 — Targeted security review (read-only enforcement)

Specialist engaged: adversarial-security-analyst (opus). Charter: OQ2 only — is read-only watertight for flow steps, and is the no-worktree posture (decision 5) safe?

New input: R1's C12/C13 (the dispatch-mechanism tension and the unproven prompt-level enforcement claim), plus the qwen PTY dispatch path.

Claim ledger

# Claim State Supporting
C13-R The R1 enforcement story (persona prompts + BOOCODE_TOOLS=core + project-dir-as-cwd) is NOT watertight for an external qwen CLI child: persona text is instruction not a gate; BOOCODE_TOOLS governs BooChat's in-process tool registry, not the external binary's own tools. Read-only must be enforced at the agent's own tool layer. Evidenced (pty-dispatch.ts:72-77 — the qwen spawn spec; BOOCODE_TOOLS scope is BooChat-only per CLAUDE.md env section) adversarial-security-analyst
C13-FIX qwen's --approval-mode plan IS the binding control: a built-in tool-level gate (reads allowed, writes blocked) inside the agent binary. Already wired — mode_id--approval-mode at pty-dispatch.ts:75. Dispatch every step with mode_id='plan', never user-overridable. Qwen-only (decision 6) makes one hardcoded flag sufficient. Evidenced (pty-dispatch.ts:75) adversarial-security-analyst
C12-R Because plan mode is a tool-level write-block, the worktree the dispatcher creates is INERT — the agent cannot write to it. The "no worktree" motivation behind A4 (decision 5) dissolves: keep the worktree as a harmless read snapshot and REUSE the dispatcher (A2) rather than bypass it (A4). Resolves the C12 tension toward A2. Evidenced (worktree = HEAD checkout; plan mode blocks writes to it) adversarial-security-analyst (settles C12)

Resolution

  • OQ2 → resolved. Read-only is watertight via mode_id='plan' (qwen --approval-mode plan), NOT prompt/BOOCODE_TOOLS. C13 moves Anecdotal → Evidenced (refuted-and-replaced).
  • OQ1 (C12) → unblocked. The security finding removes A4's only advantage; the user chose A2 (dispatcher reuse). Decision-context decision 5 ("no worktree") is REVISED to "worktree as a harmless read snapshot."
  • A new non-qwen agent in flows would require re-verifying its plan-mode equivalent before allowing it (recorded as the D-4 revisit criterion).

User escalation (batched, post-R2)

  • OQ1 → reuse the dispatcher (A2), one new onTaskTerminal hook, no PTY bypass.
  • OQ3 → reconcile-and-advance resume (re-dispatch lost/failed steps; keep completed).
  • OQ4 → drop queued; pending covers waiting.

Next-step recommendation

synthesize — all blocking open questions resolved; no spec-maturity gate trip.

Decisions produced: D-3 (from C12-R / OQ1 user choice), D-4 (from C13-R / C13-FIX). Co-produced with R1: confirms D-9 (OQ3) and D-10 (OQ4). Changed in plan: decision-context decision 5 REVISED (no-worktree → read-snapshot worktree, read-only via plan mode); C13's enforcement mechanism REPLACED (prompt/BOOCODE_TOOLS → mode_id='plan'); C12 RESOLVED toward A2 (reuse) over A4 (bypass).