docs: archive shipped openspec changes, refresh roadmap + DEFERRED-WORK

Move openspec/changes/{contracts-ssot,orchestrator} → archived/ (both shipped,
v2.7.13 and v2.7.17). Mark the roadmap's "Write/edit robustness" and "Claude
provider SDK" milestones as shipped (fuzzy-match.ts + checkpoints.ts; the
claude-sdk backend is live via CLAUDE_SDK_BACKEND in .env.host) and add a
v2.7.12–v2.7.17 shipped summary. Flag DEFERRED-WORK.md as superseded.
This commit is contained in:
2026-06-03 16:30:01 +00:00
parent 4f3bd1e06a
commit 67852b076b
11 changed files with 9 additions and 5 deletions

View File

@@ -0,0 +1,89 @@
# Implementation Iteration History — Orchestrator (Phase 2)
Source spec: [../proposal.md](../proposal.md) · design context:
[design-context.md](design-context.md) · discovery: [.discovery-notes.md](.discovery-notes.md)
· decision log: [implementation-decision-log.md](implementation-decision-log.md)
## R1 — Parallel specialist review
**Specialists engaged:** software-architect, data-engineer, user-experience-designer, junior-developer (all sonnet).
**New input:** the settled design (12 decisions) + discovery notes. Each produced concrete HOW recommendations.
### Claim ledger
| # | Claim | State | Spec-maturity | Supporting |
|---|---|---|---|---|
| C1 | Re-home pure flow defs (`spine/flows/contracts/types/render`) into `apps/coder/src/conductor/`; sever `code-review.ts``dispatch.ts` coupling by injecting a `DispatchFn` via `StepContext`. Keep Phase-1 CLI alive. | Evidenced (`code-review.ts:10,62`; `conductor/tsconfig.json:6`) | plan-level | architect |
| C2 | DB-driven scheduler `apps/coder/src/services/flow-runner.ts` + `flow_runs`/`flow_steps`; fan-out via an `onTaskTerminal` callback on the existing dispatcher (no 3rd poll loop). | Evidenced (`dispatcher.ts:46-179,279-286`) | plan-level | architect |
| C3 | Full step output must persist in `flow_steps.output TEXT``tasks.output_summary` is ≤500 char and can't reconstruct `ctx.results` for render/resume. | Evidenced (`schema.sql:26`, `flow.ts:49,59`) | plan-level | data-engineer, architect |
| C4 | FK direction = `flow_steps.task_id → tasks(id) ON DELETE SET NULL` (nullable; code steps NULL). Do NOT add a column to `tasks`. | Evidenced (`schema.sql:18-34`) | plan-level | data-engineer, architect |
| C5 | `flow_runs.project_id` no FK (matches `tasks.project_id`); CHECK-named status constraints; `CHECK (input ? 'question')`. | Evidenced (`schema.sql:19,32`) | plan-level | data-engineer |
| C6 | Omit `depends_on` column (deps derivable from loaded flow def) and skipped-step rows (`when()` is pure on stored `input`). | Evidenced (`flow.ts:28-41`, `types.ts:27`) — **YAGNI** | plan-level | data-engineer |
| C7 | Two new WS frames: `flow_run_started` (step manifest + per-step `chat_id`) + `flow_run_step_updated` (status + final report). Content stream REUSES existing `delta/tool_call/message_complete` by `chat_id`. Per-step synthetic chat row. | Evidenced (broker pipeline; `ws-frames.ts`) | plan-level | architect |
| C8 | Orchestrator pane: collapsed roster, expand-one-at-a-time detail well, reuse `AgentStatusDot`; report at top on completion. Mobile single-column inline expand. | Evidenced (`AgentComposerBar.tsx:204`, `CoderPane`) | plan-level | UX |
| C9 | Toolbar fits: actual `ChatInput` row ≤5 elements; add `Workflow` icon between SquareSlash and Globe; "Flows" label desktop, icon-only mobile. **Resolves junior Q13.** | Evidenced (`ChatInput.tsx:648-732,673`) | plan-level | UX (refutes junior Q13 worry) |
| C10 | Launcher: 5 category tabs (Analysis/Discovery/Planning/Authoring/Review) + filtered flow list + size + focus + fast; defaults Analysis/Small/off. Runs history in NewPaneMenu; export in pane header `…`. | Evidenced (`flows/index.ts`, `NewPaneMenu.tsx`, `lib/events.ts`) | plan-level | UX |
| C11 | Contracts (evidence/yagni) still injected by calling `step.run(ctx)` in-process in flow-runner before INSERT — closures execute in the coder process; prompts are NOT serialized to DB. **Resolves junior Q12.** | Evidenced (`spine.ts:73`) | plan-level | architect (confirms junior Q12) |
| **C12** | **Dispatch-mechanism tension:** A2 says insert *pending* task → dispatcher picks it up via LISTEN (reuses streaming+worktree). A4 says insert *running* task + call `dispatchViaPty` DIRECTLY to avoid worktree creation (decision 5). The two contradict; architect resolved toward A4 (bypass). | **Disputed (internal to architect: A2 vs A4)** | plan-level | architect (self-flagged) |
| **C13** | **Read-only enforcement is prompt-level + `BOOCODE_TOOLS=core` (if the binary honors it)** + project-dir-as-cwd (no worktree). Architect + junior both say adversarial-security-analyst must verify it's watertight before decision 5 is safe. | **Anecdotal (enforcement not proven)** | plan-level | architect (A4), junior (Q8/Q9) |
| C14 | `spine.ts` renders `process.env.CONDUCTOR_MODEL` into the report header (`spine.ts:122`) — must be parameterized to the run's model on re-home. Personas (`conductor/agents/*.md`) copied into `apps/coder`. | Evidenced (`spine.ts:122`, `dispatch.ts:15`) | plan-level | junior Q5/Q6 |
| C15 | Resume semantics underspecified: re-dispatch in-flight steps vs mark-run-failed. With A4 direct-PTY, a restart kills the PTY child → in-flight steps MUST re-dispatch. Decision 4 commits to "resumable." | Disputed (architect reconcile-and-advance vs data mark-failed) | plan-level | architect (A3), data (OQ1), junior (Q3) |
| C16 | `queued` (decision 11) is hard to observe: with direct PTY the task is `running` and blocked on the busy model; llama-swap doesn't report queue position. May reduce to pending(dep-wait)/running. | Anecdotal | plan-level (spec-vs-reality) | junior Q11, data DATA-005 |
### Spec-maturity gate
No `T#` notes exist (gate reduces to spec-level threshold). spec-level findings = 0 by ≥3 specialists: junior's Q15 nominated three "open items" as spec-level, but the architect + data-engineer **resolved** them in-plan (re-home → copy into `apps/coder/src/conductor`; step→task → 1:1 `flow_steps.task_id`; resume → decision 4 already commits to "resumable", direction settled below). **Gate does NOT trip.** Proceed in-plan.
### Open Questions
- **OQ1 (C12):** dispatch mechanism — reuse the dispatcher with a no-worktree branch, vs bypass via direct `dispatchViaPty`. → user escalation (recommendation below).
- **OQ2 (C13):** is read-only watertight? → **R2: adversarial-security-analyst.**
- **OQ3 (C15):** resume = re-dispatch in-flight steps on restart (recommended; decision 4 = resumable). → user confirm.
- **OQ4 (C16):** keep an explicit `queued` status or reduce to pending/running. → user confirm (minor).
### Next-step recommendation
`continue iterating`**R2**: one targeted specialist (adversarial-security-analyst) on read-only enforcement + the no-worktree safety question (OQ2), since it gates the safety of decision 5. Then a single batched user escalation (OQ1, OQ3, OQ4) and synthesis.
_Decisions produced: D-1 (from C1, C11, C14), D-2 (C2, C3), D-5 (C3C6), D-6 (C7), D-7 (C8), D-8 (C9, C10), D-9 (C15), D-10 (C16). Partially produced (resolved in R2 + user escalation): D-3 (C12), D-4 (C13)._
_Changed in plan: C12's A4-leaning bypass was REVERSED — the user chose dispatcher reuse (D-3), and R2's read-only finding made the worktree A4 wanted to avoid harmless. C13's "prompt + BOOCODE_TOOLS" enforcement was REPLACED by `mode_id='plan'` (D-4). C16's `queued` status was DROPPED (D-10)._
## R2 — Targeted security review (read-only enforcement)
**Specialist engaged:** adversarial-security-analyst (opus). **Charter:** OQ2 only — is read-only watertight for flow steps, and is the no-worktree posture (decision 5) safe?
**New input:** R1's C12/C13 (the dispatch-mechanism tension and the unproven prompt-level enforcement claim), plus the qwen PTY dispatch path.
### Claim ledger
| # | Claim | State | Supporting |
|---|---|---|---|
| C13-R | The R1 enforcement story (persona prompts + `BOOCODE_TOOLS=core` + project-dir-as-cwd) is NOT watertight for an external `qwen` CLI child: persona text is instruction not a gate; `BOOCODE_TOOLS` governs BooChat's in-process tool registry, not the external binary's own tools. Read-only must be enforced at the agent's own tool layer. | **Evidenced** (`pty-dispatch.ts:72-77` — the qwen spawn spec; `BOOCODE_TOOLS` scope is BooChat-only per CLAUDE.md env section) | adversarial-security-analyst |
| C13-FIX | qwen's `--approval-mode plan` IS the binding control: a built-in tool-level gate (reads allowed, writes blocked) inside the agent binary. Already wired — `mode_id``--approval-mode` at `pty-dispatch.ts:75`. Dispatch every step with `mode_id='plan'`, never user-overridable. Qwen-only (decision 6) makes one hardcoded flag sufficient. | **Evidenced** (`pty-dispatch.ts:75`) | adversarial-security-analyst |
| C12-R | Because plan mode is a tool-level write-block, the worktree the dispatcher creates is INERT — the agent cannot write to it. The "no worktree" motivation behind A4 (decision 5) dissolves: keep the worktree as a harmless read snapshot and REUSE the dispatcher (A2) rather than bypass it (A4). Resolves the C12 tension toward A2. | **Evidenced** (worktree = HEAD checkout; plan mode blocks writes to it) | adversarial-security-analyst (settles C12) |
### Resolution
- **OQ2 → resolved.** Read-only is watertight via `mode_id='plan'` (qwen
`--approval-mode plan`), NOT prompt/`BOOCODE_TOOLS`. C13 moves Anecdotal →
Evidenced (refuted-and-replaced).
- **OQ1 (C12) → unblocked.** The security finding removes A4's only advantage; the
user chose A2 (dispatcher reuse). Decision-context decision 5 ("no worktree") is
REVISED to "worktree as a harmless read snapshot."
- A new non-qwen agent in flows would require re-verifying its plan-mode equivalent
before allowing it (recorded as the D-4 revisit criterion).
### User escalation (batched, post-R2)
- OQ1 → **reuse the dispatcher** (A2), one new `onTaskTerminal` hook, no PTY bypass.
- OQ3 → **reconcile-and-advance** resume (re-dispatch lost/failed steps; keep
completed).
- OQ4 → **drop `queued`**; `pending` covers waiting.
### Next-step recommendation
`synthesize` — all blocking open questions resolved; no spec-maturity gate trip.
_Decisions produced: D-3 (from C12-R / OQ1 user choice), D-4 (from C13-R / C13-FIX). Co-produced with R1: confirms D-9 (OQ3) and D-10 (OQ4)._
_Changed in plan: decision-context decision 5 REVISED (no-worktree → read-snapshot worktree, read-only via plan mode); C13's enforcement mechanism REPLACED (prompt/BOOCODE_TOOLS → `mode_id='plan'`); C12 RESOLVED toward A2 (reuse) over A4 (bypass)._