docs: archive shipped openspec changes, refresh roadmap + DEFERRED-WORK

Move openspec/changes/{contracts-ssot,orchestrator} → archived/ (both shipped,
v2.7.13 and v2.7.17). Mark the roadmap's "Write/edit robustness" and "Claude
provider SDK" milestones as shipped (fuzzy-match.ts + checkpoints.ts; the
claude-sdk backend is live via CLAUDE_SDK_BACKEND in .env.host) and add a
v2.7.12–v2.7.17 shipped summary. Flag DEFERRED-WORK.md as superseded.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,116 @@
# Discovery notes — Orchestrator (Phase 2)

Single source of truth for project context. Specialists: read this first; do not
re-grep for what's here. Search further only for what your domain needs that's
missing.

## Tech stack

- pnpm monorepo. `apps/server` (BooChat: Fastify + Postgres), `apps/coder`
  (BooCoder: host systemd service, agent dispatch), `apps/web` (React + Vite),
  `apps/booterm`, `packages/contracts` (`@boocode/contracts`, cross-app wire SSOT).
- DB `boochat` (Postgres 16). TypeScript strict, NodeNext (server/coder), `.js`
  import extensions. Tests: vitest (server + coder); **no web test harness**.
- Deploy: `apps/coder` change → `sudo systemctl restart boocoder`; `apps/web`/
  `apps/server` → `docker compose up --build -d boocode`.

## Phase 1 assets to reuse (`/opt/boocode/conductor/`)

- `src/spine.ts` — the Spine→Flow factory (band gating, fold, synthesizer,
  validator, render), + contracts injection.
- `src/flows/*.ts` + `flows/index.ts` — 22 flows (21 Spine configs + bespoke
  `code-review.ts`), registry (`getFlow`, `FLOW_NAMES`, `describeFlows`).
- `src/contracts.ts` — Han evidence/yagni contracts (produce/review).
- `src/types.ts` — `Flow`, `Step`, `Spine`, `Angle`, `Band`, `Contract`.
- `src/flow.ts` — the wave scheduler (dep-aware parallel). **This is what Phase 2
  must re-home/replace** so steps dispatch through BooCoder backends + persist.
- `src/dispatch.ts` — current `opencode run` subprocess dispatch. **Replaced in
  Phase 2** by BooCoder backend dispatch.
- `agents/*.md` — 23 Han personas (also live in `~/.config/opencode/agents/`).

## apps/coder — execution surfaces

- `src/services/dispatcher.ts:46` — `createDispatcher`. `LISTEN 'tasks_new'` fast
  path (pg trigger `notify_tasks_new`, schema line 279) + 2s poll. `runTask`
  routes a `state='pending'` task to a backend. `inflight` map keyed
  `session_id ?? 'task:<id>'` serializes per session.
- `src/services/agent-backend.ts:97` — `AgentBackend` (ensureSession / prompt /
  closeSession / dispose / health). Backends: `backends/opencode-server.ts`,
  `warm-acp.ts`, `claude-sdk.ts`; one-shot `acp-dispatch.ts` / `pty-dispatch.ts`.
- `AgentEvent` (agent-backend.ts:28, union text|reasoning|tool_call|tool_update|
  commands) → mapped to WS frames by the dispatcher → `broker.publishUserFrame`.
- **Tasks are how work is dispatched.** `INSERT INTO tasks (project_id, input,
  agent, model, mode_id, thinking_option_id, session_id, chat_id)` then the
  LISTEN/NOTIFY trigger picks it up. Precedents: `routes/messages.ts:233`,
  **`routes/skills.ts:94` (a skill IS already dispatched as a task)**,
  `routes/arena.ts:49`, `tools/new_task.ts:54` (writes `parent_task_id`).
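
The per-session serialization noted for the `inflight` map can be pictured as one promise chain per key. This is a minimal sketch under stated assumptions: `TaskRow`, `inflightKey`, and `serialize` are hypothetical names, and the real `createDispatcher` internals are not reproduced in these notes and may differ.

```typescript
// Hypothetical illustration, not the actual dispatcher code.
type TaskRow = { id: string; session_id: string | null };

// One promise chain per key; tasks sharing a session run strictly in order.
const inflight = new Map<string, Promise<void>>();

function inflightKey(task: TaskRow): string {
  // Mirrors the keying in the notes: session_id ?? 'task:<id>'
  return task.session_id ?? `task:${task.id}`;
}

function serialize(task: TaskRow, run: () => Promise<void>): Promise<void> {
  const key = inflightKey(task);
  const previous = inflight.get(key) ?? Promise.resolve();
  // Start after the previous task for this key settles, whether it succeeded or failed.
  const next = previous.then(run, run);
  inflight.set(key, next);
  // Drop the entry once this chain is idle so the map does not grow without bound.
  const cleanup = (): void => {
    if (inflight.get(key) === next) inflight.delete(key);
  };
  next.then(cleanup, cleanup);
  return next;
}
```

Tasks without a session fall back to a key unique to the task, so they never queue behind each other.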

## apps/coder — schema (`src/schema.sql`, coder-owned)

- `tasks` (line 18): `id, project_id, parent_task_id (FK self, written by new_task,
  NOT read by dispatcher), state CHECK(pending|running|completed|failed|blocked|
  cancelled), input, output_summary (≤500 char), agent, model, execution_path,
  cost_tokens, started_at, ended_at, session_id, arena_id, mode_id,
  thinking_option_id, chat_id`.
- `agent_sessions` (line 88): PK `(chat_id, agent)`; `backend, agent_session_id,
  server_port, status(idle|active|crashed|closed), config_hash, token/cost cols`.
- `worktrees` (line 142), `available_agents` (line 36), `checkpoints` (line 233),
  `claude_session_entries` (line 252). `notify_tasks_new` trigger (line 279).
- **Schema discipline (root CLAUDE.md):** two schema files, one DB; the coder
  schema is applied by the host boocoder service. CHECK migrations: DROP IF EXISTS
  the system-named constraint → UPDATE → guarded ADD. `CREATE OR REPLACE VIEW`
  can't reorder cols. JSONB via `sql.json(value as never)`. `clock_timestamp()` in
  txns.

## packages/contracts — WS frames

- `src/ws-frames.ts` — Zod frames in `WsFrameSchema` (SSOT). Existing: snapshot,
  message_started, delta, reasoning_delta, tool_call, tool_result,
  message_complete, usage, messages_deleted, chat_renamed, compacted, error.
- **Adding a frame (cross-app, root CLAUDE.md):** add to `WsFrameSchema` here
  (rebuild `pnpm -C packages/contracts build`), AND the server's loose
  `InferenceFrame` union (`services/inference/turn.ts`), AND the web's strict
  `WsFrame` union (`apps/web/src/api/types.ts`) — the web type is the wire gate;
  missing it silently drops the frame at JSON-parse.

## apps/web — panes + composer

- Pane kinds (`api/types.ts:386` `WorkspacePaneKind`): `empty | chat | coder |
  terminal | settings | markdown_artifact | html_artifact`. **Extra non-chat pane
  kinds are already precedented** — adding `orchestrator` follows
  `markdown_artifact`/`html_artifact`.
- `hooks/useWorkspacePanes.ts` — pane state, `addSplitPane(kind)`, server-persisted
  (+ legacy localStorage seed). `Workspace.tsx`, `NewPaneMenu.tsx`,
  `ChatTabBar.tsx`, `PaneHeaderActions.tsx` all take `kind: 'chat'|'terminal'|
  'coder'` — adding a kind touches these.
- **`ChatInput.tsx` is the shared composer** rendered by BOTH `ChatPane.tsx` and
  `CoderPane.tsx` (CoderPane also stacks `AgentComposerBar` above it). Its toolbar
  row (icons: Globe, ListPlus, Paperclip, Send/Stop, `SquareSlash` for slash)
  is where the Orchestrator button goes → parity for free. It takes `slashGroups`
  (ChatPane passes BooChat skills; CoderPane passes agent-commands+skills),
  `onSlashCommand`. `SlashCommandPicker.tsx`, `hooks/useSkills.ts`.
- Mobile: per prior preference, crowded toolbars must fit one line (no scroll/wrap)
  and the new button shows icon-only on mobile.

## Precedents / related

- **Arena** (`routes/arena.ts`): same task → N contestants (tasks sharing
  `arena_id`), parallel, `[SELECTED]` winner. Closest existing fan-out; stays
  separate but is a structural precedent for "one launch → many tasks grouped".
- **BooChat skills** (`apps/server` `routes/skills.ts` + `services/skills`,
  `getSkillBody`): slash injects a skill body, the single chat model runs it
  inline. The coder also has `routes/skills.ts` that dispatches a skill as a task.
- Event-dedup discipline (root CLAUDE.md): a mutation published via
  `broker.publishUser` must NOT also `sessionEvents.emit` locally; handlers
  idempotent.

## Enumerated gaps (searched, not found)

- No `flow_runs` / `flow_steps` / `flows` tables, no `depends_on`/`step_index` on
  `tasks`, no DAG/pipeline concept anywhere (confirmed Phase-1 research).
- No `orchestrator` pane kind, component, or WS frame yet.
- No coding-standards dir hits for orchestration; no ADR directory exists at
  `docs/adr/`, so architectural decisions live in `openspec/changes/`.
- No resume mechanism for a multi-step run after coder restart (single tasks
  resume via `agent_sessions`; a *run* spanning tasks does not).
- The conductor's scheduler (`conductor/src/flow.ts`) is in-process/in-memory;
  it does not persist step state or survive restart.
@@ -0,0 +1,87 @@
# Orchestrator (Phase 2) — settled design (source spec)

This is the behavioral specification for the BooCode Orchestrator, settled via a
`grill-me` interview. It is the ground truth for *what*; the implementation plan
covers *how*. Phase 1 (the standalone code conductor at `/opt/boocode/conductor/`)
is done; this is the in-app integration.

## Outcome

Bring the deterministic multi-agent conductor into the BooCode app: a user can
launch any Han flow from BooChat or BooCoder, watch each agent's progress live
(Paseo-style), and get an evidence-disciplined report — all on local Qwen, free.

## Settled decisions (immutable for this plan)

1. **One engine.** The conductor is the only engine. Every read-only Han skill IS
   a conductor flow. No single-agent degraded path.
2. **Two doors, full parity.** A slash command and an "Orchestrator" button, both
   on the shared `ChatInput` composer — so both appear in BooChat (ChatPane) and
   BooCoder (CoderPane), desktop and mobile (mobile = icon only). Launching from
   either opens the same run view.
3. **Run view = new pane kind.** A fourth pane kind, `orchestrator`, alongside
   `chat | terminal | coder`, opened as a new pane in the current session. It
   renders: flow + band at top, a list of agents with live status, each
   expandable to watch its stream; the final report at the top when done. Shape
   is parent-with-nested-children (Paseo's parent/subagents), not one-agent.
4. **Execution through BooCoder backends.** Each flow agent is a real BooCoder
   agent session dispatched through the existing `AgentBackend`s → live streaming
   via the existing AgentEvent→WS-frame pipeline, persisted to Postgres,
   resumable. New `flow_runs` + `flow_steps` tables in the coder schema; the
   conductor's scheduler fires each step's task as its deps complete.
5. **Read-only, no worktree.** Flows never write to the repo. Agents read the
   project's working directory directly; no git worktree is created. Read-only is
   enforced at dispatch (agents get no edit/write tools). The report is the only
   output.
6. **Qwen-only.** Default to the loaded 35B (`llama-swap/qwen3.6-35b-a3b-mxfp4`),
   held as a single config value so additional local models slot in later with
   near-zero rework. No Claude path — Claude Code remains the Claude lane.
7. **Naming.** "Orchestrator." The existing Arena (same-task-on-N-models,
   `apps/coder/src/routes/arena.ts`) stays a separate feature.
8. **Skill placement.** The slash menu surfaces the read-only analysis/review set
   (research, investigate, code-review, architectural-analysis, security-review,
   gap-analysis, data-review, devops-review, issue-triage, project-discovery,
   test-planning). The Orchestrator button exposes the FULL catalog (22 flows),
   including the planning/authoring draft flows.
9. **Launch UX.** Slash launches instantly with defaults (band = small, target =
   the current pane's project, text after the command = the question/focus),
   opening an Orchestrator pane. The button opens a launcher first (pick flow,
   size, target/focus) then launches. Same run view either way.
10. **Report output.** Stored with the run in Postgres, shown at the top of the
    Orchestrator pane. Runs persist and are reopenable from a runs history.
    Export on demand: copy / save-to-file / send-to-chat. Nothing auto-written to
    the repo.
11. **Concurrency.** Multiple runs allowed; each its own pane. They share the one
    local model, so workers queue at llama-swap — panes show `queued` honestly.
12. **Sizing modes.** Han's bands (small/medium/large) select roster breadth per
    flow; a fast mode caps each worker's depth. Both carry over from Phase 1.
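
The slash-launch defaults in decision 9 could be sketched as a small parser. This is illustrative only: `parseSlashLaunch` and `LaunchRequest` are hypothetical names, and the regular expression is an assumption about how a slash command is written, not the app's actual slash handling.

```typescript
type Band = "small" | "medium" | "large";

interface LaunchRequest {
  flowName: string;
  band: Band;
  projectId: string;
  focus: string;
}

// Hypothetical helper: "/code-review check the auth module" -> launch request with defaults.
function parseSlashLaunch(input: string, currentProjectId: string): LaunchRequest | null {
  const match = /^\/([a-z][a-z0-9-]*)(?:\s+([\s\S]*))?$/.exec(input.trim());
  if (!match) return null;
  return {
    flowName: match[1] ?? "",
    band: "small", // decision 9: band defaults to small
    projectId: currentProjectId, // decision 9: target is the current pane's project
    focus: (match[2] ?? "").trim(), // decision 9: text after the command is the focus
  };
}
```

The button path would build the same `LaunchRequest` from the launcher's selections, which is what keeps the run view identical for both doors.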

## Evidence & rule alignment (carried from Phase 1)

Flows apply Han's `evidence-rule` (trust classes, web corroboration gate,
no-evidence labeling) and `yagni-rule` (producing flows only), injected as
contracts. The adversarial-validator gate runs the review checklists and emits a
plain-language Summary + Confidence. This is already built in the conductor
(`conductor/src/contracts.ts`); Phase 2 must preserve it when porting dispatch
from `opencode run` subprocess to the BooCoder backends.

## Out of scope (Phase 2)

- The exact pixel-faithful per-skill Han report templates (spine-level only, by
  prior decision).
- A Claude execution path (Claude Code covers it).
- Folding the existing Arena into the Orchestrator (stays separate).
- Per-agent model tiering (single model per run for now; revisit when more local
  models exist).

## Open items the plan must resolve (the HOW)

- How the conductor's flow/spine definitions (Phase 1, `conductor/src/`) are
  reused vs. re-homed inside `apps/coder` when dispatch moves to the backends.
- The `flow_runs`/`flow_steps` schema shape, status lifecycle, and how a step maps
  to a `tasks`/`agent_sessions` row.
- How the scheduler resumes a run after a coder-service restart (mid-flight steps).
- The new `orchestrator` WS frame(s) and how the pane subscribes to per-agent
  streams (reusing the existing broker/AgentEvent pipeline).
- The Orchestrator pane component structure and how it nests N live agent streams
  without the crowding the grill rejected.
@@ -0,0 +1,478 @@
# Implementation Decision Log — Orchestrator (Phase 2)

Han synthesis output. Each decision is committed: it cites evidence, records
rejected alternatives, names an owner, and states a revisit criterion.
Cross-reference invariant: every `D-N` here is referenced by
[design.md](../design.md) and/or [tasks.md](../tasks.md), and produced by a round
recorded in
[implementation-iteration-history.md](implementation-iteration-history.md).

Source: a conversational `grill-me` design session. The settled behavioral spec
is captured in [design-context.md](design-context.md) (12 decisions; decision 5
is REVISED by D-3 / D-4 below). Specialist findings are in the claim ledger
C1–C16 of the iteration history.

Trust class of evidence below: **codebase** (file:line in this repo) unless
noted. No single-source web claims underpin any committed decision.

---

## D-1 — Re-home the pure conductor definitions into `apps/coder/src/conductor/`

**Decision.** Copy the pure (dispatch-free) conductor definition files —
`spine.ts`, `flows/*`, `contracts.ts`, `types.ts`, `render.ts` — into
`apps/coder/src/conductor/`, plus the 23 Han personas (`conductor/agents/*.md`).
The Phase-1 standalone CLI (`conductor/`) stays alive and unchanged. Sever the
`flows/code-review.ts` → `dispatch.ts` coupling by adding a `DispatchFn` to
`StepContext`, injected by the flow-runner. Parameterize `spine.ts`'s model:
replace its read of `process.env.CONDUCTOR_MODEL` with the run's configured model.
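
The `DispatchFn` seam can be sketched as follows. The shapes here are assumptions made for illustration: the real `StepContext` in `types.ts` is not reproduced in this log, and `reviewDimensions` is a hypothetical stand-in for the per-dimension fan-out in `code-review.ts`.

```typescript
// Hypothetical shapes; the actual definitions in types.ts may differ.
type DispatchFn = (agent: string, prompt: string) => Promise<string>;

interface StepContext {
  question: string;
  results: Record<string, string>;
  dispatch: DispatchFn; // injected by whoever runs the flow
}

// A bespoke step that fans out per dimension without importing a concrete
// dispatch module: the runner decides what `dispatch` actually does.
async function reviewDimensions(ctx: StepContext, dimensions: string[]): Promise<string[]> {
  return Promise.all(
    dimensions.map((dimension) =>
      ctx.dispatch("reviewer", `Review ${dimension}: ${ctx.question}`),
    ),
  );
}
```

Under this sketch the Phase-1 CLI could inject its `opencode run` dispatch and the coder could inject a task-inserting one, with the flow file unchanged in both.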

**Rationale.** The flow definitions are pure data + closures; only `dispatch.ts`
(the `opencode run` subprocess path) and `flow.ts` (the in-memory scheduler) are
Phase-1-specific. Copying the pure files avoids a workspace-package extraction
(YAGNI — only two consumers) while keeping the Phase-1 CLI as a regression
oracle. The evidence/yagni contracts are preserved because the flow-runner calls
`step.run(ctx)` in-process to build each prompt BEFORE inserting the task — the
closures execute in the coder process; prompts are never serialized to DB.

**Evidence.** `code-review.ts:10` (`import { dispatchAgent } from '../dispatch.js'`)
and `:62` (the per-dimension dispatch call) — the only flow→dispatch coupling
(C1). `spine.ts:122` renders `process.env.CONDUCTOR_MODEL` into the report header
(C14). `spine.ts:73` — contracts injected via the step closure, in-process (C11).
23 personas confirmed at `conductor/agents/*.md`.

**Rejected alternatives.**
- A `@boocode/conductor` workspace package — rejected: only two consumers (Phase-1
  CLI + coder); a shared package is premature abstraction (YAGNI). Deferred with a
  reopen trigger (a 3rd app needing conductor types). See Deferred (YAGNI).
- Importing `conductor/src/*` directly from `apps/coder` across the workspace
  boundary — rejected: couples the coder build to the standalone CLI tree and its
  `opencode`-flavored dispatch import graph.

**Specialist owner.** software-architect.
**Revisit criterion.** A third app needs the conductor types (then extract the
workspace package).
**Driven by rounds:** R1.
**Referenced in plan:** design.md §Re-home & DispatchFn seam; tasks.md group 1.

---

## D-2 — DB-driven flow-runner with an `onTaskTerminal` dispatcher hook

**Decision.** Add `apps/coder/src/services/flow-runner.ts`: a DB-backed scheduler
that owns `flow_runs`/`flow_steps`, computes the ready wave from the loaded flow
def, INSERTs each ready `agent` step as a `tasks` row, runs `code` steps inline,
and advances. Fan-out is driven by ONE new hook — an `onTaskTerminal(taskId,
state)` callback on `createDispatcher` — invoked when any task reaches a terminal
state. No third poll loop; no modification to the dispatcher's internal run
functions.
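
The ready-wave computation and the reaction to a terminal task might look like the sketch below. It is in-memory only, so the `flow_steps` reads and writes are elided; `StepDef`, `readyWave`, and `advanceRun` are hypothetical names, and letting unrelated steps continue after a failure is an assumed policy, not one this decision states.

```typescript
type StepStatus = "pending" | "running" | "completed" | "failed";

interface StepDef {
  id: string;
  deps: string[]; // taken from the loaded flow def, not from a stored column
}

// Steps that have not started and whose deps have all completed.
function readyWave(steps: StepDef[], status: Map<string, StepStatus>): string[] {
  return steps
    .filter((step) => (status.get(step.id) ?? "pending") === "pending")
    .filter((step) => step.deps.every((dep) => status.get(dep) === "completed"))
    .map((step) => step.id);
}

// What a terminal-task callback could do: record the state, then return the
// next wave. Returned steps are marked running so a second callback cannot
// hand out the same step twice.
function advanceRun(
  steps: StepDef[],
  status: Map<string, StepStatus>,
  stepId: string,
  state: "completed" | "failed",
): string[] {
  status.set(stepId, state);
  const wave = readyWave(steps, status);
  for (const id of wave) status.set(id, "running");
  return wave;
}
```

A step that depends on a failed step never becomes ready here, so a run with a failed step stalls at that branch rather than dispatching work on incomplete inputs.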

**Rationale.** The dispatcher already has the LISTEN/NOTIFY + poll machinery and
the terminal-state transitions; a single callback at those transition points lets
the flow-runner react without duplicating the dispatch loop. The flow-runner stays
a pure scheduler; execution stays in the dispatcher.

**Evidence.** `dispatcher.ts:46-179` (the loop + `runTask`), `schema.sql:279-286`
(the `notify_tasks_new` trigger) (C2). Terminal transitions the hook attaches to:
external completed `dispatcher.ts:642-646`, external failed `:659-661`. Full step
output must persist in `flow_steps.output TEXT` because `tasks.output_summary` is
≤500 char and cannot reconstruct `ctx.results` for render/resume (`schema.sql:26`,
`flow.ts:49,59`) (C3).

**Rejected alternatives.**
- A standalone third poll loop in the flow-runner — rejected: duplicates the
  dispatcher's LISTEN/poll, two writers racing on `tasks`.
- Modifying the dispatcher's `runTask` internals to know about flows — rejected:
  couples the generic dispatcher to the orchestrator; the callback seam keeps the
  dispatcher flow-agnostic.

**Specialist owner.** software-architect.
**Revisit criterion.** Step throughput requires batching beyond what one callback
per terminal task supports.
**Driven by rounds:** R1.
**Referenced in plan:** design.md §Flow-runner & onTaskTerminal; tasks.md group 4.

---

## D-3 — Reuse the existing dispatcher (insert pending task), not a direct-PTY bypass

**Decision.** The flow-runner INSERTs each ready step as a normal `state='pending'`
`tasks` row; the existing dispatcher picks it up via `LISTEN 'tasks_new'`, runs it
through the existing external-agent path (creating a git worktree as a stable HEAD
read-checkout), and streams AgentEvents → WS frames unchanged. The new
`onTaskTerminal` hook (D-2) notifies the flow-runner on terminal state. No
direct-PTY bypass; the dispatcher is reused with exactly one new hook.

This **REVISES design-context decision 5** ("no worktree") to: a worktree IS
created, but it is a harmless read snapshot — read-only is enforced by plan mode
(D-4), not by the absence of a worktree.

**Rationale.** Reuse (architect's A2) gets streaming, persistence, resume,
cancellation, and AgentEvent→WS mapping for free. The only objection to A2 was
that it creates a worktree the "no worktree" decision-5 wanted to avoid; once
read-only is enforced at the tool level by plan mode (D-4), the worktree is inert
(a checkout the agent cannot write to), so the objection dissolves. This was the
user's explicit choice over the architect's leaning-toward-bypass (A4).

**Evidence.** The external-agent path with worktree creation and AgentEvent→WS
streaming: `dispatcher.ts` external branch (worktree create → run → terminal at
`:642-646`/`:659-661`). Task-as-dispatch precedents the flow-runner copies:
`routes/skills.ts:94` (a skill is already dispatched as a task),
`routes/arena.ts:49`, `tools/new_task.ts:54`. Dispatch tension recorded as C12
(A2 vs A4, architect self-flagged Disputed); resolved here by user choice.

**Rejected alternatives.**
- A4 direct-`dispatchViaPty` bypass (insert `running` task + call PTY directly to
  skip worktree creation) — rejected: duplicates streaming/persistence/resume
  wiring, and a restart kills the PTY child outside the dispatcher's lifecycle
  (worsening resume, C15). The worktree it was avoiding is harmless under D-4.
- design-context decision 5's "no worktree, read project dir directly" — rejected:
  reusing the dispatcher means reusing its worktree creation; under D-4 the
  worktree is a read snapshot, so avoiding it bought nothing and cost the reuse.

**Specialist owner.** software-architect (execution path); devops-engineer
(operational behavior of the reused dispatcher under flow load).
**Revisit criterion.** Worktree creation per step becomes a measured throughput or
disk-cost problem under real flow concurrency.
**Driven by rounds:** R1 (C12), R2 (read-only finding that made the worktree inert).
**Referenced in plan:** design.md §Execution via dispatcher reuse; tasks.md group 4.

---

## D-4 — Read-only enforced HARD by `mode_id='plan'` (qwen `--approval-mode plan`)

**Decision.** Every orchestrator step task is dispatched with `mode_id = 'plan'`,
which the PTY dispatcher passes to qwen as `--approval-mode plan` — a built-in
tool-level gate: reads allowed, writes blocked. The flow-runner hardcodes
`mode_id='plan'` for every step task; it is never user-overridable. This is the
sole read-only enforcement mechanism. `BOOCODE_TOOLS` and persona prompts are NOT
relied upon (they do not govern external CLI agents).
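
One way to make "never user-overridable" hold at the type level is to give the step-task builder no `mode_id` parameter at all. This is a sketch under assumptions: `buildStepTask`, `StepTaskInsert`, and `withApprovalMode` are hypothetical names, and only the `--approval-mode` flag itself comes from this log.

```typescript
// The flow-runner's only mode: a constant, never read from user input.
const ORCHESTRATOR_MODE_ID = "plan" as const;

interface StepTaskInsert {
  project_id: string;
  input: string;
  agent: string;
  model: string;
  mode_id: typeof ORCHESTRATOR_MODE_ID; // the type admits no other value
}

// No mode parameter exists, so a caller cannot request a writable mode.
function buildStepTask(projectId: string, agent: string, model: string, prompt: string): StepTaskInsert {
  return { project_id: projectId, input: prompt, agent, model, mode_id: ORCHESTRATOR_MODE_ID };
}

// Illustrative stand-in for the dispatcher side: append the flag when a mode is set.
function withApprovalMode(baseArgs: string[], modeId: string | null): string[] {
  const args = [...baseArgs];
  if (modeId) args.push("--approval-mode", modeId);
  return args;
}
```

The design choice in this sketch is that the invariant lives in the function signature rather than in a runtime check, so a future caller cannot weaken it by passing an option.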

**Rationale.** Read-only is a safety-critical invariant of the whole feature
(flows never write the repo). Prompt-level intent and `BOOCODE_TOOLS` ceilings
govern BooChat's in-process tools, not an external `qwen` CLI child — so they are
not watertight. qwen's `--approval-mode plan` is a tool-level gate inside the
agent binary itself, which the adversarial-security-analyst (R2) identified as the
only enforcement that actually binds the external agent. Qwen-only (decision 6)
makes a single hardcoded flag sufficient.

**Evidence.** The wiring already exists: `pty-dispatch.ts:75` —
`if (modeId) args.push('--approval-mode', modeId)` in the `qwen` spawn spec. R2
security finding recorded as C13 (the R1 claim that prompt-level + `BOOCODE_TOOLS`
enforcement was sufficient was Anecdotal/unproven; R2 refuted it and named plan
mode as the binding control).

**Rejected alternatives.**
- Prompt-level read-only intent (personas tell the agent not to write) — rejected
  (C13, R2): an instruction, not a gate; a model can ignore or be steered past it.
- `BOOCODE_TOOLS=core` as the gate — rejected (C13, R2): governs BooChat's
  in-process tool registry, does not constrain the external `qwen` CLI's own tools.
- A `read_only` boolean flag on `tasks` — rejected: superseded by `mode_id='plan'`,
  which is an existing column already plumbed to the binary. See Deferred (YAGNI).

**Specialist owner.** adversarial-security-analyst.
**Revisit criterion.** A non-qwen agent is added to flows (re-verify that agent's
equivalent of `--approval-mode plan` before allowing it), or qwen changes
`--approval-mode plan` semantics.
**Driven by rounds:** R1 (C13 flagged), R2 (resolved).
**Referenced in plan:** design.md §Read-only via plan mode; tasks.md group 4.

---

## D-5 — `flow_runs` + `flow_steps` schema in the coder schema

**Decision.** Add two tables to `apps/coder/src/schema.sql`:

- `flow_runs(id, project_id [no FK, matches tasks.project_id], flow_name, band
  [CHECK small|medium|large], model, status [CHECK-named], input JSONB
  [CHECK (input ? 'question')], report TEXT [nullable], error, timestamps)`.
- `flow_steps(id, run_id [FK → flow_runs ON DELETE CASCADE], step_id, kind
  [CHECK agent|code], agent, status [CHECK-named], task_id [UUID → tasks(id) ON
  DELETE SET NULL; nullable, code steps NULL], chat_id [UUID → chats(id) ON DELETE
  SET NULL], input TEXT, output TEXT [FULL output], error, timestamps, UNIQUE(run_id,
  step_id))`.

No `depends_on` column (derive from the loaded flow def). Do NOT insert
skipped-step rows (`when()` is pure on stored input). Indexes:
`flow_steps(run_id, status)`, `flow_runs(project_id, created_at DESC)`. Explicit
CHECK constraint names + the repo's DROP-IF-EXISTS → guarded-ADD migration
discipline.
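
The reason `output` must hold the full text becomes concrete when a resumed run rebuilds its results from rows alone. A minimal sketch, assuming a simplified row shape: the status values are an assumption (the decision says only "CHECK-named"), and `rebuildResults` is a hypothetical helper.

```typescript
// Assumed status values; the decision does not enumerate them.
type FlowStepStatus = "pending" | "running" | "completed" | "failed";

// Simplified row shape following the column list above.
interface FlowStepRow {
  run_id: string;
  step_id: string;
  kind: "agent" | "code";
  status: FlowStepStatus;
  task_id: string | null; // NULL for code steps
  output: string | null; // full output, not the ≤500-char task summary
}

// Rebuild the step-id -> output map that later steps and the renderer consume.
function rebuildResults(rows: FlowStepRow[]): Record<string, string> {
  const results: Record<string, string> = {};
  for (const row of rows) {
    if (row.status === "completed" && row.output !== null) {
      results[row.step_id] = row.output;
    }
  }
  return results;
}
```

Steps that were skipped have no row at all under this decision, so they are simply absent from the map, and the flow def's `when()` explains why.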

**Rationale.** A run spans multiple tasks; existing tables (`tasks`,
`agent_sessions`) model single dispatches, not a DAG. `flow_steps.task_id →
tasks(id)` (not a column on `tasks`) keeps `tasks` generic. `output TEXT` is FULL
because `tasks.output_summary` is ≤500 char and cannot reconstruct `ctx.results`.
`project_id` has no FK to match `tasks.project_id`'s existing convention.

**Evidence.** `tasks` shape and `output_summary` ≤500 char: `schema.sql:18-34`,
`:26` (C3, C4). `flow.ts:49,59` (results reconstruction needs full output, C3).
`flow.ts:28-41`, `types.ts:27` (deps + `when()` derivable from flow def — omit
`depends_on` and skipped rows, C6). `schema.sql:19,32` (project_id no-FK pattern;
CHECK-named discipline, C5). Migration discipline: root CLAUDE.md schema section.

**Rejected alternatives.**
- A `depends_on` column on `flow_steps` — rejected (C6, YAGNI): deps are in the
  loaded flow def; storing them duplicates the source of truth. Deferred.
- Persisting skipped-step rows — rejected (C6, YAGNI): `when()` is pure on stored
  `input`, so a skip is reconstructable. Deferred.
- A column on `tasks` (e.g. `flow_step_id`) — rejected (C4): pollutes the generic
  tasks table; the FK belongs on `flow_steps`.

**Specialist owner.** data-engineer.
**Revisit criterion.** A stored-run DAG visualization needs deps without loading
the flow def (then add `depends_on`); the UI must explain a skip without the flow
def (then persist skipped rows).
**Driven by rounds:** R1.
**Referenced in plan:** design.md §Schema; tasks.md group 2.

---

## D-6 — Two new WS frames; per-agent stream reuses existing frames by `chat_id`

**Decision.** Add two frames to `packages/contracts/src/ws-frames.ts`:

- `flow_run_started`: `run_id, flow_name, band, steps[]` (each `step_id, agent,
  kind, chat_id, label`).
- `flow_run_step_updated`: `run_id, step_id, status, run_status?, report?`.

The per-agent content stream REUSES the existing `delta` / `tool_call` /
`message_complete` frames keyed by the step's `chat_id`. Each agent step gets a
synthetic `chats` row for stream attribution. Register in all THREE frame
registries: contracts `WsFrameSchema`, the server `InferenceFrame` union
(`services/inference/turn.ts`), and the web strict `WsFrame` union
(`apps/web/src/api/types.ts`) — the web type is the wire-format gate.
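
The two frames can be pictured as plain TypeScript types with a shallow guard. This is a sketch, not the Zod schema the contracts package would hold: the `type` discriminator field, the string-typed `status`, and the nullable `chat_id` for code steps are assumptions, and `isFlowFrame` / `applyFrame` are hypothetical helpers.

```typescript
type Band = "small" | "medium" | "large";

interface FlowRunStarted {
  type: "flow_run_started";
  run_id: string;
  flow_name: string;
  band: Band;
  steps: { step_id: string; agent: string; kind: "agent" | "code"; chat_id: string | null; label: string }[];
}

interface FlowRunStepUpdated {
  type: "flow_run_step_updated";
  run_id: string;
  step_id: string;
  status: string;
  run_status?: string;
  report?: string;
}

type FlowFrame = FlowRunStarted | FlowRunStepUpdated;

// Shallow check on the discriminator and run_id only; a real schema would validate every field.
function isFlowFrame(value: unknown): value is FlowFrame {
  if (typeof value !== "object" || value === null) return false;
  const frame = value as { type?: unknown; run_id?: unknown };
  return (
    (frame.type === "flow_run_started" || frame.type === "flow_run_step_updated") &&
    typeof frame.run_id === "string"
  );
}

// Fold run-level frames into a step_id -> status map for the roster.
function applyFrame(statuses: Map<string, string>, frame: FlowFrame): Map<string, string> {
  const next = new Map(statuses);
  if (frame.type === "flow_run_started") {
    for (const step of frame.steps) next.set(step.step_id, "pending");
  } else {
    next.set(frame.step_id, frame.status);
  }
  return next;
}
```

A frame kind missing from the union fails the guard, which mirrors how an unregistered frame would be dropped by the strict web-side type.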

**Rationale.** The run-level lifecycle (which agents exist, their status, the final
report) needs new frames; the per-agent token stream is exactly what the existing
delta/tool_call/message_complete pipeline already carries, so keying it by a
synthetic `chat_id` reuses the whole broker→WS path with no new streaming code.
The report rides on `flow_run_step_updated` rather than its own frame (one fewer
frame type; revisit only if reports exceed the frame size limit).

**Evidence.** Existing broker→WS frame pipeline and frame list: `ws-frames.ts`
(snapshot…error). Three-registry rule + web-type-is-wire-gate: root CLAUDE.md
"Adding a new WS frame type" + discovery notes §packages/contracts. Stream-by-chat
reuse precedent: the dispatcher publishes delta/tool_call/message_complete keyed
by chat already (C7).

**Rejected alternatives.**
- New per-agent stream frames (`flow_agent_delta`, etc.) — rejected: the existing
  delta/tool_call/message_complete already stream by chat; new frames duplicate
  them.
- A separate `flow_run_report` frame — rejected (YAGNI): the report fits on
  `flow_run_step_updated`. Deferred with a reopen trigger (reports exceed ~50KB).

**Specialist owner.** software-architect.
**Revisit criterion.** Reports exceed the frame size limit (~50KB) → split the
report onto its own frame.
**Driven by rounds:** R1.
**Referenced in plan:** design.md §WS frames; tasks.md group 3.

---

## D-7 — `orchestrator` pane kind + OrchestratorPane

**Decision.** Add an `orchestrator` pane kind (following the
`markdown_artifact`/`html_artifact` precedent) — touching `WorkspacePaneKind`,
`useWorkspacePanes`, `Workspace`, `NewPaneMenu`, `ChatTabBar`,
`PaneHeaderActions`. `OrchestratorPane.tsx`: run header; report-at-top on
completion; collapsed agent roster reusing `AgentStatusDot`; expand-one-at-a-time
detail well reusing CoderPane stream rendering; mobile single-column inline
expand; auto-expand-follows-active. Runs history in `NewPaneMenu`. Export (copy /
save-file / send-to-chat via the existing `sendToChat`) in the pane header `…`,
conditional on a completed report.

**Rationale.** A fourth pane kind is already a precedented extension point; the
pane reuses `AgentStatusDot` and the CoderPane stream renderer, so the new surface
is composition, not new streaming UI. Expand-one-at-a-time avoids the crowding the
grill rejected.
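
Expand-one-at-a-time combined with auto-expand-follows-active can be modeled as a small reducer. The state shape and the rule that a manual expand pauses auto-follow are assumptions made for this sketch; `rosterReducer` is a hypothetical name and the real `OrchestratorPane.tsx` may handle this differently.

```typescript
interface RosterState {
  expandedStepId: string | null; // at most one agent is expanded
  userPinned: boolean; // true once the user picked an agent manually
}

type RosterAction =
  | { kind: "toggle"; stepId: string } // user clicked a roster row
  | { kind: "step_active"; stepId: string }; // a step started streaming

function rosterReducer(state: RosterState, action: RosterAction): RosterState {
  switch (action.kind) {
    case "toggle":
      // Clicking the open row collapses it and resumes auto-follow;
      // clicking another row swaps the single expanded slot and pins it.
      return state.expandedStepId === action.stepId
        ? { expandedStepId: null, userPinned: false }
        : { expandedStepId: action.stepId, userPinned: true };
    case "step_active":
      // Follow the active agent unless the user pinned a different one.
      return state.userPinned ? state : { expandedStepId: action.stepId, userPinned: false };
  }
}
```

Because the state holds a single `expandedStepId`, two agents can never be expanded at once, which is the crowding constraint expressed in the data model.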

**Evidence.** Pane-kind precedent: `api/types.ts:386` `WorkspacePaneKind` (with
`markdown_artifact`/`html_artifact`). Roster/status reuse: `AgentComposerBar.tsx:204`
(`AgentStatusDot`), CoderPane stream rendering (C8). Launcher categories from the
flow registry: `flows/index.ts`; runs history host `NewPaneMenu.tsx`; export via
`lib/events.ts` `sendToChat` (C10).

**Rejected alternatives.**
- Rendering runs inside the existing `coder` pane — rejected: a run is a
  parent-with-nested-children view, not a single agent session; conflating them
  crowds both.
- All-agents-expanded simultaneously — rejected (C8): the crowding the design
  session explicitly rejected.

**Specialist owner.** user-experience-designer.
**Revisit criterion.** Users cannot follow multiple concurrent runs from the
roster (then revisit the expand model).
**Driven by rounds:** R1.
**Referenced in plan:** design.md §Orchestrator pane; tasks.md groups 7, 10.

---
|
||||
|
||||
## D-8 — Workflow toolbar button + slash launch, BooChat/BooCoder parity

**Decision.** Add a `Workflow` (lucide) button on `ChatInput`'s controls row,
between the `SquareSlash` chip and the `Globe` pill — yielding parity in BooChat
(ChatPane) and BooCoder (CoderPane) for free. Label "Flows" on desktop, icon-only
on mobile (toolbar confirmed to fit one line). Slash launches instantly with
defaults (band small, current pane's project, text-after-command = focus),
opening the pane. The button opens `FlowLauncherDialog.tsx` first: 5 category tabs
(Analysis/Discovery/Planning/Authoring/Review) → filtered flow list + size + focus +
fast toggle; defaults Analysis/Small/off.

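The slash-launch defaults described above can be sketched as a small parser. This is only an illustration under stated assumptions: the command shape `/<flow> <focus text>`, the `FlowLaunch` type, the `parseSlashLaunch` function, and the `medium`/`large` band names are all hypothetical and do not come from the codebase. Only the defaults (band small, current pane's project, text after the command as focus) are taken from the decision.

```typescript
// Hypothetical shapes for illustration; none of these names come from the codebase.
// Only "small" is named in the decision; the other band values are assumptions.
type Band = "small" | "medium" | "large";

interface FlowLaunch {
  flow: string;      // flow name taken from the slash command
  band: Band;        // slash launch always uses the default band
  projectId: string; // the current pane's project
  focus: string;     // text after the command
}

// Parse "/<flow> <focus text>" into a launch request with the D-8 defaults.
// Returns null when the input is not a slash command for a known flow.
function parseSlashLaunch(
  input: string,
  currentProjectId: string,
  knownFlows: readonly string[],
): FlowLaunch | null {
  const match = /^\/(\S+)(?:\s+([\s\S]*))?$/.exec(input.trim());
  if (!match) return null;
  const flow = match[1];
  const rest: string | undefined = match[2];
  if (!knownFlows.includes(flow)) return null;
  return {
    flow,
    band: "small",
    projectId: currentProjectId,
    focus: (rest ?? "").trim(),
  };
}
```

Under these assumptions the launcher dialog would stay the only place where band, category, and the fast toggle can be changed; the slash path never asks.
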
**Rationale.** `ChatInput` is the shared composer rendered by both panes, so a
single button gives both doors with parity at no extra cost. The toolbar fits one
line at ≤5 elements, so adding the button does not force scroll/wrap (a standing
mobile constraint).

**Evidence.** `ChatInput.tsx:648-732`, `:673` — the controls row is ≤5 elements;
adding the `Workflow` icon between SquareSlash and Globe keeps it one line; refutes
junior Q13's crowding worry (C9). Launcher categories from `flows/index.ts` (C10).
Shared-composer fact: discovery notes §apps/web (ChatInput rendered by ChatPane +
CoderPane).

**Rejected alternatives.**
- Separate buttons in ChatPane and CoderPane — rejected: duplicates wiring; the
  shared composer already gives parity from one button.
- A launcher search box instead of category tabs — rejected (YAGNI): 22 flows in 5
  categories are browsable; a search box is unproven need. Deferred.

**Specialist owner.** user-experience-designer.
**Revisit criterion.** Category grouping fails users at the 22-flow catalog size
(then add the search box).
**Driven by rounds:** R1.
**Referenced in plan:** design.md §Toolbar button & launcher; tasks.md groups 8, 9.

---

## D-9 — Resumable runs via `initResume` on coder startup

**Decision.** On coder startup, an `initResume` re-advances every `flow_runs WHERE
status='running'`: a step whose task completed → mark the step done + advance the
run; a step whose task is lost/failed (PTY died on restart) → re-dispatch;
completed steps are kept. (design-context decision 4 commits to "resumable".)

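The per-step reconcile rule above can be written as a pure function, which makes it easy to check without a database. This is a sketch under assumptions, not the actual `initResume`: the `StepRow` shape, the `TaskOutcome` values, and the `ResumeAction` names are hypothetical, and the handling of steps that are not `running` is a guess, since the decision only covers in-flight and completed steps.

```typescript
// Hypothetical status names and shapes; the real schema may differ.
type StepStatus = "pending" | "running" | "completed" | "failed" | "skipped";
type TaskOutcome = "completed" | "failed" | "lost";
type ResumeAction = "keep" | "leave" | "mark-done-and-advance" | "re-dispatch";

interface StepRow {
  name: string;
  status: StepStatus;
  taskOutcome: TaskOutcome | null; // null: no task row could be found for the step
}

// Decide what a startup resume pass should do with one step of a running run.
function reconcileStep(step: StepRow): ResumeAction {
  // Completed steps are kept as-is.
  if (step.status === "completed") return "keep";
  // Assumption: steps that were not in flight are left for the normal scheduler.
  if (step.status !== "running") return "leave";
  // The task finished while the coder was down: record it and advance the run.
  if (step.taskOutcome === "completed") return "mark-done-and-advance";
  // Task failed, was lost with the PTY, or has no row: dispatch it again.
  return "re-dispatch";
}

// Map every step of a run to its resume action, keyed by step name.
function planResume(steps: StepRow[]): Record<string, ResumeAction> {
  const plan: Record<string, ResumeAction> = {};
  for (const step of steps) plan[step.name] = reconcileStep(step);
  return plan;
}
```

Keeping the rule pure means the recovery query only has to load step and task state; the decision about what to do with each row stays testable in isolation.
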
**Rationale.** A restart can land mid-flight. Because execution goes through the
dispatcher with persisted task state (D-3), a step's outcome is recoverable from
the DB; the run-level scheduler just has to re-derive the wave and re-dispatch only
the steps that did not finish. Reconcile-and-advance (architect A3) beats
mark-run-failed (data's conservative option) because decision 4 already committed
to resumable and the task state is durable.

**Evidence.** No run-level resume exists today (single tasks resume via
`agent_sessions`; a run spanning tasks does not) — discovery notes §Enumerated
gaps. Resume tension recorded as C15 (architect reconcile-and-advance vs data
mark-failed); resolved toward reconcile-and-advance by decision 4 + durable task
state under D-3.

**Rejected alternatives.**
- Mark a running run failed on restart — rejected (C15): contradicts decision 4
  (resumable) and discards recoverable completed-step work.
- Re-running the whole flow from step 0 — rejected: re-does completed steps,
  burning the local model on work already persisted.

**Specialist owner.** software-architect (scheduler); data-engineer (recovery query).
**Revisit criterion.** A step-level idempotency hazard surfaces where re-dispatch
of a "lost" step double-counts side effects (none expected under read-only plan
mode).
**Driven by rounds:** R1.
**Referenced in plan:** design.md §Resume; tasks.md group 5.

---

## D-10 — Concurrency: multiple runs, no `queued` status, single model per run

**Decision.** Multiple runs are allowed; each gets its own pane + `flow_runs` row,
no shared state. Step statuses are pending / running / completed / failed / skipped
— there is NO separate `queued` status (the dispatcher's `pending` covers a step
waiting on the busy model or on deps). Model is a single config value per run,
default `qwen3.6-35b-a3b-mxfp4`.

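The status set and the per-run model default can be pinned down as types so that `queued` cannot creep back in by accident. This is a minimal sketch: the identifiers `STEP_STATUSES`, `StepStatus`, `isStepStatus`, `DEFAULT_FLOW_MODEL`, and `RunConfig` are hypothetical names chosen for illustration; only the five status values and the default model string come from the decision.

```typescript
// The five step statuses named in D-10. There is deliberately no "queued".
const STEP_STATUSES = ["pending", "running", "completed", "failed", "skipped"] as const;
type StepStatus = (typeof STEP_STATUSES)[number];

// Narrow an arbitrary string (for example a DB column value) to a StepStatus.
function isStepStatus(value: string): value is StepStatus {
  return (STEP_STATUSES as readonly string[]).includes(value);
}

// Default model named in the decision; one model per run.
const DEFAULT_FLOW_MODEL = "qwen3.6-35b-a3b-mxfp4";

// Hypothetical per-run config shape: the model is a single value, not per step.
interface RunConfig {
  model: string;
}

// Build a run config, falling back to the default model when none is given.
function makeRunConfig(model?: string): RunConfig {
  return { model: model ?? DEFAULT_FLOW_MODEL };
}
```

A step waiting on dependencies and a step waiting on the busy model both report `pending` here, which matches the point that the two cases cannot be told apart from outside.
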
**Rationale.** Each run is independent state, so concurrency needs no coordination
beyond the dispatcher's existing per-session serialization. A `queued` status is
not observable: with the model busy, a task is simply `pending`/`running` and
llama-swap does not expose queue position, so a distinct `queued` state would be a
label the system cannot honestly populate (revising decision-11's "panes show
queued honestly").

**Evidence.** `queued` unobservability recorded as C16 (junior Q11, data
DATA-005): llama-swap does not report queue position; the status reduces to
pending(dep/model-wait)/running. Single-model-per-run carried from decision 6/11.

**Rejected alternatives.**
- A distinct `queued` step status — rejected (C16): nothing can populate it
  honestly; `pending` already means "waiting". Deferred (reopen if llama-swap
  exposes queue position).
- Serializing runs (one at a time) — rejected: runs are independent; serialization
  adds coordination for no benefit and hurts the multi-pane UX (decision 11).

**Specialist owner.** data-engineer (status set), devops-engineer (model-busy
behavior under concurrent runs).
**Revisit criterion.** llama-swap exposes queue position → add an observable
`queued` status.
**Driven by rounds:** R1.
**Referenced in plan:** design.md §Schema (status sets) + §Concurrency; tasks.md
group 2.

---

## Cross-reference index

| Decision | Driven by | Design.md section | Tasks.md group |
|---|---|---|---|
| D-1 Re-home + DispatchFn | R1 (C1, C11, C14) | Re-home & DispatchFn seam | 1 |
| D-2 Flow-runner + onTaskTerminal | R1 (C2, C3) | Flow-runner & onTaskTerminal | 4 |
| D-3 Dispatcher reuse (not bypass) | R1 (C12), R2 | Execution via dispatcher reuse | 4 |
| D-4 Read-only via plan mode | R1 (C13), R2 | Read-only via plan mode | 4 |
| D-5 Schema flow_runs/flow_steps | R1 (C3–C6) | Schema | 2 |
| D-6 WS frames | R1 (C7) | WS frames | 3 |
| D-7 Orchestrator pane | R1 (C8) | Orchestrator pane | 7, 10 |
| D-8 Toolbar button + slash | R1 (C9, C10) | Toolbar button & launcher | 8, 9 |
| D-9 Resume | R1 (C15) | Resume | 5 |
| D-10 Concurrency / no-queued | R1 (C16) | Schema + Concurrency | 2 |

## Deferred (YAGNI)

These were considered and deferred under the evidence rule. Each names the trigger
that would justify reopening.

### `@boocode/conductor` workspace package
- **Why deferred:** only two consumers (Phase-1 CLI + coder); copy-in (D-1) avoids
  premature shared-package abstraction.
- **Reopen when:** a third app needs the conductor types.
- **Source:** architect (D-1 rejected alternative).

### `flow_steps.depends_on` column
- **Why deferred:** deps are derivable from the loaded flow def (`flow.ts:28-41`,
  `types.ts:27`); a column duplicates the source of truth.
- **Reopen when:** a stored-run DAG visualization must show deps without loading
  the flow def.
- **Source:** data-engineer C6 (D-5 rejected alternative).

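To illustrate why a stored `depends_on` column is redundant, here is a sketch of deriving the next runnable wave from the flow definition plus the set of completed step names. The `StepDef` shape (a `name` and an optional `deps` list) and the `readySteps` function are assumptions made for this example; the real `Step` type in `types.ts` and the scheduler in `flow.ts` may look different.

```typescript
// Hypothetical minimal step shape; the real `Step` type may differ.
interface StepDef {
  name: string;
  deps?: string[]; // names of steps that must complete first
}

// Return the names of steps that have not completed and whose deps have all completed.
// Steps already dispatched but still running are not tracked here; a caller
// would need to exclude them before dispatching.
function readySteps(flow: StepDef[], completed: ReadonlySet<string>): string[] {
  return flow
    .filter((step) => !completed.has(step.name))
    .filter((step) => (step.deps ?? []).every((dep) => completed.has(dep)))
    .map((step) => step.name);
}

// Example flow: a fans out to b and c, which both feed d.
const exampleFlow: StepDef[] = [
  { name: "a" },
  { name: "b", deps: ["a"] },
  { name: "c", deps: ["a"] },
  { name: "d", deps: ["b", "c"] },
];
```

With only the completed step names read back from `flow_steps`, the same function yields the wave after a restart, so nothing about the dependency graph has to be persisted.
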
### Persisted skipped-step rows
- **Why deferred:** `when()` is pure on stored `input`, so a skip is
  reconstructable from the flow def + run input.
- **Reopen when:** the UI must explain a skip without the flow def.
- **Source:** data-engineer C6 (D-5 rejected alternative).

### `read_only` flag on `tasks`
- **Why deferred:** superseded by `mode_id='plan'` (D-4), an existing column
  already plumbed to qwen's `--approval-mode`.
- **Reopen when:** a non-qwen agent without a `--approval-mode plan` equivalent is
  added to flows.
- **Source:** D-4 rejected alternative.

### Explicit `queued` step status
- **Why deferred:** llama-swap does not expose queue position; nothing can populate
  the status honestly (C16). `pending` covers waiting.
- **Reopen when:** llama-swap exposes queue position.
- **Source:** junior Q11 / data DATA-005 (D-10 rejected alternative).

### Launcher search box
- **Why deferred:** 22 flows in 5 category tabs are browsable; a search box is
  unproven need.
- **Reopen when:** category grouping fails users at the catalog size.
- **Source:** UX C10 (D-8 rejected alternative).

### Separate report-stored WS frame
- **Why deferred:** the report rides on `flow_run_step_updated` (D-6).
- **Reopen when:** reports exceed the ~50KB frame size limit.
- **Source:** architect C7 (D-6 rejected alternative).
@@ -0,0 +1,89 @@
# Implementation Iteration History — Orchestrator (Phase 2)

Source spec: [../proposal.md](../proposal.md) · design context:
[design-context.md](design-context.md) · discovery: [.discovery-notes.md](.discovery-notes.md)
· decision log: [implementation-decision-log.md](implementation-decision-log.md)

## R1 — Parallel specialist review

**Specialists engaged:** software-architect, data-engineer, user-experience-designer, junior-developer (all sonnet).

**New input:** the settled design (12 decisions) + discovery notes. Each produced concrete HOW recommendations.

### Claim ledger

| # | Claim | State | Spec-maturity | Supporting |
|---|---|---|---|---|
| C1 | Re-home pure flow defs (`spine/flows/contracts/types/render`) into `apps/coder/src/conductor/`; sever `code-review.ts`→`dispatch.ts` coupling by injecting a `DispatchFn` via `StepContext`. Keep Phase-1 CLI alive. | Evidenced (`code-review.ts:10,62`; `conductor/tsconfig.json:6`) | plan-level | architect |
| C2 | DB-driven scheduler `apps/coder/src/services/flow-runner.ts` + `flow_runs`/`flow_steps`; fan-out via an `onTaskTerminal` callback on the existing dispatcher (no 3rd poll loop). | Evidenced (`dispatcher.ts:46-179,279-286`) | plan-level | architect |
| C3 | Full step output must persist in `flow_steps.output TEXT` — `tasks.output_summary` is ≤500 char and can't reconstruct `ctx.results` for render/resume. | Evidenced (`schema.sql:26`, `flow.ts:49,59`) | plan-level | data-engineer, architect |
| C4 | FK direction = `flow_steps.task_id → tasks(id) ON DELETE SET NULL` (nullable; code steps NULL). Do NOT add a column to `tasks`. | Evidenced (`schema.sql:18-34`) | plan-level | data-engineer, architect |
| C5 | `flow_runs.project_id` no FK (matches `tasks.project_id`); CHECK-named status constraints; `CHECK (input ? 'question')`. | Evidenced (`schema.sql:19,32`) | plan-level | data-engineer |
| C6 | Omit `depends_on` column (deps derivable from loaded flow def) and skipped-step rows (`when()` is pure on stored `input`). | Evidenced (`flow.ts:28-41`, `types.ts:27`) — **YAGNI** | plan-level | data-engineer |
| C7 | Two new WS frames: `flow_run_started` (step manifest + per-step `chat_id`) + `flow_run_step_updated` (status + final report). Content stream REUSES existing `delta/tool_call/message_complete` by `chat_id`. Per-step synthetic chat row. | Evidenced (broker pipeline; `ws-frames.ts`) | plan-level | architect |
| C8 | Orchestrator pane: collapsed roster, expand-one-at-a-time detail well, reuse `AgentStatusDot`; report at top on completion. Mobile single-column inline expand. | Evidenced (`AgentComposerBar.tsx:204`, `CoderPane`) | plan-level | UX |
| C9 | Toolbar fits: actual `ChatInput` row ≤5 elements; add `Workflow` icon between SquareSlash and Globe; "Flows" label desktop, icon-only mobile. **Resolves junior Q13.** | Evidenced (`ChatInput.tsx:648-732,673`) | plan-level | UX (refutes junior Q13 worry) |
| C10 | Launcher: 5 category tabs (Analysis/Discovery/Planning/Authoring/Review) + filtered flow list + size + focus + fast; defaults Analysis/Small/off. Runs history in NewPaneMenu; export in pane header `…`. | Evidenced (`flows/index.ts`, `NewPaneMenu.tsx`, `lib/events.ts`) | plan-level | UX |
| C11 | Contracts (evidence/yagni) still injected by calling `step.run(ctx)` in-process in flow-runner before INSERT — closures execute in the coder process; prompts are NOT serialized to DB. **Resolves junior Q12.** | Evidenced (`spine.ts:73`) | plan-level | architect (confirms junior Q12) |
| **C12** | **Dispatch-mechanism tension:** A2 says insert *pending* task → dispatcher picks it up via LISTEN (reuses streaming+worktree). A4 says insert *running* task + call `dispatchViaPty` DIRECTLY to avoid worktree creation (decision 5). The two contradict; architect resolved toward A4 (bypass). | **Disputed (internal to architect: A2 vs A4)** | plan-level | architect (self-flagged) |
| **C13** | **Read-only enforcement is prompt-level + `BOOCODE_TOOLS=core` (if the binary honors it)** + project-dir-as-cwd (no worktree). Architect + junior both say adversarial-security-analyst must verify it's watertight before decision 5 is safe. | **Anecdotal (enforcement not proven)** | plan-level | architect (A4), junior (Q8/Q9) |
| C14 | `spine.ts` renders `process.env.CONDUCTOR_MODEL` into the report header (`spine.ts:122`) — must be parameterized to the run's model on re-home. Personas (`conductor/agents/*.md`) copied into `apps/coder`. | Evidenced (`spine.ts:122`, `dispatch.ts:15`) | plan-level | junior Q5/Q6 |
| C15 | Resume semantics underspecified: re-dispatch in-flight steps vs mark-run-failed. With A4 direct-PTY, a restart kills the PTY child → in-flight steps MUST re-dispatch. Decision 4 commits to "resumable." | Disputed (architect reconcile-and-advance vs data mark-failed) | plan-level | architect (A3), data (OQ1), junior (Q3) |
| C16 | `queued` (decision 11) is hard to observe: with direct PTY the task is `running` and blocked on the busy model; llama-swap doesn't report queue position. May reduce to pending(dep-wait)/running. | Anecdotal | plan-level (spec-vs-reality) | junior Q11, data DATA-005 |

### Spec-maturity gate

No `T#` notes exist (gate reduces to spec-level threshold). spec-level findings = 0 by ≥3 specialists: junior's Q15 nominated three "open items" as spec-level, but the architect + data-engineer **resolved** them in-plan (re-home → copy into `apps/coder/src/conductor`; step→task → 1:1 `flow_steps.task_id`; resume → decision 4 already commits to "resumable", direction settled below). **Gate does NOT trip.** Proceed in-plan.

### Open Questions

- **OQ1 (C12):** dispatch mechanism — reuse the dispatcher with a no-worktree branch, vs bypass via direct `dispatchViaPty`. → user escalation (recommendation below).
- **OQ2 (C13):** is read-only watertight? → **R2: adversarial-security-analyst.**
- **OQ3 (C15):** resume = re-dispatch in-flight steps on restart (recommended; decision 4 = resumable). → user confirm.
- **OQ4 (C16):** keep an explicit `queued` status or reduce to pending/running. → user confirm (minor).

### Next-step recommendation

`continue iterating` → **R2**: one targeted specialist (adversarial-security-analyst) on read-only enforcement + the no-worktree safety question (OQ2), since it gates the safety of decision 5. Then a single batched user escalation (OQ1, OQ3, OQ4) and synthesis.

_Decisions produced: D-1 (from C1, C11, C14), D-2 (C2, C3), D-5 (C3–C6), D-6 (C7), D-7 (C8), D-8 (C9, C10), D-9 (C15), D-10 (C16). Partially produced (resolved in R2 + user escalation): D-3 (C12), D-4 (C13)._
_Changed in plan: C12's A4-leaning bypass was REVERSED — the user chose dispatcher reuse (D-3), and R2's read-only finding made the worktree A4 wanted to avoid harmless. C13's "prompt + BOOCODE_TOOLS" enforcement was REPLACED by `mode_id='plan'` (D-4). C16's `queued` status was DROPPED (D-10)._

## R2 — Targeted security review (read-only enforcement)

**Specialist engaged:** adversarial-security-analyst (opus). **Charter:** OQ2 only — is read-only watertight for flow steps, and is the no-worktree posture (decision 5) safe?

**New input:** R1's C12/C13 (the dispatch-mechanism tension and the unproven prompt-level enforcement claim), plus the qwen PTY dispatch path.

### Claim ledger

| # | Claim | State | Supporting |
|---|---|---|---|
| C13-R | The R1 enforcement story (persona prompts + `BOOCODE_TOOLS=core` + project-dir-as-cwd) is NOT watertight for an external `qwen` CLI child: persona text is instruction not a gate; `BOOCODE_TOOLS` governs BooChat's in-process tool registry, not the external binary's own tools. Read-only must be enforced at the agent's own tool layer. | **Evidenced** (`pty-dispatch.ts:72-77` — the qwen spawn spec; `BOOCODE_TOOLS` scope is BooChat-only per CLAUDE.md env section) | adversarial-security-analyst |
| C13-FIX | qwen's `--approval-mode plan` IS the binding control: a built-in tool-level gate (reads allowed, writes blocked) inside the agent binary. Already wired — `mode_id` → `--approval-mode` at `pty-dispatch.ts:75`. Dispatch every step with `mode_id='plan'`, never user-overridable. Qwen-only (decision 6) makes one hardcoded flag sufficient. | **Evidenced** (`pty-dispatch.ts:75`) | adversarial-security-analyst |
| C12-R | Because plan mode is a tool-level write-block, the worktree the dispatcher creates is INERT — the agent cannot write to it. The "no worktree" motivation behind A4 (decision 5) dissolves: keep the worktree as a harmless read snapshot and REUSE the dispatcher (A2) rather than bypass it (A4). Resolves the C12 tension toward A2. | **Evidenced** (worktree = HEAD checkout; plan mode blocks writes to it) | adversarial-security-analyst (settles C12) |

### Resolution

- **OQ2 → resolved.** Read-only is watertight via `mode_id='plan'` (qwen
  `--approval-mode plan`), NOT prompt/`BOOCODE_TOOLS`. C13 moves Anecdotal →
  Evidenced (refuted-and-replaced).
- **OQ1 (C12) → unblocked.** The security finding removes A4's only advantage; the
  user chose A2 (dispatcher reuse). Design-context decision 5 ("no worktree") is
  REVISED to "worktree as a harmless read snapshot."
- A new non-qwen agent in flows would require re-verifying its plan-mode equivalent
  before allowing it (recorded as the D-4 revisit criterion).

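The "never user-overridable" part of the resolution can be sketched as a task builder that pins the mode regardless of what the caller asks for. This is an illustration only: `FlowStepTask`, `makeFlowStepTask`, and `approvalModeArgs` are hypothetical names, and the real spawn spec in `pty-dispatch.ts` is not reproduced here. The only facts taken from the notes are that `mode_id` maps to qwen's `--approval-mode` flag and that flow steps always use `plan`.

```typescript
// Flow steps are always dispatched in plan mode (reads allowed, writes blocked).
const FLOW_STEP_MODE_ID = "plan";

// Hypothetical task shape for illustration.
interface FlowStepTask {
  prompt: string;
  modeId: string;
}

// Build the task for a flow step. A requested mode is accepted only so the
// sketch can show that it is ignored: flow steps are not user-overridable.
function makeFlowStepTask(prompt: string, _requestedModeId?: string): FlowStepTask {
  return { prompt, modeId: FLOW_STEP_MODE_ID };
}

// Translate a task's mode into the CLI arguments that carry it.
// Assumption: the flag takes the mode id as its value, as described in C13-FIX.
function approvalModeArgs(task: FlowStepTask): string[] {
  return ["--approval-mode", task.modeId];
}
```

Pinning the value in one constructor, instead of at each call site, gives a single place to re-verify if a non-qwen agent is ever added to flows.
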
### User escalation (batched, post-R2)

- OQ1 → **reuse the dispatcher** (A2), one new `onTaskTerminal` hook, no PTY bypass.
- OQ3 → **reconcile-and-advance** resume (re-dispatch lost/failed steps; keep
  completed).
- OQ4 → **drop `queued`**; `pending` covers waiting.

### Next-step recommendation

`synthesize` — all blocking open questions resolved; no spec-maturity gate trip.

_Decisions produced: D-3 (from C12-R / OQ1 user choice), D-4 (from C13-R / C13-FIX). Co-produced with R1: confirms D-9 (OQ3) and D-10 (OQ4)._
_Changed in plan: design-context decision 5 REVISED (no-worktree → read-snapshot worktree, read-only via plan mode); C13's enforcement mechanism REPLACED (prompt/BOOCODE_TOOLS → `mode_id='plan'`); C12 RESOLVED toward A2 (reuse) over A4 (bypass)._