docs: add openspec proposals for memory v2 and orchestrator flow patterns

2026-06-07 21:34:35 +00:00
parent fb52eb3efa
commit 028c08b4cd
6 changed files with 281 additions and 0 deletions


@@ -0,0 +1,60 @@
# Memory v2 — Design
## Architecture
```
               ┌─────────────────────────┐
               │ system-prompt.ts        │
               │ (inject memory block)   │
               └────────┬────────────────┘
              ┌─────────▼──────────┐
              │  memory/recall.ts  │
              │ (renamed to query) │
              └─────────┬──────────┘
         ┌──────────────┼───────────────┐
         │              │               │
┌────────▼──────┐ ┌─────▼──────┐ ┌──────▼───────┐
│ BM25Ranker    │ │ EmbedCache │ │ CosineRanker │
│ (stateless)   │ │ (LRU map)  │ │ (ONNX)       │
└───────────────┘ └────────────┘ └──────────────┘
```
## Module Changes
### `apps/server/src/services/memory/` — new/changed files
| File | Change |
|------|--------|
| `recall.ts` | Replace `rankByRelevance` with hybrid `rankByHybrid(query, entries)` |
| `embeddings.ts` | **New** — ONNX model loader + `embed(texts: string[]): number[][]` |
| `bm25.ts` | **New** — BM25 scorer with `score(query, doc): number` |
| `ranker.ts` | **New** — weighted merge of BM25 + cosine scores |
| `entries.ts` | Add `serializeForEmbedding(entry): string` helper |
### Embedding Model
- Model: `all-MiniLM-L6-v2` (384-dim, ~23MB ONNX)
- Runtime: `onnxruntime-node` npm package or subprocess via `node:child_process`
- Cache: `Map<string, { embedding: number[], mtime: number }>` in-memory, cleared on process restart
- Fallback: BM25-only when model file is missing
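The cache bullet above could look roughly like the following sketch. It assumes the map key is the memory file path and that a changed `mtime` invalidates the entry. `EmbedFn` is a hypothetical stand-in for the ONNX-backed `embed` function, and LRU eviction is left out for brevity.

```typescript
// Sketch only. Assumptions: the cache key is the memory file path, `mtime`
// is the file's modification time in milliseconds, and `EmbedFn` is a
// hypothetical stand-in for the ONNX-backed `embed` function.
type CachedEmbedding = { embedding: number[]; mtime: number };
type EmbedFn = (texts: string[]) => number[][];

class EmbeddingCache {
  private readonly entries = new Map<string, CachedEmbedding>();
  private readonly embed: EmbedFn;

  constructor(embed: EmbedFn) {
    this.embed = embed;
  }

  // Returns the cached vector while the file is unchanged; re-embeds otherwise.
  get(path: string, mtime: number, text: string): number[] {
    const hit = this.entries.get(path);
    if (hit !== undefined && hit.mtime === mtime) {
      return hit.embedding;
    }
    const embedding = this.embed([text])[0];
    if (embedding === undefined) {
      throw new Error(`embedder returned no vector for ${path}`);
    }
    this.entries.set(path, { embedding, mtime });
    return embedding;
  }

  // Number of cached files (no eviction in this sketch).
  get size(): number {
    return this.entries.size;
  }
}
```

Because the map lives in process memory, a restart simply starts with an empty cache and re-embeds files on first access.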
### Agent Tools (new)
| Tool | Description |
|------|-------------|
| `extract_memory(topic, title, content, tags?)` | Persists a memory entry. Topic must be one of project/user/reference |
| `search_memory(query)` | Returns up to 10 ranked memory entries matching the query. Replaces blind injection |
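As an illustration of the `extract_memory` contract, the sketch below validates the topic and builds the entry without touching the filesystem. The slug rule, the `Tags:` line, and the helper name `buildMemoryEntry` are assumptions made for this example; only the three topics and the `.boocode/memory/` root come from this proposal.

```typescript
// Sketch only. The file naming (slug from title) and the entry body layout
// are hypothetical; the proposal only fixes the allowed topics.
const MEMORY_TOPICS = ["project", "user", "reference"] as const;
type MemoryTopic = (typeof MEMORY_TOPICS)[number];

function isMemoryTopic(value: string): value is MemoryTopic {
  return (MEMORY_TOPICS as readonly string[]).indexOf(value) !== -1;
}

function buildMemoryEntry(
  topic: string,
  title: string,
  content: string,
  tags: string[] = [],
): { path: string; body: string } {
  if (!isMemoryTopic(topic)) {
    throw new Error(`invalid topic: ${topic}`);
  }
  // Lowercase the title and collapse non-alphanumeric runs into single dashes.
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  if (slug.length === 0) {
    throw new Error("title must contain letters or digits");
  }
  const tagLine = tags.length > 0 ? `Tags: ${tags.join(", ")}\n\n` : "";
  return {
    path: `.boocode/memory/${topic}/${slug}.md`,
    body: `# ${title}\n\n${tagLine}${content}\n`,
  };
}
```

A real tool handler would write `body` to `path` and return the path to the agent.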
### Scoring Formula
```
score = (BM25_score * 0.3) + (cosine_similarity * 0.7)
```
Both scores are normalized to [0,1] before merging. Entries scoring below the threshold (0.15) are excluded.
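A minimal sketch of the merge step, under two assumptions the formula does not spell out: BM25 is normalized by dividing by the largest raw score in the candidate set, and negative cosine values are clamped to 0. The names `ScoredEntry` and `rankHybrid` are illustrative.

```typescript
// Sketch only. The normalization choices are assumptions; the weights and
// the threshold come from the formula above.
type ScoredEntry = { id: string; bm25: number; cosine: number };

const BM25_WEIGHT = 0.3;
const COSINE_WEIGHT = 0.7;
const SCORE_THRESHOLD = 0.15;

function rankHybrid(candidates: ScoredEntry[]): { id: string; score: number }[] {
  // Assumption: scale BM25 by the largest raw score among the candidates.
  const maxBm25 = Math.max(0, ...candidates.map((c) => c.bm25));
  return candidates
    .map((c) => {
      const bm25Norm = maxBm25 > 0 ? c.bm25 / maxBm25 : 0;
      // Assumption: clamp cosine similarity into [0, 1].
      const cosineNorm = Math.min(1, Math.max(0, c.cosine));
      return { id: c.id, score: BM25_WEIGHT * bm25Norm + COSINE_WEIGHT * cosineNorm };
    })
    .filter((r) => r.score >= SCORE_THRESHOLD)
    .sort((x, y) => y.score - x.score);
}
```

With these choices, an entry with a weak keyword match but a strong semantic match can still outrank a pure keyword hit, which is the point of weighting cosine at 0.7.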
## Rollback
Set `MEMORY_SEARCH=keyword` env var to fall back to the v1 keyword-only path. Default is `hybrid`.
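One way the switch might be read, as a sketch: the helper name `resolveSearchMode` is hypothetical, and treating any unrecognized value as `hybrid` is an assumption.

```typescript
// Sketch only. Assumption: any value other than "keyword" selects the default.
type MemorySearchMode = "hybrid" | "keyword";

function resolveSearchMode(env: Record<string, string | undefined>): MemorySearchMode {
  return env["MEMORY_SEARCH"] === "keyword" ? "keyword" : "hybrid";
}
```

In the server this would be called once with `process.env` and the result passed to the recall path.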


@@ -0,0 +1,39 @@
# Memory v2 — Hybrid Search & Auto-Extract
**Status:** Proposed
**Epic:** memory-v2-hybrid-search
**Depends on:** v2.8.0-fork-lifts (v1 memory already shipped)
## Why
v1 memory (shipped in v2.8.0-fork-lifts) provides file-based recall with keyword/tag matching injected into `system-prompt.ts`. It works but has three gaps:
1. **Keyword-only recall misses semantic matches** — "indentation" won't match a memory entry titled "Code style: tabs vs spaces" unless the word "indentation" appears verbatim.
2. **No auto-extraction** — memory files must be created manually. The LLM can't persist useful facts it discovers during conversation.
3. **Flat search, no ranking** — all keyword matches are equally weighted. No relevance scoring or deduplication.
v2 upgrades the retrieval layer while keeping the file-based storage format. No breaking changes to `.boocode/memory/` structure.
## What Changes
### Hybrid Search (high confidence)
Replace keyword-only `rankByRelevance` with BM25 + embedding hybrid search. Use a tiny local embedding model (all-MiniLM-L6-v2 through ONNX runtime or a local subprocess) so there's no external API dependency.
- **BM25** (already implementable without deps — term frequency + inverse document frequency scoring on the memory entries)
- **Embedding** (local ONNX model, ~23MB, runs inference in ~5ms on CPU, produces 384-dim vectors)
- **Weighted merge** (`score = 0.3 * bm25 + 0.7 * cosine`) — configurable ratio
### Auto-Extract Agent Tool (medium confidence)
Two new tools exposed to agents. Extraction is not automatic; the agent decides when to persist:
- `extract_memory(topic, title, content, tags?)` → writes a markdown entry
- `search_memory(query)` → returns ranked memory entries (new tool, replaces raw injection)
### In-Memory Embedding Cache (optional)
Keep embeddings in an LRU map keyed by file path, with the file mtime stored alongside each embedding. Recompute only when a file's mtime changes. No DB migration needed.
## Non-Goals
- No vector database (SQLite FTS5 or in-memory BM25 suffice)
- No automatic background extraction agent (agent must explicitly call `extract_memory`)
- No changes to the `.boocode/memory/` file format
- No Python dependencies — ONNX runtime is a Node.js native addon or subprocess


@@ -0,0 +1,30 @@
# Tasks — Memory v2
## Prerequisites
- v2.8.0 on main (v1 memory module shipped)
## Tasks
### 1. BM25 ranker
- [ ] 1.1 Write `bm25.ts` — pure function, no deps. BM25Okapi formula: `sum over terms of IDF * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * docLen / avgDocLen))`
- [ ] 1.2 Unit tests with known corpus
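A possible shape for task 1.1, following the formula quoted there. The tokenizer, the defaults `k1 = 1.2` and `b = 0.75`, and the non-negative IDF variant `ln(1 + (N - df + 0.5) / (df + 0.5))` are assumptions. The sketch scores a whole corpus at once because IDF and average length need corpus statistics.

```typescript
// Sketch only. Tokenization, k1/b defaults, and the IDF variant are assumptions.
function tokenizeForBm25(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

// Returns one BM25 score per document, in the same order as `docs`.
function bm25Scores(query: string, docs: string[], k1 = 1.2, b = 0.75): number[] {
  const tokenized = docs.map(tokenizeForBm25);
  const n = tokenized.length;
  if (n === 0) return [];
  const avgDocLen = tokenized.reduce((sum, d) => sum + d.length, 0) / n;

  // Document frequency: in how many documents each term appears.
  const docFreq = new Map<string, number>();
  for (const doc of tokenized) {
    new Set(doc).forEach((term) => {
      docFreq.set(term, (docFreq.get(term) ?? 0) + 1);
    });
  }

  const queryTerms = Array.from(new Set(tokenizeForBm25(query)));
  return tokenized.map((doc) => {
    const tf = new Map<string, number>();
    for (const term of doc) tf.set(term, (tf.get(term) ?? 0) + 1);
    const lengthRatio = avgDocLen > 0 ? doc.length / avgDocLen : 0;
    let score = 0;
    for (const term of queryTerms) {
      const f = tf.get(term) ?? 0;
      if (f === 0) continue;
      const df = docFreq.get(term) ?? 0;
      const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
      score += (idf * (f * (k1 + 1))) / (f + k1 * (1 - b + b * lengthRatio));
    }
    return score;
  });
}
```

A small fixed corpus like the one in the smoke tasks makes a reasonable unit-test fixture: documents without the query term must score exactly 0.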
### 2. Embedding module
- [ ] 2.1 Write `embeddings.ts` — load ONNX model, `embed(texts: string[]): number[][]`
- [ ] 2.2 Write `ranker.ts` — cosine similarity + BM25 weighted merge
- [ ] 2.3 Fallback to BM25-only when model unavailable
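Tasks 2.2 and 2.3 might reduce to something like this sketch. `mergeScore` is a hypothetical helper; it assumes the BM25 input is already normalized to [0,1] and that a missing vector (`null`) signals that the model is unavailable.

```typescript
// Sketch only. `mergeScore` and the null-means-unavailable convention are assumptions.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vector length mismatch");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // A zero vector has no direction; treat similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Falls back to BM25 alone when either embedding is missing.
function mergeScore(
  bm25Norm: number,
  queryVec: number[] | null,
  docVec: number[] | null,
): number {
  if (queryVec === null || docVec === null) return bm25Norm;
  return 0.3 * bm25Norm + 0.7 * cosineSimilarity(queryVec, docVec);
}
```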
### 3. Hybrid recall
- [ ] 3.1 Refactor `recall.ts`: replace `rankByRelevance` with `rankByHybrid`, using BM25 + embedding when available
- [ ] 3.2 Keep keyword-only path as `MEMORY_SEARCH=keyword` env fallback
- [ ] 3.3 Server tests pass
### 4. Agent tools
- [ ] 4.1 Create `extract_memory` tool — persists entry, returns path
- [ ] 4.2 Create `search_memory` tool — replaces raw injection when used
- [ ] 4.3 Tool tests pass
### 5. Smoke
- [ ] 5.1 Create `.boocode/memory/project/style.md` with "Use two-space indentation"
- [ ] 5.2 `search_memory("what spacing convention")` returns the entry
- [ ] 5.3 `extract_memory("project", "Naming", "PascalCase for components")` creates the file


@@ -0,0 +1,70 @@
# Orchestrator Advanced Flows — Design
## Architecture
```
┌───────────── Step dispatch ─────────────────┐
│                                             │
│  Flow-runner resolves step:                 │
│  1. Check trigger_rule on deps              │
│  2. Substitute $vars in prompt              │
│  3. If approval gate → pause for user       │
│  4. INSERT task row → dispatcher picks up   │
│  5. On terminal: append to event log        │
│  6. Advance next ready step                 │
│                                             │
└─────────────────────────────────────────────┘
```
## Type Changes
### `apps/coder/src/conductor/types.ts`
```typescript
export type TriggerRule = 'all_success' | 'one_success' | 'all_done';

export interface Step {
  id: string;
  kind: StepKind | 'approval'; // + new kind
  deps?: string[];
  trigger_rule?: TriggerRule; // NEW: default 'all_success'
  agent?: string;
  run: (ctx: StepContext) => string | Promise<string>;
  when?: (ctx: StepContext) => boolean;
}
```
### `apps/coder/src/services/flow-runner.ts`
| Change | Detail |
|--------|--------|
| Trigger evaluation | Before dispatching a step, check deps statuses against `trigger_rule`. Skip if conditions not met |
| Variable substitution | Scan prompt for `$word.word` patterns, resolve from previous step outputs |
| Approval gate | When `step.kind === 'approval'`, insert a `tasks` row with `state='blocked'` and publish a `permission_requested` WS frame. Wait for `permission_resolved` to unblock |
| Event log | Append-only per-step events in the `flow_step_events` table: `{ step_id, event, at }`, where `event` is one of `started`, `completed`, `failed`, `paused`, `resumed`, `skipped` and `at` is a timestamp |
## Schema
```sql
CREATE TABLE IF NOT EXISTS flow_step_events (
  id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  run_id     UUID NOT NULL REFERENCES flow_runs(id),
  step_id    VARCHAR(64) NOT NULL,
  event      VARCHAR(32) NOT NULL, -- started, completed, failed, paused, resumed, skipped
  payload    JSONB,
  created_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp()
);
```
## Resolution Order
1. Collect all completed steps in the run
2. For each unstarted step whose deps are met:
   - Evaluate `trigger_rule` against dep statuses
   - If met → advance the step (with variable substitution)
   - If not met → skip the step (`one_success` counts as met as soon as any dep succeeds)
3. For approval gates: pause, publish frame, wait for user response
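The resolution pass could be sketched as a pure planning function. This version makes one assumption beyond the list above: a step is skipped only once its rule can no longer be satisfied, and it waits while deps are still in flight. The status names and the `PlanStep`, `decide`, and `planNextSteps` identifiers are illustrative.

```typescript
// Sketch only. Status names and the wait-versus-skip distinction are assumptions.
type TriggerRule = "all_success" | "one_success" | "all_done";
type StepStatus = "pending" | "running" | "success" | "failed" | "skipped";
type PlanStep = { id: string; deps?: string[]; trigger_rule?: TriggerRule };
type Decision = "run" | "wait" | "skip";
type StatusMap = Partial<Record<string, StepStatus>>;

function decide(step: PlanStep, statuses: StatusMap): Decision {
  const deps = (step.deps ?? []).map((id) => statuses[id] ?? "pending");
  const rule = step.trigger_rule ?? "all_success";
  const ok = deps.filter((s) => s === "success").length;
  const done = deps.filter((s) => s === "success" || s === "failed" || s === "skipped").length;
  const allDone = done === deps.length;

  if (deps.length === 0) return "run";
  if (rule === "all_done") return allDone ? "run" : "wait";
  if (rule === "one_success") {
    if (ok > 0) return "run";
    return allDone ? "skip" : "wait";
  }
  // all_success: any failed or skipped dep makes the rule unsatisfiable.
  if (ok === deps.length) return "run";
  return done > ok ? "skip" : "wait";
}

function planNextSteps(steps: PlanStep[], statuses: StatusMap): { run: string[]; skip: string[] } {
  const plan = { run: [] as string[], skip: [] as string[] };
  for (const step of steps) {
    // Only unstarted steps are candidates.
    if ((statuses[step.id] ?? "pending") !== "pending") continue;
    const decision = decide(step, statuses);
    if (decision === "run") plan.run.push(step.id);
    if (decision === "skip") plan.skip.push(step.id);
  }
  return plan;
}
```

Keeping this pass free of I/O makes it easy to unit-test before wiring it to the dispatcher and the approval-gate pause.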
## Rollback
All changes are additive to the Step type. Existing flows without `trigger_rule` default to `all_success`, preserving current behavior. Approval gates are opt-in per step definition.


@@ -0,0 +1,42 @@
# Orchestrator — Advanced Flow Patterns
**Status:** Proposed
**Epic:** orchestrator-flow-advanced
**Depends on:** v2.7.17-orchestrator (flow-runner already shipped)
## Why
The orchestrator (shipped v2.7.17) runs sequential research/analysis flows on local Qwen. Each flow is a linear dependency chain of steps: A → B → C. This works for analysis flows (code-review, investigate) but limits three high-value scenarios:
1. **Parallel research** — "Analyze this from 3 angles" currently requires 3 separate runs. A single flow with parallel branches could cut the wall-clock time of the research phase from three runs to roughly one.
2. **Adaptive depth** — "Investigate bug, and if it's a security issue, escalate to security review" requires conditional branching. Currently all steps run unconditionally.
3. **Human-in-the-loop** — "Review the diff and approve before applying" requires the orchestrator to pause and wait for user input before proceeding.
The patterns from the Ion hybrid workflow engine (trigger rules, event sourcing, approval gates, variable substitution) provide a proven vocabulary for these scenarios — we adapt the patterns without adopting the project.
## What Changes
### Trigger rules on step deps
Add `trigger_rule?: 'all_success' | 'one_success' | 'all_done'` to the `Step` type. Default `all_success` preserves existing behavior.
- `all_success` — step runs when ALL dependencies complete successfully (current behavior)
- `one_success` — step runs as soon as ANY dependency completes successfully (parallel research: whichever branch succeeds first seeds the synthesis)
- `all_done` — step runs when all deps finish regardless of status (cleanup/reporting steps)
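The three rules map naturally onto a small predicate. In this sketch the status names are assumptions, and a step without dependencies is treated as always ready.

```typescript
// Sketch only. Status names and the empty-deps behavior are assumptions.
type TriggerRule = "all_success" | "one_success" | "all_done";
type DepStatus = "pending" | "running" | "success" | "failed" | "skipped";

function evaluateTriggerRule(deps: DepStatus[], rule: TriggerRule = "all_success"): boolean {
  // Assumption: a step with no dependencies is always ready.
  if (deps.length === 0) return true;
  const isTerminal = (s: DepStatus): boolean =>
    s === "success" || s === "failed" || s === "skipped";
  switch (rule) {
    case "all_success":
      return deps.every((s) => s === "success");
    case "one_success":
      return deps.some((s) => s === "success");
    case "all_done":
      return deps.every(isTerminal);
  }
}
```

Omitting the rule argument gives `all_success`, which mirrors how existing flows without `trigger_rule` keep their current behavior.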
### Variable substitution in step prompts
Add `$stepId.output` and `$stepId.output.field` syntax in step prompts. The flow-runner resolves these before dispatching.
- `$research.output` — the full text output of step with id "research"
- `$classify.output.severity` — the "severity" field from step output parsed as YAML/JSON frontmatter
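A sketch of how the two forms might be resolved. It assumes each completed step already carries its parsed frontmatter as a `fields` object (parsing YAML/JSON is out of scope here) and that unresolved references are left in the prompt untouched. `StepOutput` is a hypothetical shape.

```typescript
// Sketch only. `StepOutput` and the leave-unresolved-as-is policy are assumptions.
type StepOutput = { text: string; fields?: Record<string, unknown> };

function resolveVariables(
  prompt: string,
  completed: Partial<Record<string, StepOutput>>,
): string {
  // Matches $stepId.output and $stepId.output.a.b (dot-path into fields).
  const pattern = /\$([A-Za-z_][\w-]*)\.output((?:\.[A-Za-z_]\w*)*)/g;
  return prompt.replace(pattern, (match: string, stepId: string, path: string) => {
    const output = completed[stepId];
    if (output === undefined) return match;
    if (path === "") return output.text;

    // Walk the dot-path through the parsed fields.
    let value: unknown = output.fields;
    for (const key of path.slice(1).split(".")) {
      if (typeof value !== "object" || value === null) return match;
      value = (value as Record<string, unknown>)[key];
    }
    if (value === undefined || value === null) return match;
    return typeof value === "string" ? value : JSON.stringify(value);
  });
}
```

Leaving unknown references untouched keeps a typo visible in the dispatched prompt instead of silently replacing it with an empty string; a stricter runner might fail the step instead.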
### Human approval gate
New `kind: 'approval'` step type that pauses the flow and publishes a permission frame to the user channel. Flow resumes when the user approves or rejects.
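The pause-and-resume behaviour can be illustrated in memory with a map of pending promises. The frame names come from this proposal. The `ApprovalGates` class, the frame payload shape, and the `publish` callback are hypothetical, and the sketch leaves out the blocked `tasks` row.

```typescript
// Sketch only. The class, payload shape, and publish callback are assumptions.
type GateFrame =
  | { type: "permission_requested"; stepId: string }
  | { type: "permission_resolved"; stepId: string; approved: boolean };

class ApprovalGates {
  private readonly pending = new Map<string, (approved: boolean) => void>();
  private readonly publish: (frame: GateFrame) => void;

  constructor(publish: (frame: GateFrame) => void) {
    this.publish = publish;
  }

  // Pauses the caller until a matching permission_resolved frame is handled.
  request(stepId: string): Promise<boolean> {
    return new Promise<boolean>((settle) => {
      this.pending.set(stepId, settle);
      this.publish({ type: "permission_requested", stepId });
    });
  }

  // Handles a permission_resolved frame; returns false if no gate is waiting.
  handleResolved(frame: GateFrame): boolean {
    if (frame.type !== "permission_resolved") return false;
    const waiter = this.pending.get(frame.stepId);
    if (waiter === undefined) return false;
    this.pending.delete(frame.stepId);
    waiter(frame.approved);
    return true;
  }
}
```

A flow-runner would `await gates.request(step.id)` and then either continue or mark the step failed, depending on the boolean it receives.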
### Event-sourced step log
Append-only event log for each step execution (start, complete, fail, skip, pause, resume). Enables deterministic resume after coder restart without polling.
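To illustrate why an append-only log allows resume without polling, the sketch below replays events into a current state per step. The event names follow the list above. The `at` field as a numeric timestamp and the mapping from events to states are assumptions.

```typescript
// Sketch only. The event-to-state mapping and numeric `at` are assumptions.
type StepEventName = "started" | "completed" | "failed" | "paused" | "resumed" | "skipped";
type StepEvent = { step_id: string; event: StepEventName; at: number };
type ReplayedState = "running" | "completed" | "failed" | "paused" | "skipped";

function replayStepStates(events: StepEvent[]): Map<string, ReplayedState> {
  const states = new Map<string, ReplayedState>();
  // Replay in timestamp order; the last event for a step wins.
  const ordered = events.slice().sort((a, b) => a.at - b.at);
  for (const e of ordered) {
    if (e.event === "started" || e.event === "resumed") {
      states.set(e.step_id, "running");
    } else {
      states.set(e.step_id, e.event);
    }
  }
  return states;
}
```

After a restart, steps missing from the map are unstarted, `paused` steps are still waiting on the user, and `running` steps are the ones whose outcome has to be re-checked.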
## Non-Goals
- No YAML DAG format (stay with TypeScript flow definitions)
- No CLI tool (orchestrator stays in-app)
- No replacement of the existing flow definitions — additive changes only
- No VM sandbox or WASM


@@ -0,0 +1,40 @@
# Tasks — Orchestrator Advanced Flows
## Prerequisites
- v2.7.17 on main (orchestrator + flow-runner shipped)
- v2.8.0 on main (fork-lifts complete)
## Tasks
### 1. Trigger rules in Step type
- [ ] 1.1 Add `TriggerRule` type to `conductor/types.ts`
- [ ] 1.2 Add `trigger_rule?: TriggerRule` field to `Step` interface (defaults to `all_success`)
- [ ] 1.3 Write `evaluateTriggerRule(deps, rule): boolean` in `flow-runner-decisions.ts`
- [ ] 1.4 Unit tests for each rule variant
### 2. Variable substitution
- [ ] 2.1 Write `resolveVariables(prompt, completedSteps): string` in flow-runner
- [ ] 2.2 Supports `$stepId.output` and `$stepId.output.field` (dot-path)
- [ ] 2.3 Unit tests with multi-step outputs
### 3. Approval gate step kind
- [ ] 3.1 Add `'approval'` to `StepKind` union
- [ ] 3.2 Flow-runner: when `step.kind === 'approval'`, pause and publish `permission_requested` frame
- [ ] 3.3 Wire `permission_resolved` frame handler to unblock the blocked step
- [ ] 3.4 Test: approval gate pauses flow, approval resumes it
### 4. Event-sourced step log
- [ ] 4.1 Create `flow_step_events` table in `apps/coder/src/schema.sql`
- [ ] 4.2 Write `appendStepEvent(runId, stepId, event, payload?)` helper
- [ ] 4.3 Wire events into flow-runner lifecycle hooks (start, complete, fail, skip, pause, resume)
- [ ] 4.4 Unit test: events are recorded in order
### 5. Example flow with parallel branches
- [ ] 5.1 Create `conductor/flows/parallel-research.ts` — splits into 3 parallel research steps, then joins with synthesis
- [ ] 5.2 Uses `trigger_rule: 'one_success'` on the synthesis step
- [ ] 5.3 Integration test: parallel flow completes correctly
### 6. Smoke
- [ ] 6.1 Run parallel-research flow with 3 agents
- [ ] 6.2 Verify synthesis step triggers on first completion
- [ ] 6.3 Verify variable substitution in synthesis prompt