# Memory v2 — Design

## Architecture

```
                ┌─────────────────────────┐
                │    system-prompt.ts     │
                │  (inject memory block)  │
                └────────┬────────────────┘
                         │
               ┌─────────▼──────────┐
               │  memory/recall.ts  │
               │ (renamed to query) │
               └─────────┬──────────┘
                         │
         ┌───────────────┼───────────────┐
         │               │               │
┌────────▼──────┐  ┌─────▼──────┐ ┌──────▼───────┐
│ BM25Ranker    │  │ EmbedCache │ │ CosineRanker │
│ (stateless)   │  │ (LRU map)  │ │ (ONNX)       │
└───────────────┘  └────────────┘ └──────────────┘
```

## Module Changes

### `apps/server/src/services/memory/` — new/changed files

| File | Change |
|------|--------|
| `recall.ts` | Replace `rankByRelevance` with hybrid `rankByHybrid(query, entries)` |
| `embeddings.ts` | **New** — ONNX model loader + `embed(texts: string[]): number[][]` |
| `bm25.ts` | **New** — BM25 scorer with `score(query, doc): number` |
| `ranker.ts` | **New** — weighted merge of BM25 + cosine scores |
| `entries.ts` | Add `serializeForEmbedding(entry): string` helper |

### Embedding Model

- Model: `all-MiniLM-L6-v2` (384-dim, ~23MB ONNX)
- Runtime: `onnxruntime-node` npm package or subprocess via `node:child_process`
- Cache: `Map` in-memory, cleared on process restart
- Fallback: BM25-only when model file is missing

### Agent Tools (new)

| Tool | Description |
|------|-------------|
| `extract_memory(topic, title, content, tags?)` | Persists a memory entry. Topic must be one of project/user/reference |
| `search_memory(query)` | Returns up to 10 ranked memory entries matching the query. Replaces blind injection |

### Scoring Formula

```
score = (BM25_score * 0.3) + (cosine_similarity * 0.7)
```

Both normalized to [0,1] before merging. Entries below threshold (0.15) are excluded.

## Rollback

Set `MEMORY_SEARCH=keyword` env var to fall back to the v1 keyword-only path. Default is `hybrid`.
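
### Example: weighted merge (sketch)

As an illustration of the Scoring Formula section, the weighted merge could look roughly like the sketch below. It is not the actual `ranker.ts`. The `ScoredEntry` and `RankedEntry` shapes, the `mergeScores` name, and the `useEmbeddings` flag are hypothetical. The design does not say how scores are normalized, so the sketch assumes BM25 is divided by the batch maximum and cosine similarity is clamped to [0, 1]. The weights (0.3 / 0.7), the 0.15 threshold, the limit of 10 results, and the BM25-only fallback come from this document.

```typescript
// Hypothetical input shape: raw scores produced by the two rankers.
interface ScoredEntry {
  id: string;
  bm25: number;   // raw BM25 score, assumed >= 0 and unbounded
  cosine: number; // raw cosine similarity, in [-1, 1]
}

// Hypothetical output shape: one merged score per surviving entry.
interface RankedEntry {
  id: string;
  score: number;
}

const BM25_WEIGHT = 0.3;
const COSINE_WEIGHT = 0.7;
const THRESHOLD = 0.15;
const MAX_RESULTS = 10;

// Merge BM25 and cosine scores into one ranking.
// When useEmbeddings is false (e.g. the model file is missing),
// the normalized BM25 score is used on its own.
function mergeScores(
  entries: ScoredEntry[],
  useEmbeddings: boolean = true,
): RankedEntry[] {
  // Assumption: normalize BM25 by the largest score in this batch.
  const maxBm25 = entries.reduce((m, e) => Math.max(m, e.bm25), 0);

  return entries
    .map((e) => {
      const bm25Norm = maxBm25 > 0 ? e.bm25 / maxBm25 : 0;
      // Assumption: negative similarities are treated as 0.
      const cosineNorm = Math.min(1, Math.max(0, e.cosine));
      const score = useEmbeddings
        ? BM25_WEIGHT * bm25Norm + COSINE_WEIGHT * cosineNorm
        : bm25Norm;
      return { id: e.id, score };
    })
    .filter((e) => e.score >= THRESHOLD) // drop entries below the threshold
    .sort((a, b) => b.score - a.score)   // highest score first
    .slice(0, MAX_RESULTS);
}

// Usage with made-up scores.
const sample: ScoredEntry[] = [
  { id: "a", bm25: 4, cosine: 0.9 },
  { id: "b", bm25: 2, cosine: 0.1 },
  { id: "c", bm25: 0, cosine: 0.2 },
];
const hybridRanked = mergeScores(sample);
const bm25OnlyRanked = mergeScores(sample, false);
```

With these sample values, entry `c` falls below the 0.15 threshold in both modes and is dropped. Normalizing BM25 by the batch maximum makes scores relative to the current candidate set, so a different normalization may be preferable if scores need to be comparable across queries.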