Memory v2 — Design
Architecture
                    ┌─────────────────────────┐
                    │    system-prompt.ts     │
                    │  (inject memory block)  │
                    └────────┬────────────────┘
                             │
                   ┌─────────▼──────────┐
                   │  memory/recall.ts  │
                   │ (renamed to query) │
                   └─────────┬──────────┘
                             │
             ┌───────────────┼───────────────┐
             │               │               │
    ┌────────▼──────┐  ┌─────▼──────┐ ┌──────▼───────┐
    │  BM25Ranker   │  │ EmbedCache │ │ CosineRanker │
    │  (stateless)  │  │ (LRU map)  │ │    (ONNX)    │
    └───────────────┘  └────────────┘ └──────────────┘
Module Changes
apps/server/src/services/memory/ — new/changed files
| File | Change |
|---|---|
| `recall.ts` | Replace `rankByRelevance` with hybrid `rankByHybrid(query, entries)` |
| `embeddings.ts` | New: ONNX model loader + `embed(texts: string[]): number[][]` |
| `bm25.ts` | New: BM25 scorer with `score(query, doc): number` |
| `ranker.ts` | New: weighted merge of BM25 + cosine scores |
| `entries.ts` | Add `serializeForEmbedding(entry): string` helper |
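
As a rough illustration of what the scorer in `bm25.ts` might look like, here is a minimal sketch. The design only fixes the `score(query, doc): number` signature. The `createBm25` factory, the tokenizer, and the `K1`/`B` constants are all assumptions made for this example, and the IDF uses the non-negative Lucene-style variant rather than anything the design specifies.

```typescript
// Hypothetical sketch: K1, B, tokenize, and createBm25 are illustrative names,
// not part of the design. Only score(query, doc) comes from the table above.
const K1 = 1.2;
const B = 0.75;

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

function createBm25(corpus: string[]): (query: string, doc: string) => number {
  const docs = corpus.map(tokenize);
  const avgLen = docs.reduce((sum, d) => sum + d.length, 0) / Math.max(docs.length, 1);

  // Document frequency: how many corpus documents contain each term.
  const df = new Map<string, number>();
  for (const d of docs) {
    for (const term of Array.from(new Set(d))) {
      df.set(term, (df.get(term) ?? 0) + 1);
    }
  }

  return function score(query: string, doc: string): number {
    const tokens = tokenize(doc);
    const tf = new Map<string, number>();
    for (const t of tokens) tf.set(t, (tf.get(t) ?? 0) + 1);

    let total = 0;
    for (const term of Array.from(new Set(tokenize(query)))) {
      const f = tf.get(term) ?? 0;
      if (f === 0) continue;
      const n = df.get(term) ?? 0;
      // Lucene-style IDF, which stays non-negative for very common terms.
      const idf = Math.log(1 + (docs.length - n + 0.5) / (n + 0.5));
      // Term-frequency saturation with document-length normalization.
      const denom = f + K1 * (1 - B + (B * tokens.length) / (avgLen || 1));
      total += (idf * f * (K1 + 1)) / denom;
    }
    return total;
  };
}
```

Building the corpus statistics once in the factory keeps each `score` call free of mutable state, which is one way to read the "stateless" label on `BM25Ranker` in the diagram.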
Embedding Model
- Model: `all-MiniLM-L6-v2` (384-dim, ~23MB ONNX)
- Runtime: `onnxruntime-node` npm package or subprocess via `node:child_process`
- Cache: `Map<string, { embedding: number[], mtime: number }>` in-memory, cleared on process restart
- Fallback: BM25-only when model file is missing
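
The cache shape above could be used roughly as follows. This is a sketch under assumptions: the design does not say what the string key is or how `mtime` is used, so the example assumes the key is an entry identifier and that a changed `mtime` invalidates the cached vector. `getEmbedding` and `EmbedFn` are hypothetical names, and the embedder is injected so the sketch does not depend on any ONNX API.

```typescript
// Hypothetical sketch of the in-memory embedding cache described above.
type CachedEmbedding = { embedding: number[]; mtime: number };
// Mirrors the embed(texts: string[]): number[][] signature from embeddings.ts.
type EmbedFn = (texts: string[]) => number[][];

const embedCache = new Map<string, CachedEmbedding>();

function getEmbedding(id: string, text: string, mtime: number, embed: EmbedFn): number[] {
  const hit = embedCache.get(id);
  // Assumption: an unchanged mtime means the entry text has not changed.
  if (hit !== undefined && hit.mtime === mtime) {
    return hit.embedding;
  }
  const embedding = embed([text])[0] ?? [];
  embedCache.set(id, { embedding, mtime });
  return embedding;
}
```

Because the map lives in module scope, it is dropped on process restart, which matches the stated cache lifetime.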
Agent Tools (new)
| Tool | Description |
|---|---|
| `extract_memory(topic, title, content, tags?)` | Persists a memory entry. Topic must be one of `project`, `user`, or `reference`. |
| `search_memory(query)` | Returns up to 10 ranked memory entries matching the query. Replaces blind injection. |
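
The topic constraint on `extract_memory` could be enforced with a small type guard. The sketch below is illustrative only: `MEMORY_TOPICS`, `ExtractMemoryArgs`, `isMemoryTopic`, and `parseExtractMemoryArgs` are hypothetical names, and typing `tags` as `string[]` is an assumption, since the design only marks it optional.

```typescript
// Hypothetical sketch: validates the topic argument of extract_memory.
const MEMORY_TOPICS = ["project", "user", "reference"] as const;
type MemoryTopic = (typeof MEMORY_TOPICS)[number];

interface RawExtractMemoryArgs {
  topic: string;
  title: string;
  content: string;
  tags?: string[]; // Assumed type; the design only says tags is optional.
}

interface ExtractMemoryArgs extends RawExtractMemoryArgs {
  topic: MemoryTopic;
}

function isMemoryTopic(value: string): value is MemoryTopic {
  return (MEMORY_TOPICS as readonly string[]).indexOf(value) !== -1;
}

function parseExtractMemoryArgs(raw: RawExtractMemoryArgs): ExtractMemoryArgs {
  if (!isMemoryTopic(raw.topic)) {
    throw new Error(`Invalid topic "${raw.topic}"; expected one of ${MEMORY_TOPICS.join(", ")}`);
  }
  return { ...raw, topic: raw.topic };
}
```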
Scoring Formula
score = (BM25_score * 0.3) + (cosine_similarity * 0.7)
Both scores are normalized to [0, 1] before merging. Entries scoring below the threshold (0.15) are excluded.
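
The formula can be sketched as a small ranking function. The weights and the 0.15 threshold come from the text above. Everything else is an assumption: the design does not say how scores are normalized, so this example divides BM25 by the maximum among the candidates and clamps cosine similarity to [0, 1], and it applies the threshold to the merged score. `Candidate`, `Ranked`, and `rankHybrid` are hypothetical names.

```typescript
const BM25_WEIGHT = 0.3;
const COSINE_WEIGHT = 0.7;
const MIN_SCORE = 0.15;

interface Candidate {
  id: string;
  bm25: number;   // Raw BM25 score, unbounded above.
  cosine: number; // Raw cosine similarity in [-1, 1].
}

interface Ranked {
  id: string;
  score: number;
}

function rankHybrid(candidates: Candidate[]): Ranked[] {
  // Assumed normalization: scale BM25 by the best score among candidates.
  const maxBm25 = candidates.reduce((m, c) => Math.max(m, c.bm25), 0);
  return candidates
    .map((c) => {
      const bm25 = maxBm25 > 0 ? c.bm25 / maxBm25 : 0;
      // Assumed normalization: negative similarities count as zero.
      const cosine = Math.min(1, Math.max(0, c.cosine));
      return { id: c.id, score: BM25_WEIGHT * bm25 + COSINE_WEIGHT * cosine };
    })
    .filter((r) => r.score >= MIN_SCORE)
    .sort((a, b) => b.score - a.score);
}
```

For example, a candidate with the top BM25 score and a cosine similarity of 0.5 merges to 0.3 + 0.35 = 0.65, while one with no keyword overlap and a cosine similarity of 0.1 merges to 0.07 and is dropped.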
Rollback
Set the `MEMORY_SEARCH=keyword` environment variable to fall back to the v1 keyword-only path. The default is hybrid.
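
One possible way to read the switch is sketched below. `SearchMode` and `resolveSearchMode` are hypothetical names, and treating any value other than `keyword` as hybrid is an assumption. The environment is passed in as a plain record so the function is easy to test.

```typescript
type SearchMode = "hybrid" | "keyword";

// Hypothetical helper: only the exact value "keyword" selects the v1 path.
function resolveSearchMode(env: Record<string, string | undefined>): SearchMode {
  return env["MEMORY_SEARCH"] === "keyword" ? "keyword" : "hybrid";
}
```

In a Node.js server this would typically be called once at startup as `resolveSearchMode(process.env)`.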