Memory v2 — Hybrid Search & Auto-Extract

Status: Proposed
Epic: memory-v2-hybrid-search
Depends on: v2.8.0-fork-lifts (v1 memory already shipped)

Why

v1 memory (shipped in v2.8.0-fork-lifts) provides file-based recall with keyword/tag matching injected into system-prompt.ts. It works but has three gaps:

Keyword-only recall misses semantic matches — "indentation" won't match a memory entry titled "Code style: tabs vs spaces" unless the word "indentation" appears verbatim.
No auto-extraction — memory files must be created manually. The LLM can't persist useful facts it discovers during conversation.
Flat search, no ranking — all keyword matches are equally weighted. No relevance scoring or deduplication.

v2 upgrades the retrieval layer while keeping the file-based storage format. No breaking changes to .boocode/memory/ structure.

What Changes

Hybrid Search (high confidence)

Replace the keyword-only rankByRelevance with hybrid BM25 + embedding search. Use a small local embedding model (all-MiniLM-L6-v2, run through ONNX Runtime in-process or in a local subprocess) so there is no external API dependency.

BM25 (already implementable without deps — term frequency + inverse document frequency scoring on the memory entries)
Embedding (local ONNX model, ~20MB, runs inference in ~5ms on CPU, produces 384-dim vectors)
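Weighted merge (score = 0.3 * bm25 + 0.7 * cosine), with a configurable ratio. Raw BM25 scores are unbounded, so bm25 here should be normalized (for example to [0, 1]) before it is combined with cosine similarity.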
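
The three pieces above can be sketched in a few dozen lines of TypeScript. This is a minimal illustration under stated assumptions, not the planned implementation: the names `Doc`, `bm25Scores`, `cosine`, and `hybridRank` are hypothetical, the `k1` and `b` values are the commonly used BM25 defaults, BM25 is normalized by dividing by the top score, and the embedding vectors are assumed to be computed elsewhere.

```typescript
// Hypothetical shape for a memory entry; not the actual v1 type.
type Doc = { id: string; text: string };

const tokenize = (s: string): string[] =>
  s.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// Okapi BM25 over a small in-memory corpus. Returns one raw score per doc.
function bm25Scores(query: string, docs: Doc[], k1 = 1.2, b = 0.75): number[] {
  const tokens = docs.map((d) => tokenize(d.text));
  const avgLen =
    tokens.reduce((n, t) => n + t.length, 0) / Math.max(docs.length, 1);

  // Document frequency: in how many docs each term appears.
  const df = new Map<string, number>();
  for (const t of tokens) {
    for (const term of Array.from(new Set(t))) {
      df.set(term, (df.get(term) ?? 0) + 1);
    }
  }

  const qTerms = Array.from(new Set(tokenize(query)));
  return tokens.map((t) => {
    let score = 0;
    for (const term of qTerms) {
      const tf = t.filter((x) => x === term).length;
      if (tf === 0) continue;
      const n = df.get(term) ?? 0;
      const idf = Math.log(1 + (docs.length - n + 0.5) / (n + 0.5));
      score +=
        (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * t.length) / avgLen));
    }
    return score;
  });
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Weighted merge. BM25 is divided by the top score (an assumption of this
// sketch) so that both terms are on a comparable scale.
function hybridRank(
  query: string,
  queryVec: number[],
  docs: Doc[],
  docVecs: number[][],
  bm25Weight = 0.3,
): { id: string; score: number }[] {
  const raw = bm25Scores(query, docs);
  const max = Math.max(...raw, 0);
  return docs
    .map((d, i) => ({
      id: d.id,
      score:
        bm25Weight * (max > 0 ? raw[i] / max : 0) +
        (1 - bm25Weight) * cosine(queryVec, docVecs[i]),
    }))
    .sort((x, y) => y.score - x.score);
}
```

When no keyword matches at all (the "indentation" case from the Why section), every BM25 term is zero and the ranking falls back to cosine similarity alone, which is the behavior the hybrid approach is meant to provide.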

Auto-Extract Agent Tool (medium confidence)

Two new tools exposed to agents. Extraction is not automatic; the agent decides when to persist:

extract_memory(topic, title, content, tags) → writes a markdown entry
search_memory(query) → returns ranked memory entries (new tool, replaces raw injection)
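
As a rough illustration of the two tool signatures, the input shapes and the write path might look like the sketch below. Everything here is an assumption: the interface names, the validation rules, and in particular the rendered entry layout, which is a placeholder because the actual .boocode/memory/ entry format from v1 is not shown in this document.

```typescript
// Hypothetical tool input shapes; field names mirror the signatures above.
interface ExtractMemoryInput {
  topic: string;
  title: string;
  content: string;
  tags: string[];
}

interface SearchMemoryInput {
  query: string;
}

// Hypothetical result shape for search_memory.
interface MemoryHit {
  title: string;
  score: number;
  content: string;
}

// Reject empty fields and normalize tags before anything is written.
function validateExtractInput(input: ExtractMemoryInput): ExtractMemoryInput {
  const topic = input.topic.trim();
  const title = input.title.trim();
  const content = input.content.trim();
  if (!topic || !title || !content) {
    throw new Error("topic, title, and content are required");
  }
  // Lowercase, drop blanks, and deduplicate tags.
  const tags = Array.from(
    new Set(
      input.tags.map((t) => t.trim().toLowerCase()).filter((t) => t.length > 0),
    ),
  );
  return { topic, title, content, tags };
}

// Placeholder layout only: the real v1 entry format is not specified here.
function renderEntry(input: ExtractMemoryInput): string {
  const e = validateExtractInput(input);
  return [
    `# ${e.title}`,
    "",
    `Topic: ${e.topic}`,
    `Tags: ${e.tags.join(", ")}`,
    "",
    e.content,
    "",
  ].join("\n");
}
```

Keeping validation and rendering as pure functions means the tool handler only has to decide where the file goes, and the same normalization can be reused if entries are ever edited by hand.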

In-Memory Embedding Cache (optional)

Keep embeddings in an LRU map keyed by file path and validated against the file's mtime. Recompute only when files change. No DB migration needed.
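
One way such a cache could look is sketched below. The class name `EmbeddingCache`, the default capacity, and the choice to key by path and compare mtime on read are assumptions of this sketch. It relies on the fact that a JavaScript `Map` iterates in insertion order, so the first key is always the least recently used one.

```typescript
// Sketch: an LRU cache of embeddings, invalidated when a file's mtime changes.
class EmbeddingCache {
  private entries = new Map<string, { mtimeMs: number; vector: number[] }>();
  private maxEntries: number;

  constructor(maxEntries = 256) {
    this.maxEntries = maxEntries;
  }

  // Returns the cached vector only if the stored mtime matches the current one.
  get(path: string, mtimeMs: number): number[] | undefined {
    const hit = this.entries.get(path);
    if (!hit) return undefined;
    if (hit.mtimeMs !== mtimeMs) {
      // File changed since the embedding was computed: drop the stale entry.
      this.entries.delete(path);
      return undefined;
    }
    // Re-insert so this entry becomes the most recently used.
    this.entries.delete(path);
    this.entries.set(path, hit);
    return hit.vector;
  }

  set(path: string, mtimeMs: number, vector: number[]): void {
    this.entries.delete(path);
    this.entries.set(path, { mtimeMs, vector });
    if (this.entries.size > this.maxEntries) {
      // The first key in insertion order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

A caller would typically read the file's modification time (for example `mtimeMs` from Node's `fs.statSync`), call `get`, and only run the embedding model and call `set` on a miss.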

Non-Goals

No vector database (SQLite FTS5 or in-memory BM25 suffice)
No automatic background extraction agent (agent must explicitly call extract_memory)
No changes to the .boocode/memory/ file format
No Python dependencies — ONNX runtime is a Node.js native addon or subprocess
