boocode/openspec/changes/archived/2026-06-07-memory-context-engineering/design.md
indifferentketchup c935687725 chore(openspec): drop 9 superseded proposals + 11 stub archive files
Drop 9 batch proposals that are superseded by the boocode-lift-analysis
(boocontext-audit, conductor upgrades, self-healing/verify-gate skills):
add-3tier-memory, import-llm-evaluator, import-pregel-engine, plugin-platform,
conductor-evolution, code-intelligence-upgrade, dev-workflow, ui-overhaul,
agent-reliability.

Delete 11 stub archive files (49-66B each, 'Status: Shipped. Archived.' only)
that provide zero documentation value over the existing CHANGELOG.md + git tags.
2026-06-07 22:15:38 +00:00


## Context
Current agents have no durable memory beyond the immediate LLM context window. Research across three production-grade OSS repos (LangMem, DeerFlow, CowAgent) reveals a consistent architectural pattern: a **tiered memory pipeline** with short-term context management, long-term semantic extraction, and periodic background consolidation. This design synthesizes those patterns into a portable, framework-agnostic `memory-engine` module.
The engine must be:
- **Portable** — works with any LLM, any agent framework, any embedding provider
- **Tiered** — separates ephemeral session context from persistent long-term knowledge
- **Efficient** — background processing, debounced writes, token-budget-aware formatting
- **Searchable** — hybrid keyword + vector retrieval with scoring
## Goals / Non-Goals
**Goals:**
- Provide a unified public API: `MemoryEngine` class with `manage()`, `search()`, `flush()`, `dream()` methods
- Short-term context: token-budget windowing + incremental summarization (LangMem's `summarize_messages` pattern)
- Long-term memory: LLM-extracted facts stored in SQLite with typed schemas (LangMem's `MemoryManager` + DeerFlow's fact model)
- Tiered consolidation: context→daily→core pipeline with configurable promotion rules (CowAgent's 3-tier)
- Hybrid search: FTS5 keyword + numpy-vectorized cosine similarity with weighted merge (CowAgent's `MemoryStorage`)
- Background processing: debounced async queue for memory updates (DeerFlow's `MemoryUpdateQueue` + LangMem's `ReflectionExecutor`)
- Agent tools: `manage_memory(content, action, id)` and `search_memory(query, limit)` as framework-agnostic callables
**Non-Goals:**
- Not a standalone agent framework — integrates into existing loops
- No built-in LLM provider — caller provides model
- No built-in embedding provider — caller provides or we degrade to keyword-only
- No real-time sync / distributed consensus — single-process design
- No graph-based memory (entity-relationship knowledge graphs) — deferred to future
## Decisions
### D1: SQLite as the single persistence backend
- **Choice**: SQLite with WAL mode for both keyword search (FTS5) and vector storage (BLOB embeddings)
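- **Rationale**: No external service or extra dependency, production-proven, FTS5 is reachable through Python's stdlib `sqlite3` module when the underlying SQLite build includes it, and embeddings can be scored in-process with numpy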
- **Alternatives considered**:
- *JSON files* (DeerFlow) → simpler but no built-in search, concurrency issues
- *External vector DB* (Pinecone, pgvector) → adds operational complexity, violates portability goal
- *LMDB/RocksDB* → overkill, no FTS5 equivalent
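
A minimal sketch of what D1 could look like with the stdlib `sqlite3` module. The table names `chunks` and `chunks_fts`, the float32 packing, and the helper names are assumptions for illustration, not the actual `MemoryStore` schema. It also assumes the local SQLite build ships FTS5.

```python
import sqlite3
import struct


def open_store(db_path: str) -> sqlite3.Connection:
    """Open (or create) a store with WAL journaling, a chunks table, and an FTS5 index."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")  # WAL only applies to file-backed databases
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS chunks (
            id INTEGER PRIMARY KEY,
            content TEXT NOT NULL,
            embedding BLOB
        );
        CREATE VIRTUAL TABLE IF NOT EXISTS chunks_fts USING fts5(content);
        """
    )
    return conn


def pack_embedding(vec):
    """Serialize a vector as little-endian float32 bytes for BLOB storage."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack_embedding(blob):
    """Inverse of pack_embedding."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def add_chunk(conn, content, embedding=None):
    """Insert a chunk and mirror its text into the FTS5 index under the same rowid."""
    blob = pack_embedding(embedding) if embedding is not None else None
    with conn:  # one transaction for both inserts
        cur = conn.execute(
            "INSERT INTO chunks (content, embedding) VALUES (?, ?)", (content, blob)
        )
        conn.execute(
            "INSERT INTO chunks_fts (rowid, content) VALUES (?, ?)",
            (cur.lastrowid, content),
        )
    return cur.lastrowid


def keyword_search(conn, query, limit=5):
    """Return (rowid, bm25) pairs; FTS5 bm25() is lower-is-better."""
    return conn.execute(
        "SELECT rowid, bm25(chunks_fts) FROM chunks_fts "
        "WHERE chunks_fts MATCH ? ORDER BY bm25(chunks_fts) LIMIT ?",
        (query, limit),
    ).fetchall()
```

Keeping the embedding nullable lets the same table serve the keyword-only mode described in the Non-Goals when no embedding provider is supplied.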
### D2: Three-tier architecture with file-based daily layer
- **Choice**: In-memory context tier → Markdown-file daily tier → SQLite-indexed core tier
- **Rationale**: Daily Markdown files are human-readable, easily audited, and serve as the input to Deep Dream consolidation. Core tier is the indexed, searchable fact store.
- **Alternatives considered**:
- *Single SQLite DB for everything* → loses human-readability of daily records
- *All in-memory* → no persistence across restarts
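
The daily tier can be sketched as plain file appends. The `memory/YYYY-MM-DD.md` path follows the Data Flow section; the heading and bullet layout, and the function names `append_daily_record` and `recent_daily_files`, are assumptions made for this example.

```python
from datetime import date, timedelta
from pathlib import Path
from typing import List, Optional


def append_daily_record(root: Path, text: str, day: Optional[date] = None) -> Path:
    """Append one bullet to the Markdown file for `day`, creating it on first use."""
    day = day or date.today()
    path = root / "memory" / f"{day.isoformat()}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not path.exists()
    with path.open("a", encoding="utf-8") as fh:
        if is_new:
            fh.write(f"# {day.isoformat()}\n\n")  # assumed header format
        fh.write(f"- {text.strip()}\n")
    return path


def recent_daily_files(root: Path, lookback_days: int, today: Optional[date] = None) -> List[Path]:
    """List existing daily files for the last `lookback_days` days, newest first."""
    today = today or date.today()
    found = []
    for offset in range(lookback_days):
        candidate = root / "memory" / f"{(today - timedelta(days=offset)).isoformat()}.md"
        if candidate.exists():
            found.append(candidate)
    return found
```

A consolidation pass could read the paths returned by `recent_daily_files` as its input, which keeps the daily layer auditable with nothing more than a text editor.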
### D3: Fact extraction via structured LLM output (JSON pattern, not tool-calling)
- **Choice**: LLM returns structured JSON (DeerFlow pattern) rather than tool-calling-based extraction (LangMem trustcall pattern)
- **Rationale**: Simpler, fewer dependencies, compatible with any LLM provider. LangMem's trustcall approach is more robust for complex multi-step edits but requires the `trustcall` library.
- **Fallback**: Confidence-thresholded insertion with content-dedup hashing to prevent duplicates
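
A rough sketch of the confidence-thresholded, hash-deduplicated insertion described in D3. The JSON shape (`{"facts": [{"content", "confidence"}]}`), the 0.7 threshold, and the dict-based store are all assumptions chosen for illustration.

```python
import hashlib
import json


def content_hash(text: str) -> str:
    """Hash case- and whitespace-normalized text so near-identical facts collide."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def ingest_facts(raw_llm_output: str, store: dict, min_confidence: float = 0.7) -> list:
    """Parse the model's JSON reply and insert new, sufficiently confident facts.

    Returns the hashes of the facts that were actually added.
    """
    try:
        payload = json.loads(raw_llm_output)
    except json.JSONDecodeError:
        return []  # malformed output is skipped, not fatal
    facts = payload.get("facts") if isinstance(payload, dict) else None
    if not isinstance(facts, list):
        return []

    added = []
    for fact in facts:
        if not isinstance(fact, dict):
            continue
        content = str(fact.get("content", "")).strip()
        try:
            confidence = float(fact.get("confidence", 0.0))
        except (TypeError, ValueError):
            continue
        if not content or confidence < min_confidence:
            continue
        key = content_hash(content)
        if key in store:  # duplicate content, already stored
            continue
        store[key] = {"content": content, "confidence": confidence}
        added.append(key)
    return added
```

Because the parser tolerates malformed replies by returning an empty list, a failed extraction costs one LLM call but never corrupts the store.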
### D4: Hybrid search with numpy-vectorized cosine similarity
- **Choice**: Load relevant embeddings from SQLite, compute cosine similarity via `matrix @ vector` (numpy), merge with FTS5 BM25 scores
- **Rationale**: ~100x faster than per-row Python loops. Uses numpy which is near-ubiquitous in Python ML.
- **Fallback**: Pure-Python cosine similarity when numpy unavailable
### D5: Debounced background memory update queue
- **Choice**: Thread-safe priority queue with configurable debounce timer (DeerFlow pattern)
- **Rationale**: Prevents thundering-herd on LLM API during rapid conversation turns. Threaded execution avoids blocking the main agent loop.
- **Alternatives considered**: asyncio queue → fine for async-only, but MemoryEngine must support sync callers
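
A simplified sketch of the debounce behavior in D5, using `threading.Timer`. It omits the priority ordering mentioned above and is not DeerFlow's `MemoryUpdateQueue`; the class name `DebouncedQueue` and its methods are hypothetical.

```python
import threading


class DebouncedQueue:
    """Collect items and hand them to `handler` as one batch after a quiet period."""

    def __init__(self, handler, debounce_seconds: float = 2.0):
        self._handler = handler
        self._delay = debounce_seconds
        self._lock = threading.Lock()
        self._pending = []
        self._timer = None

    def submit(self, item) -> None:
        """Queue an item and restart the debounce timer."""
        with self._lock:
            self._pending.append(item)
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._delay, self._fire)
            self._timer.daemon = True  # do not keep the process alive
            self._timer.start()

    def _fire(self) -> None:
        """Drain pending items under the lock, then run the handler outside it."""
        with self._lock:
            batch, self._pending = self._pending, []
            self._timer = None
        if batch:
            self._handler(batch)

    def flush(self) -> None:
        """Process pending items immediately, for example at shutdown."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
        self._fire()
```

Running the handler outside the lock means a slow LLM call does not block `submit()`, so synchronous callers return immediately.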
### D6: Namespace isolation via tuple-based scoping
- **Choice**: `(scope_type, user_id, agent_id)` tuple namespace for multi-tenant isolation
- **Rationale**: LangMem's `NamespaceTemplate` pattern is proven in production. Namespaces can vary in depth, for example `("user", "u-123")` or `("org", "acme", "agent-alpha")`.
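
One way to picture tuple-based scoping, as a sketch only. The helpers `make_namespace`, `namespace_key`, and `in_scope`, and the prefix-match rule, are assumptions for illustration and not LangMem's `NamespaceTemplate` API.

```python
from typing import Tuple

Namespace = Tuple[str, ...]


def make_namespace(scope_type: str, *parts: str) -> Namespace:
    """Build a namespace tuple, rejecting empty segments and the key separator."""
    segments = (scope_type, *parts)
    for segment in segments:
        if not segment or "/" in segment:
            raise ValueError(f"invalid namespace segment: {segment!r}")
    return segments


def namespace_key(namespace: Namespace) -> str:
    """Flatten a namespace into a string usable as a storage key or column value."""
    return "/".join(namespace)


def in_scope(candidate: Namespace, scope: Namespace) -> bool:
    """True when `candidate` sits at or below `scope` (prefix match)."""
    return candidate[: len(scope)] == scope
```

With prefix matching, a query scoped to `("org", "acme")` would see facts stored under `("org", "acme", "agent-alpha")` but never those under `("user", "u-123")`.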
## Architecture Overview
```
┌─────────────────────────────────────────────────────────┐
│                      MemoryEngine                       │
├─────────────────────────────────────────────────────────┤
│  manage_memory(content, scope, metadata) → fact_id      │
│  search_memory(query, limit, scope) → SearchResults[]   │
│  flush_messages(messages, scope) → boolean              │
│  deep_dream(lookback_days, scope) → boolean             │
│  format_for_injection(scope, max_tokens) → str          │
└──────────────────────┬──────────────────────────────────┘
        ┌──────────────┼──────────────┐
        ▼              ▼              ▼
 ┌──────────────┐ ┌──────────┐ ┌──────────────┐
 │ Context Tier │ │  Daily   │ │  Core Tier   │
 │ (in-memory)  │ │  Tier    │ │  (SQLite +   │
 │              │ │ (Markdown│ │   FTS5 +     │
 │ RunningSumm. │ │  files)  │ │   vectors)   │
 │ token budget │ │          │ │              │
 └──────────────┘ │  Deep    │ │ MemoryStore  │
                  │ Dream ───┼─┤ (facts)      │
                  └──────────┘ │ HybridSearch │
                               └──────────────┘
                             ┌────────┴────────┐
                             ▼                 ▼
                       ┌────────────┐ ┌────────────────┐
                       │ Keyword    │ │ Vector Search  │
                       │ (FTS5)     │ │ (numpy cosine) │
                       └────────────┘ └────────────────┘
```
### Data Flow
1. **Agent sends message** → Context tier tracks token budget, optionally summarizes
2. **Conversation turn completes** → Messages queued to background `MemoryUpdateQueue`
3. **Debounce timer fires** → `MemoryUpdater` calls LLM with current memory + conversation → extracts facts
4. **Facts persisted** → Core tier SQLite: chunks table with embedding, FTS5 index
5. **Daily recording** → `MemoryFlushManager` appends to `memory/YYYY-MM-DD.md`
6. **Deep Dream (scheduled)** → LLM reads MEMORY.md + recent daily files → rewrites MEMORY.md → writes dream diary
7. **Agent starts new session** → `format_for_injection()` reads core tier → builds token-budgeted context string → injects into system prompt
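
Step 7 can be sketched as a greedy fill against a token budget. The four-characters-per-token estimate stands in for a real tokenizer such as the `tiktoken` wrapper listed in the module structure, and the header text and fact dict shape are assumptions for this example.

```python
def estimate_tokens(text: str) -> int:
    """Crude token estimate (about 4 characters per token); a stand-in for a real tokenizer."""
    return max(1, (len(text) + 3) // 4)


def format_for_injection(facts, max_tokens: int, header: str = "## Known facts") -> str:
    """Build a context string from the highest-confidence facts that fit the budget."""
    used = estimate_tokens(header)
    if used > max_tokens:
        return ""  # not even the header fits
    lines = [header]
    for fact in sorted(facts, key=lambda f: f["confidence"], reverse=True):
        line = f"- {fact['content']}"
        cost = estimate_tokens(line)
        if used + cost > max_tokens:
            break  # strict truncation: stop at the first fact that does not fit
        lines.append(line)
        used += cost
    return "\n".join(lines)
```

Stopping at the first fact that overflows keeps the output a strict prefix of the confidence ranking, at the cost of occasionally leaving a few tokens unused.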
## Module Structure
```
memory-engine/
├── __init__.py # Public API: MemoryEngine, MemoryConfig
├── config.py # Pydantic config model
├── core/
│ ├── __init__.py
│ ├── store.py # MemoryStore (SQLite + FTS5 + vectors)
│ ├── hybrid_search.py # Vector + keyword merge with temporal decay
│ └── schemas.py # Memory, Fact, SearchResult models
├── extraction/
│ ├── __init__.py
│ ├── manager.py # MemoryManager (LLM fact extraction)
│ └── prompts.py # System prompts for memory extraction
├── tiers/
│ ├── __init__.py
│ ├── context.py # ContextTier (short-term summarization)
│ ├── daily.py # DailyTier (Markdown file management)
│ └── core.py # CoreTier (long-term persistent store)
├── background/
│ ├── __init__.py
│ ├── queue.py # MemoryUpdateQueue (debounced)
│ └── deep_dream.py # Deep Dream consolidation
├── tools/
│ ├── __init__.py
│ ├── manage.py # manage_memory callable
│ └── search.py # search_memory callable
├── embedding/
│ ├── __init__.py
│ ├── base.py # EmbeddingProvider ABC
│ └── openai.py # OpenAI embedding implementation
└── utils/
    ├── __init__.py
    ├── namespace.py # NamespaceTemplate
    ├── token_counter.py # Token counting (tiktoken wrapper)
    └── chunker.py # Text chunking
```
## Risks / Trade-offs
| Risk | Mitigation |
|------|-----------|
| [R1] LLM extraction latency blocks agent loop | Background queue with debounce — agent never waits for memory update |
| [R2] Embedding API failures degrade search | Graceful degradation to keyword-only; vector results omitted, not fatal |
| [R3] SQLite write contention under high concurrency | WAL mode + RLock per connection; single-process assumption |
| [R4] FTS5 corrupted after crash | Self-healing on init: detect corrupt shadow tables, rebuild from chunks table |
| [R5] Memory bloat from unbounded fact accumulation | Configurable `max_facts` limit (default 500); sorted by confidence, oldest trimmed |
| [R6] Deep Dream overwrites valuable long-term data | Dream diary preserves audit trail; content-hash dedup prevents re-processing |
| [R7] Token budget exceeded in context injection | `format_for_injection()` enforces strict token limit with truncation |
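
The R5 mitigation leaves the exact trimming rule open. One possible reading, sketched below, keeps the highest-confidence facts and breaks ties in favor of newer ones, so the oldest low-confidence facts are trimmed first. The `created_at` field and the function name `trim_facts` are assumptions.

```python
def trim_facts(facts, max_facts: int = 500):
    """Return at most `max_facts` facts, preferring high confidence, then recency."""
    if len(facts) <= max_facts:
        return list(facts)
    ranked = sorted(
        facts,
        key=lambda f: (f["confidence"], f["created_at"]),
        reverse=True,  # highest confidence first; newer first among ties
    )
    return ranked[:max_facts]
```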
## Open Questions
- Q1: Should Deep Dream be scheduled (cron) or event-driven (every N daily files)?
- Q2: Is 500 (the default assumed in R5) the right default `max_facts` limit for the core tier?
- Q3: Should the daily tier support per-user isolation (user-specific daily files) or always shared?