Files
boocode/openspec/changes/archived/2026-06-07-memory-context-engineering/design.md
indifferentketchup c935687725 chore(openspec): drop 9 superseded proposals + 11 stub archive files
Drop 9 batch proposals that are superseded by the boocode-lift-analysis
(boocontext-audit, conductor upgrades, self-healing/verify-gate skills):
add-3tier-memory, import-llm-evaluator, import-pregel-engine, plugin-platform,
conductor-evolution, code-intelligence-upgrade, dev-workflow, ui-overhaul,
agent-reliability.

Delete 11 stub archive files (49-66B each, 'Status: Shipped. Archived.' only)
that provide zero documentation value over the existing CHANGELOG.md + git tags.
2026-06-07 22:15:38 +00:00

10 KiB

Context

Current agents have no durable memory beyond the immediate LLM context window. Research across three production-grade OSS repos (LangMem, DeerFlow, CowAgent) reveals a consistent architectural pattern: a tiered memory pipeline with short-term context management, long-term semantic extraction, and periodic background consolidation. This design synthesizes those patterns into a portable, framework-agnostic memory-engine module.

The engine must be:

  • Portable — works with any LLM, any agent framework, any embedding provider
  • Tiered — separates ephemeral session context from persistent long-term knowledge
  • Efficient — background processing, debounced writes, token-budget-aware formatting
  • Searchable — hybrid keyword + vector retrieval with scoring

Goals / Non-Goals

Goals:

  • Provide a unified public API: MemoryEngine class with manage(), search(), flush(), dream() methods
  • Short-term context: token-budget windowing + incremental summarization (LangMem's summarize_messages pattern)
  • Long-term memory: LLM-extracted facts stored in SQLite with typed schemas (LangMem's MemoryManager + DeerFlow's fact model)
  • Tiered consolidation: context→daily→core pipeline with configurable promotion rules (CowAgent's 3-tier)
  • Hybrid search: FTS5 keyword + numpy-vectorized cosine similarity with weighted merge (CowAgent's MemoryStorage)
  • Background processing: debounced async queue for memory updates (DeerFlow's MemoryUpdateQueue + LangMem's ReflectionExecutor)
  • Agent tools: manage_memory(content, action, id) and search_memory(query, limit) as framework-agnostic callables

Non-Goals:

  • Not a standalone agent framework — integrates into existing loops
  • No built-in LLM provider — caller provides model
  • No built-in embedding provider — caller provides or we degrade to keyword-only
  • No real-time sync / distributed consensus — single-process design
  • No graph-based memory (entity-relationship knowledge graphs) — deferred to future

Decisions

D1: SQLite as the single persistence backend

  • Choice: SQLite with WAL mode for both keyword search (FTS5) and vector storage (BLOB embeddings)
  • Rationale: Zero-dependency, production-proven, FTS5 is stdlib-compatible, numpy integration in-process
  • Alternatives considered:
    • JSON files (DeerFlow) → simpler but no built-in search, concurrency issues
    • External vector DB (Pinecone, pgvector) → adds operational complexity, violates portability goal
    • LMDB/RocksDB → overkill, no FTS5 equivalent

D2: Three-tier architecture with file-based daily layer

  • Choice: In-memory context tier → Markdown-file daily tier → SQLite-indexed core tier
  • Rationale: Daily Markdown files are human-readable, easily audited, and serve as the input to Deep Dream consolidation. Core tier is the indexed, searchable fact store.
  • Alternatives considered:
    • Single SQLite DB for everything → loses human-readability of daily records
    • All in-memory → no persistence across restarts

D3: Fact extraction via structured LLM output (tool-calling pattern)

  • Choice: LLM returns structured JSON (DeerFlow pattern) rather than tool-calling-based extraction (LangMem trustcall pattern)
  • Rationale: Simpler, fewer dependencies, compatible with any LLM provider. LangMem's trustcall approach is more robust for complex multi-step edits but requires the trustcall library.
  • Fallback: Confidence-thresholded insertion with content-dedup hashing to prevent duplicates

D4: Hybrid search with numpy-vectorized cosine similarity

  • Choice: Load relevant embeddings from SQLite, compute cosine similarity via matrix @ vector (numpy), merge with FTS5 BM25 scores
  • Rationale: ~100x faster than per-row Python loops. Uses numpy which is near-ubiquitous in Python ML.
  • Fallback: Pure-Python cosine similarity when numpy unavailable

D5: Debounced background memory update queue

  • Choice: Thread-safe priority queue with configurable debounce timer (DeerFlow pattern)
  • Rationale: Prevents thundering-herd on LLM API during rapid conversation turns. Threaded execution avoids blocking the main agent loop.
  • Alternatives considered: asyncio queue → fine for async-only, but MemoryEngine must support sync callers

D6: Namespace isolation via tuple-based scoping

  • Choice: (scope_type, user_id, agent_id) tuple namespace for multi-tenant isolation
  • Rationale: LangMem's NamespaceTemplate pattern proven in production. Allows ("user", "u-123") or ("org", "acme", "agent-alpha").

Architecture Overview

┌─────────────────────────────────────────────────────────┐
│                    MemoryEngine                          │
├─────────────────────────────────────────────────────────┤
│  manage_memory(content, scope, metadata) → fact_id       │
│  search_memory(query, limit, scope) → SearchResults[]    │
│  flush_messages(messages, scope) → boolean               │
│  deep_dream(lookback_days, scope) → boolean              │
│  format_for_injection(scope, max_tokens) → str           │
└──────────────────────┬──────────────────────────────────┘
                       │
        ┌──────────────┼──────────────┐
        ▼              ▼              ▼
┌──────────────┐ ┌──────────┐ ┌──────────────┐
│ Context Tier  │ │ Daily    │ │  Core Tier   │
│ (in-memory)  │ │ Tier     │ │  (SQLite +   │
│              │ │ (Markdown│ │   FTS5 +     │
│ RunningSumm. │ │  files)  │ │   vectors)   │
│ token budget │ │          │ │              │
└──────────────┘ │ Deep     │ │ MemoryStore  │
                 │ Dream ───┼─┤ (facts)      │
                 └──────────┘ │ HybridSearch │
                              └──────────────┘
                                       │
                              ┌────────┴────────┐
                              ▼                  ▼
                       ┌────────────┐  ┌────────────────┐
                       │ Keyword    │  │ Vector Search   │
                       │ (FTS5)     │  │ (numpy cosine)  │
                       └────────────┘  └────────────────┘

Data Flow

  1. Agent sends message → Context tier tracks token budget, optionally summarizes
  2. Conversation turn completes → Messages queued to background MemoryUpdateQueue
  3. Debounce timer firesMemoryUpdater calls LLM with current memory + conversation → extracts facts
  4. Facts persisted → Core tier SQLite: chunks table with embedding, FTS5 index
  5. Daily recordingMemoryFlushManager appends to memory/YYYY-MM-DD.md
  6. Deep Dream (scheduled) → LLM reads MEMORY.md + recent daily files → rewrites MEMORY.md → writes dream diary
  7. Agent starts new sessionformat_for_injection() reads core tier → builds token-budgeted context string → injects into system prompt

Module Structure

memory-engine/
├── __init__.py               # Public API: MemoryEngine, MemoryConfig
├── config.py                 # Pydantic config model
├── core/
│   ├── __init__.py
│   ├── store.py              # MemoryStore (SQLite + FTS5 + vectors)
│   ├── hybrid_search.py      # Vector + keyword merge with temporal decay
│   └── schemas.py            # Memory, Fact, SearchResult models
├── extraction/
│   ├── __init__.py
│   ├── manager.py            # MemoryManager (LLM fact extraction)
│   └── prompts.py            # System prompts for memory extraction
├── tiers/
│   ├── __init__.py
│   ├── context.py            # ContextTier (short-term summarization)
│   ├── daily.py              # DailyTier (Markdown file management)
│   └── core.py               # CoreTier (long-term persistent store)
├── background/
│   ├── __init__.py
│   ├── queue.py              # MemoryUpdateQueue (debounced)
│   └── deep_dream.py         # Deep Dream consolidation
├── tools/
│   ├── __init__.py
│   ├── manage.py             # manage_memory callable
│   └── search.py             # search_memory callable
├── embedding/
│   ├── __init__.py
│   ├── base.py               # EmbeddingProvider ABC
│   └── openai.py             # OpenAI embedding implementation
└── utils/
    ├── __init__.py
    ├── namespace.py          # NamespaceTemplate
    ├── token_counter.py      # Token counting (tiktoken wrapper)
    └── chunker.py            # Text chunking

Risks / Trade-offs

Risk Mitigation
[R1] LLM extraction latency blocks agent loop Background queue with debounce — agent never waits for memory update
[R2] Embedding API failures degrade search Graceful degradation to keyword-only; vector results omitted, not fatal
[R3] SQLite write contention under high concurrency WAL mode + RLock per connection; single-process assumption
[R4] FTS5 corrupted after crash Self-healing on init: detect corrupt shadow tables, rebuild from chunks table
[R5] Memory bloat from unbounded fact accumulation Configurable max_facts limit (default 500); sorted by confidence, oldest trimmed
[R6] Deep Dream overwrites valuable long-term data Dream diary preserves audit trail; content-hash dedup prevents re-processing
[R7] Token budget exceeded in context injection format_for_injection() enforces strict token limit with truncation

Open Questions

  • Q1: Should Deep Dream be scheduled (cron) or event-driven (every N daily files)?
  • Q2: What is the default max_facts limit for the core tier?
  • Q3: Should the daily tier support per-user isolation (user-specific daily files) or always shared?