Drop 9 batch proposals that are superseded by the boocode-lift-analysis (boocontext-audit, conductor upgrades, self-healing/verify-gate skills): add-3tier-memory, import-llm-evaluator, import-pregel-engine, plugin-platform, conductor-evolution, code-intelligence-upgrade, dev-workflow, ui-overhaul, agent-reliability. Delete 11 stub archive files (49-66B each, 'Status: Shipped. Archived.' only) that provide zero documentation value over the existing CHANGELOG.md + git tags.
Context
Current agents have no durable memory beyond the immediate LLM context window. Research across three production-grade OSS repos (LangMem, DeerFlow, CowAgent) reveals a consistent architectural pattern: a tiered memory pipeline with short-term context management, long-term semantic extraction, and periodic background consolidation. This design synthesizes those patterns into a portable, framework-agnostic memory-engine module.
The engine must be:
- Portable — works with any LLM, any agent framework, any embedding provider
- Tiered — separates ephemeral session context from persistent long-term knowledge
- Efficient — background processing, debounced writes, token-budget-aware formatting
- Searchable — hybrid keyword + vector retrieval with scoring
Goals / Non-Goals
Goals:
- Provide a unified public API: MemoryEngine class with manage(), search(), flush(), dream() methods
- Short-term context: token-budget windowing + incremental summarization (LangMem's summarize_messages pattern)
- Long-term memory: LLM-extracted facts stored in SQLite with typed schemas (LangMem's MemoryManager + DeerFlow's fact model)
- Tiered consolidation: context→daily→core pipeline with configurable promotion rules (CowAgent's 3-tier)
- Hybrid search: FTS5 keyword + numpy-vectorized cosine similarity with weighted merge (CowAgent's MemoryStorage)
- Background processing: debounced async queue for memory updates (DeerFlow's MemoryUpdateQueue + LangMem's ReflectionExecutor)
- Agent tools: manage_memory(content, action, id) and search_memory(query, limit) as framework-agnostic callables
Non-Goals:
- Not a standalone agent framework — integrates into existing loops
- No built-in LLM provider — caller provides model
- No built-in embedding provider — caller provides or we degrade to keyword-only
- No real-time sync / distributed consensus — single-process design
- No graph-based memory (entity-relationship knowledge graphs) — deferred to future
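To make the short-term goal concrete, token-budget windowing with incremental summarization might look roughly like the sketch below. The names count_tokens (a whitespace split), window_messages, and fake_summarize are hypothetical stand-ins for a real tokenizer and the caller-provided LLM; none of them are taken from LangMem.

```python
from typing import Callable, List, Tuple


def count_tokens(text: str) -> int:
    # Assumption: a whitespace split stands in for a real tokenizer.
    return len(text.split())


def window_messages(
    messages: List[str],
    summary: str,
    budget: int,
    summarize: Callable[[str, List[str]], str],
) -> Tuple[str, List[str]]:
    """Keep the newest messages that fit in `budget`; fold the rest into `summary`."""
    kept: List[str] = []
    used = 0
    # Walk backwards so the most recent turns are kept verbatim.
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()
    overflow = messages[: len(messages) - len(kept)]
    if overflow:
        # Incremental: the previous summary is an input, not rebuilt from scratch.
        summary = summarize(summary, overflow)
    return summary, kept


def fake_summarize(previous: str, dropped: List[str]) -> str:
    # Hypothetical placeholder for the caller-provided LLM call.
    return (previous + " " + f"[{len(dropped)} earlier messages]").strip()


summary, recent = window_messages(
    ["hello there", "my name is Ada", "I prefer dark mode", "what time is it"],
    summary="",
    budget=8,
    summarize=fake_summarize,
)
```

With a budget of 8 tokens, the two newest messages stay verbatim and the two older ones are handed to the summarizer.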
Decisions
D1: SQLite as the single persistence backend
- Choice: SQLite with WAL mode for both keyword search (FTS5) and vector storage (BLOB embeddings)
- Rationale: Zero-dependency, production-proven; FTS5 is available through Python's stdlib sqlite3 module on most builds; numpy similarity runs in-process
- Alternatives considered:
  - JSON files (DeerFlow) → simpler but no built-in search, concurrency issues
  - External vector DB (Pinecone, pgvector) → adds operational complexity, violates portability goal
  - LMDB/RocksDB → overkill, no FTS5 equivalent
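A minimal sketch of what D1 implies: one SQLite connection holding a chunks table with BLOB embeddings next to an FTS5 index. The table names, the float32 packing, and every helper name here are assumptions for illustration, not the actual schema. It also assumes the local SQLite build includes FTS5.

```python
import sqlite3
import struct
from typing import List, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # WAL lets readers proceed during a write (it has no effect on :memory: databases).
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        " id INTEGER PRIMARY KEY, content TEXT NOT NULL, embedding BLOB)"
    )
    # Assumption: FTS5 is compiled into this SQLite build.
    conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS chunks_fts USING fts5(content)")
    return conn


def pack_vector(vec: List[float]) -> bytes:
    # Store the embedding as little-endian float32 values in a BLOB.
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack_vector(blob: bytes) -> List[float]:
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def add_chunk(conn: sqlite3.Connection, content: str, vec: List[float]) -> int:
    cur = conn.execute(
        "INSERT INTO chunks (content, embedding) VALUES (?, ?)",
        (content, pack_vector(vec)),
    )
    # Keep the FTS rowid equal to the chunk id so results can be joined back.
    conn.execute(
        "INSERT INTO chunks_fts (rowid, content) VALUES (?, ?)",
        (cur.lastrowid, content),
    )
    conn.commit()
    return cur.lastrowid


def keyword_search(conn: sqlite3.Connection, query: str, limit: int = 5) -> List[Tuple[int, float]]:
    # bm25() is lower-is-better, so ascending order puts the best match first.
    rows = conn.execute(
        "SELECT rowid, bm25(chunks_fts) FROM chunks_fts"
        " WHERE chunks_fts MATCH ? ORDER BY bm25(chunks_fts) LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

Keeping text and vectors in the same file means a fact and its embedding are written in one transaction, which is hard to guarantee with an external vector store.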
D2: Three-tier architecture with file-based daily layer
- Choice: In-memory context tier → Markdown-file daily tier → SQLite-indexed core tier
- Rationale: Daily Markdown files are human-readable, easily audited, and serve as the input to Deep Dream consolidation. Core tier is the indexed, searchable fact store.
- Alternatives considered:
  - Single SQLite DB for everything → loses human-readability of daily records
  - All in-memory → no persistence across restarts
D3: Fact extraction via structured LLM output (JSON, not tool-calling)
- Choice: LLM returns structured JSON (DeerFlow pattern) rather than tool-calling-based extraction (LangMem trustcall pattern)
- Rationale: Simpler, fewer dependencies, compatible with any LLM provider. LangMem's trustcall approach is more robust for complex multi-step edits but requires the trustcall library.
- Fallback: Confidence-thresholded insertion with content-dedup hashing to prevent duplicates
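The extraction path in D3 could be sketched as follows. The JSON shape ({"facts": [{"content", "confidence"}]}), the 0.7 threshold, and the function names are assumptions chosen for illustration; the actual DeerFlow schema may differ.

```python
import hashlib
import json
from typing import Dict, List


def content_hash(text: str) -> str:
    # Normalise case and whitespace so trivially different phrasings collide.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def ingest_facts(
    raw_llm_output: str,
    store: Dict[str, dict],
    min_confidence: float = 0.7,
) -> List[str]:
    """Parse the model's JSON reply and insert facts that pass both checks."""
    try:
        payload = json.loads(raw_llm_output)
    except json.JSONDecodeError:
        return []  # Malformed output: skip this update rather than crash.
    if not isinstance(payload, dict):
        return []
    inserted: List[str] = []
    for fact in payload.get("facts", []):
        text = str(fact.get("content", "")).strip()
        confidence = float(fact.get("confidence", 0.0))
        if not text or confidence < min_confidence:
            continue  # Confidence threshold.
        key = content_hash(text)
        if key in store:
            continue  # Content-hash dedup.
        store[key] = {"content": text, "confidence": confidence}
        inserted.append(key)
    return inserted
```

Because the reply is plain JSON, any provider that can follow a formatting instruction works; the cost is that malformed replies have to be tolerated instead of repaired.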
D4: Hybrid search with numpy-vectorized cosine similarity
- Choice: Load relevant embeddings from SQLite, compute cosine similarity via matrix @ vector (numpy), merge with FTS5 BM25 scores
- Rationale: ~100x faster than per-row Python loops. Uses numpy, which is near-ubiquitous in Python ML.
- Fallback: Pure-Python cosine similarity when numpy unavailable
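A rough sketch of D4, including the pure-Python fallback. The 0.7 vector weight and the assumption that both score sets are already normalised to [0, 1] are illustrative choices, not values from CowAgent.

```python
import math
from typing import Dict, List, Tuple

try:
    import numpy as np
except ImportError:  # Fall back to pure Python, as D4 allows.
    np = None


def cosine_scores(matrix: List[List[float]], query: List[float]) -> List[float]:
    """Cosine similarity of every row in `matrix` against `query`."""
    if not matrix:
        return []
    if np is not None:
        m = np.asarray(matrix, dtype=np.float32)
        q = np.asarray(query, dtype=np.float32)
        norms = np.linalg.norm(m, axis=1) * np.linalg.norm(q)
        norms[norms == 0] = 1.0  # Zero vectors score 0 instead of dividing by zero.
        return (m @ q / norms).tolist()
    q_norm = math.sqrt(sum(x * x for x in query))
    scores = []
    for row in matrix:
        denom = math.sqrt(sum(x * x for x in row)) * q_norm
        dot = sum(a * b for a, b in zip(row, query))
        scores.append(dot / denom if denom else 0.0)
    return scores


def merge_scores(
    vector: Dict[int, float],
    keyword: Dict[int, float],
    vector_weight: float = 0.7,
) -> List[Tuple[int, float]]:
    # Assumption: both inputs are normalised to [0, 1] with higher meaning better.
    ids = set(vector) | set(keyword)
    merged = {
        i: vector_weight * vector.get(i, 0.0) + (1 - vector_weight) * keyword.get(i, 0.0)
        for i in ids
    }
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
```

Raw FTS5 BM25 values are lower-is-better, so they would need to be inverted and rescaled before being passed to merge_scores.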
D5: Debounced background memory update queue
- Choice: Thread-safe priority queue with configurable debounce timer (DeerFlow pattern)
- Rationale: Prevents thundering-herd on LLM API during rapid conversation turns. Threaded execution avoids blocking the main agent loop.
- Alternatives considered: asyncio queue → fine for async-only, but MemoryEngine must support sync callers
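The debounce behaviour in D5 can be approximated with threading.Timer, as in the sketch below. The class name DebouncedQueue and its methods are hypothetical, and the sketch batches items in arrival order rather than by priority, so it is a simplification of the DeerFlow design.

```python
import threading
from typing import Callable, List, Optional


class DebouncedQueue:
    """Collects items and flushes them once no new item arrives for `delay` seconds."""

    def __init__(self, handler: Callable[[List[str]], None], delay: float = 2.0) -> None:
        self._handler = handler
        self._delay = delay
        self._lock = threading.Lock()
        self._pending: List[str] = []
        self._timer: Optional[threading.Timer] = None

    def submit(self, item: str) -> None:
        with self._lock:
            self._pending.append(item)
            if self._timer is not None:
                self._timer.cancel()  # Restart the countdown on every new item.
            self._timer = threading.Timer(self._delay, self._flush)
            self._timer.daemon = True
            self._timer.start()

    def _flush(self) -> None:
        with self._lock:
            batch, self._pending = self._pending, []
            self._timer = None
        if batch:
            # Runs on the timer thread, so the caller of submit() never waits.
            self._handler(batch)


batches: List[List[str]] = []
done = threading.Event()


def on_batch(batch: List[str]) -> None:
    batches.append(batch)
    done.set()


queue = DebouncedQueue(on_batch, delay=0.05)
for turn in ("turn 1", "turn 2", "turn 3"):
    queue.submit(turn)
done.wait(timeout=2.0)
```

Three rapid turns result in a single handler call, which is the property that keeps the LLM API from being hit once per message.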
D6: Namespace isolation via tuple-based scoping
- Choice: (scope_type, user_id, agent_id) tuple namespace for multi-tenant isolation
- Rationale: LangMem's NamespaceTemplate pattern proven in production. Allows ("user", "u-123") or ("org", "acme", "agent-alpha").
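One way to read D6 is that a namespace is a plain tuple and a lookup matches on a tuple prefix. The sketch below illustrates that reading; make_namespace and in_scope are hypothetical helpers, and prefix matching is an assumption rather than documented LangMem behaviour.

```python
from typing import Dict, List, Tuple

Namespace = Tuple[str, ...]


def make_namespace(scope_type: str, *parts: str) -> Namespace:
    if not scope_type or any(not p for p in parts):
        raise ValueError("namespace components must be non-empty strings")
    return (scope_type, *parts)


def in_scope(candidate: Namespace, prefix: Namespace) -> bool:
    # A namespace is visible from `prefix` when it starts with the same components.
    return candidate[: len(prefix)] == prefix


facts: Dict[Namespace, List[str]] = {
    make_namespace("user", "u-123"): ["prefers dark mode"],
    make_namespace("org", "acme", "agent-alpha"): ["deploys on Fridays"],
    make_namespace("org", "acme", "agent-beta"): ["uses staging first"],
}

# Everything under the "acme" organisation, regardless of agent.
acme_facts = [
    fact
    for ns, items in facts.items()
    if in_scope(ns, ("org", "acme"))
    for fact in items
]
```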
Architecture Overview
┌─────────────────────────────────────────────────────────┐
│                      MemoryEngine                       │
├─────────────────────────────────────────────────────────┤
│ manage_memory(content, scope, metadata) → fact_id       │
│ search_memory(query, limit, scope) → SearchResults[]    │
│ flush_messages(messages, scope) → boolean               │
│ deep_dream(lookback_days, scope) → boolean              │
│ format_for_injection(scope, max_tokens) → str           │
└──────────────────────┬──────────────────────────────────┘
                       │
        ┌──────────────┼──────────────┐
        ▼              ▼              ▼
┌──────────────┐ ┌──────────┐ ┌──────────────┐
│ Context Tier │ │  Daily   │ │  Core Tier   │
│ (in-memory)  │ │  Tier    │ │  (SQLite +   │
│              │ │ (Markdown│ │   FTS5 +     │
│ RunningSumm. │ │  files)  │ │   vectors)   │
│ token budget │ │          │ │              │
└──────────────┘ │  Deep    │ │ MemoryStore  │
                 │ Dream ───┼─┤ (facts)      │
                 └──────────┘ │ HybridSearch │
                              └──────────────┘
                                      │
                             ┌────────┴────────┐
                             ▼                 ▼
                      ┌────────────┐ ┌────────────────┐
                      │  Keyword   │ │ Vector Search  │
                      │  (FTS5)    │ │ (numpy cosine) │
                      └────────────┘ └────────────────┘
Data Flow
- Agent sends message → Context tier tracks token budget, optionally summarizes
- Conversation turn completes → Messages queued to background MemoryUpdateQueue
- Debounce timer fires → MemoryUpdater calls LLM with current memory + conversation → extracts facts
- Facts persisted → Core tier SQLite: chunks table with embedding, FTS5 index
- Daily recording → MemoryFlushManager appends to memory/YYYY-MM-DD.md
- Deep Dream (scheduled) → LLM reads MEMORY.md + recent daily files → rewrites MEMORY.md → writes dream diary
- Agent starts new session → format_for_injection() reads core tier → builds token-budgeted context string → injects into system prompt
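The daily recording step could be as small as the sketch below. Only the memory/YYYY-MM-DD.md path comes from the flow above; the append_daily name, the date heading, and the bullet format are assumptions.

```python
import tempfile
from datetime import date
from pathlib import Path
from typing import Optional


def append_daily(root: Path, entry: str, day: Optional[date] = None) -> Path:
    """Append one bullet to memory/YYYY-MM-DD.md, creating the file on first use."""
    day = day or date.today()
    path = root / "memory" / f"{day.isoformat()}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not path.exists()
    with path.open("a", encoding="utf-8") as fh:
        if is_new:
            # Assumed layout: a date heading followed by one bullet per entry.
            fh.write(f"# {day.isoformat()}\n\n")
        fh.write(f"- {entry.strip()}\n")
    return path


# Illustrative run in a throwaway directory with a fixed example date.
with tempfile.TemporaryDirectory() as tmp:
    target = append_daily(Path(tmp), "User prefers dark mode", date(2025, 1, 2))
    append_daily(Path(tmp), "User lives in Berlin", date(2025, 1, 2))
    daily_name = target.name
    daily_text = target.read_text(encoding="utf-8")
```

Append-only Markdown keeps the daily tier readable in any editor and gives Deep Dream a simple, ordered input.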
Module Structure
memory-engine/
├── __init__.py # Public API: MemoryEngine, MemoryConfig
├── config.py # Pydantic config model
├── core/
│ ├── __init__.py
│ ├── store.py # MemoryStore (SQLite + FTS5 + vectors)
│ ├── hybrid_search.py # Vector + keyword merge with temporal decay
│ └── schemas.py # Memory, Fact, SearchResult models
├── extraction/
│ ├── __init__.py
│ ├── manager.py # MemoryManager (LLM fact extraction)
│ └── prompts.py # System prompts for memory extraction
├── tiers/
│ ├── __init__.py
│ ├── context.py # ContextTier (short-term summarization)
│ ├── daily.py # DailyTier (Markdown file management)
│ └── core.py # CoreTier (long-term persistent store)
├── background/
│ ├── __init__.py
│ ├── queue.py # MemoryUpdateQueue (debounced)
│ └── deep_dream.py # Deep Dream consolidation
├── tools/
│ ├── __init__.py
│ ├── manage.py # manage_memory callable
│ └── search.py # search_memory callable
├── embedding/
│ ├── __init__.py
│ ├── base.py # EmbeddingProvider ABC
│ └── openai.py # OpenAI embedding implementation
└── utils/
    ├── __init__.py
    ├── namespace.py # NamespaceTemplate
    ├── token_counter.py # Token counting (tiktoken wrapper)
    └── chunker.py # Text chunking
Risks / Trade-offs
| Risk | Mitigation |
|---|---|
| [R1] LLM extraction latency blocks agent loop | Background queue with debounce — agent never waits for memory update |
| [R2] Embedding API failures degrade search | Graceful degradation to keyword-only; vector results omitted, not fatal |
| [R3] SQLite write contention under high concurrency | WAL mode + RLock per connection; single-process assumption |
| [R4] FTS5 corrupted after crash | Self-healing on init: detect corrupt shadow tables, rebuild from chunks table |
| [R5] Memory bloat from unbounded fact accumulation | Configurable max_facts limit (default 500); sorted by confidence, oldest trimmed |
| [R6] Deep Dream overwrites valuable long-term data | Dream diary preserves audit trail; content-hash dedup prevents re-processing |
| [R7] Token budget exceeded in context injection | format_for_injection() enforces strict token limit with truncation |
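The mitigations for R5 and R7 can be illustrated together. This is a sketch under stated assumptions: the Fact tuple is a simplified stand-in for the real schema, "sorted by confidence, oldest trimmed" is read as ranking by confidence with recency as the tie-breaker, and the default token counter is a whitespace split.

```python
from typing import Callable, List, NamedTuple


class Fact(NamedTuple):
    content: str
    confidence: float
    created_at: float  # Unix timestamp


def trim_facts(facts: List[Fact], max_facts: int = 500) -> List[Fact]:
    # One reading of R5: rank by confidence, break ties by recency, drop the tail.
    ranked = sorted(facts, key=lambda f: (f.confidence, f.created_at), reverse=True)
    return ranked[:max_facts]


def format_for_injection(
    facts: List[Fact],
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> str:
    lines: List[str] = []
    used = 0
    for fact in facts:
        line = f"- {fact.content}"
        cost = count_tokens(line)
        if used + cost > max_tokens:
            break  # Strict limit: stop before the budget is exceeded.
        lines.append(line)
        used += cost
    return "\n".join(lines)
```

Truncating at whole-fact boundaries avoids injecting a half-sentence into the system prompt, at the cost of leaving part of the budget unused.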
Open Questions
- Q1: Should Deep Dream be scheduled (cron) or event-driven (every N daily files)?
- Q2: What should the default max_facts limit for the core tier be? R5 currently assumes 500.
- Q3: Should the daily tier support per-user isolation (user-specific daily files) or always be shared?