chore(openspec): drop 9 superseded proposals + 11 stub archive files
Drop 9 batch proposals that are superseded by the boocode-lift-analysis (boocontext-audit, conductor upgrades, self-healing/verify-gate skills): add-3tier-memory, import-llm-evaluator, import-pregel-engine, plugin-platform, conductor-evolution, code-intelligence-upgrade, dev-workflow, ui-overhaul, agent-reliability. Delete 11 stub archive files (49-66B each, 'Status: Shipped. Archived.' only) that provide zero documentation value over the existing CHANGELOG.md + git tags.
@@ -0,0 +1,2 @@
|
||||
schema: spec-driven
|
||||
created: 2026-06-07
|
||||
@@ -0,0 +1,3 @@
|
||||
# memory-context-engineering
|
||||
|
||||
Spec-driven implementation of memory and context engineering patterns, based on research into LangMem, DeerFlow, and CowAgent.
|
||||
@@ -0,0 +1,164 @@
|
||||
## Context
|
||||
|
||||
Current agents have no durable memory beyond the immediate LLM context window. Research across three production-grade OSS repos (LangMem, DeerFlow, CowAgent) reveals a consistent architectural pattern: a **tiered memory pipeline** with short-term context management, long-term semantic extraction, and periodic background consolidation. This design synthesizes those patterns into a portable, framework-agnostic `memory-engine` module.
|
||||
|
||||
The engine must be:
|
||||
- **Portable** — works with any LLM, any agent framework, any embedding provider
|
||||
- **Tiered** — separates ephemeral session context from persistent long-term knowledge
|
||||
- **Efficient** — background processing, debounced writes, token-budget-aware formatting
|
||||
- **Searchable** — hybrid keyword + vector retrieval with scoring
|
||||
|
||||
## Goals / Non-Goals
|
||||
|
||||
**Goals:**
|
||||
- Provide a unified public API: `MemoryEngine` class with `manage()`, `search()`, `flush()`, `dream()` methods
|
||||
- Short-term context: token-budget windowing + incremental summarization (LangMem's `summarize_messages` pattern)
|
||||
- Long-term memory: LLM-extracted facts stored in SQLite with typed schemas (LangMem's `MemoryManager` + DeerFlow's fact model)
|
||||
- Tiered consolidation: context→daily→core pipeline with configurable promotion rules (CowAgent's 3-tier)
|
||||
- Hybrid search: FTS5 keyword + numpy-vectorized cosine similarity with weighted merge (CowAgent's `MemoryStorage`)
|
||||
- Background processing: debounced async queue for memory updates (DeerFlow's `MemoryUpdateQueue` + LangMem's `ReflectionExecutor`)
|
||||
- Agent tools: `manage_memory(content, action, id)` and `search_memory(query, limit)` as framework-agnostic callables
|
||||
|
||||
**Non-Goals:**
|
||||
- Not a standalone agent framework — integrates into existing loops
|
||||
- No built-in LLM provider — caller provides model
|
||||
- No built-in embedding provider — caller provides or we degrade to keyword-only
|
||||
- No real-time sync / distributed consensus — single-process design
|
||||
- No graph-based memory (entity-relationship knowledge graphs) — deferred to future
|
||||
|
||||
## Decisions
|
||||
|
||||
### D1: SQLite as the single persistence backend
|
||||
- **Choice**: SQLite with WAL mode for both keyword search (FTS5) and vector storage (BLOB embeddings)
|
||||
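- **Rationale**: No extra dependency (`sqlite3` is in the standard library), production-proven, FTS5 is available in most SQLite builds that ship with Python, and embeddings can be scored in-process with numpy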
|
||||
- **Alternatives considered**:
|
||||
- *JSON files* (DeerFlow) → simpler but no built-in search, concurrency issues
|
||||
- *External vector DB* (Pinecone, pgvector) → adds operational complexity, violates portability goal
|
||||
- *LMDB/RocksDB* → overkill, no FTS5 equivalent
|
||||
|
||||
### D2: Three-tier architecture with file-based daily layer
|
||||
- **Choice**: In-memory context tier → Markdown-file daily tier → SQLite-indexed core tier
|
||||
- **Rationale**: Daily Markdown files are human-readable, easily audited, and serve as the input to Deep Dream consolidation. Core tier is the indexed, searchable fact store.
|
||||
- **Alternatives considered**:
|
||||
- *Single SQLite DB for everything* → loses human-readability of daily records
|
||||
- *All in-memory* → no persistence across restarts
|
||||
|
||||
### D3: Fact extraction via structured LLM output (JSON pattern)
|
||||
- **Choice**: LLM returns structured JSON (DeerFlow pattern) rather than tool-calling-based extraction (LangMem trustcall pattern)
|
||||
- **Rationale**: Simpler, fewer dependencies, compatible with any LLM provider. LangMem's trustcall approach is more robust for complex multi-step edits but requires the `trustcall` library.
|
||||
- **Fallback**: Confidence-thresholded insertion with content-dedup hashing to prevent duplicates
|
||||
|
||||
### D4: Hybrid search with numpy-vectorized cosine similarity
|
||||
- **Choice**: Load relevant embeddings from SQLite, compute cosine similarity via `matrix @ vector` (numpy), merge with FTS5 BM25 scores
|
||||
- **Rationale**: ~100x faster than per-row Python loops. Uses numpy which is near-ubiquitous in Python ML.
|
||||
- **Fallback**: Pure-Python cosine similarity when numpy unavailable
|
||||
|
||||
### D5: Debounced background memory update queue
|
||||
- **Choice**: Thread-safe priority queue with configurable debounce timer (DeerFlow pattern)
|
||||
- **Rationale**: Prevents thundering-herd on LLM API during rapid conversation turns. Threaded execution avoids blocking the main agent loop.
|
||||
- **Alternatives considered**: asyncio queue → fine for async-only, but MemoryEngine must support sync callers
|
||||
|
||||
### D6: Namespace isolation via tuple-based scoping
|
||||
- **Choice**: `(scope_type, user_id, agent_id)` tuple namespace for multi-tenant isolation
|
||||
- **Rationale**: LangMem's `NamespaceTemplate` pattern is proven in production. It allows namespaces such as `("user", "u-123")` or `("org", "acme", "agent-alpha")`.
|
||||
|
||||
## Architecture Overview
|
||||
|
||||
```
┌─────────────────────────────────────────────────────────┐
│                      MemoryEngine                       │
├─────────────────────────────────────────────────────────┤
│  manage_memory(content, scope, metadata) → fact_id      │
│  search_memory(query, limit, scope) → SearchResults[]   │
│  flush_messages(messages, scope) → boolean              │
│  deep_dream(lookback_days, scope) → boolean             │
│  format_for_injection(scope, max_tokens) → str          │
└──────────────────────┬──────────────────────────────────┘
                       │
        ┌──────────────┼──────────────┐
        ▼              ▼              ▼
┌──────────────┐ ┌──────────┐ ┌──────────────┐
│ Context Tier │ │  Daily   │ │  Core Tier   │
│ (in-memory)  │ │  Tier    │ │  (SQLite +   │
│              │ │ (Markdown│ │   FTS5 +     │
│ RunningSumm. │ │  files)  │ │   vectors)   │
│ token budget │ │          │ │              │
└──────────────┘ │  Deep    │ │ MemoryStore  │
                 │ Dream ───┼─┤ (facts)      │
                 └──────────┘ │ HybridSearch │
                              └──────────────┘
                                      │
                             ┌────────┴────────┐
                             ▼                 ▼
                       ┌────────────┐ ┌────────────────┐
                       │ Keyword    │ │ Vector Search  │
                       │ (FTS5)     │ │ (numpy cosine) │
                       └────────────┘ └────────────────┘
```
|
||||
|
||||
### Data Flow
|
||||
|
||||
1. **Agent sends message** → Context tier tracks token budget, optionally summarizes
|
||||
2. **Conversation turn completes** → Messages queued to background `MemoryUpdateQueue`
|
||||
3. **Debounce timer fires** → `MemoryUpdater` calls LLM with current memory + conversation → extracts facts
|
||||
4. **Facts persisted** → Core tier SQLite: chunks table with embedding, FTS5 index
|
||||
5. **Daily recording** → `MemoryFlushManager` appends to `memory/YYYY-MM-DD.md`
|
||||
6. **Deep Dream (scheduled)** → LLM reads MEMORY.md + recent daily files → rewrites MEMORY.md → writes dream diary
|
||||
7. **Agent starts new session** → `format_for_injection()` reads core tier → builds token-budgeted context string → injects into system prompt
|
||||
|
||||
## Module Structure
|
||||
|
||||
```
memory-engine/
├── __init__.py              # Public API: MemoryEngine, MemoryConfig
├── config.py                # Pydantic config model
├── core/
│   ├── __init__.py
│   ├── store.py             # MemoryStore (SQLite + FTS5 + vectors)
│   ├── hybrid_search.py     # Vector + keyword merge with temporal decay
│   └── schemas.py           # Memory, Fact, SearchResult models
├── extraction/
│   ├── __init__.py
│   ├── manager.py           # MemoryManager (LLM fact extraction)
│   └── prompts.py           # System prompts for memory extraction
├── tiers/
│   ├── __init__.py
│   ├── context.py           # ContextTier (short-term summarization)
│   ├── daily.py             # DailyTier (Markdown file management)
│   └── core.py              # CoreTier (long-term persistent store)
├── background/
│   ├── __init__.py
│   ├── queue.py             # MemoryUpdateQueue (debounced)
│   └── deep_dream.py        # Deep Dream consolidation
├── tools/
│   ├── __init__.py
│   ├── manage.py            # manage_memory callable
│   └── search.py            # search_memory callable
├── embedding/
│   ├── __init__.py
│   ├── base.py              # EmbeddingProvider ABC
│   └── openai.py            # OpenAI embedding implementation
└── utils/
    ├── __init__.py
    ├── namespace.py         # NamespaceTemplate
    ├── token_counter.py     # Token counting (tiktoken wrapper)
    └── chunker.py           # Text chunking
```
|
||||
|
||||
## Risks / Trade-offs
|
||||
|
||||
| Risk | Mitigation |
|------|------------|
| [R1] LLM extraction latency blocks agent loop | Background queue with debounce; the agent never waits for a memory update |
| [R2] Embedding API failures degrade search | Graceful degradation to keyword-only; vector results omitted, not fatal |
| [R3] SQLite write contention under high concurrency | WAL mode + RLock per connection; single-process assumption |
| [R4] FTS5 corrupted after crash | Self-healing on init: detect corrupt shadow tables, rebuild from chunks table |
| [R5] Memory bloat from unbounded fact accumulation | Configurable `max_facts` limit (default 500); facts are ranked by confidence and the lowest-confidence fact is trimmed first |
| [R6] Deep Dream overwrites valuable long-term data | Dream diary preserves audit trail; content-hash dedup prevents re-processing |
| [R7] Token budget exceeded in context injection | `format_for_injection()` enforces strict token limit with truncation |
|
||||
|
||||
## Open Questions
|
||||
|
||||
- Q1: Should Deep Dream be scheduled (cron) or event-driven (every N daily files)?
|
||||
- Q2: Is the proposed default `max_facts` limit of 500 right for the core tier?
|
||||
- Q3: Should the daily tier support per-user isolation (user-specific daily files) or always shared?
|
||||
@@ -0,0 +1,35 @@
|
||||
## Why
|
||||
|
||||
Current AI agents lack structured, durable memory beyond the immediate context window. Conversations are stateless, preferences are forgotten, and long-term learning is nonexistent. Three OSS repos (LangMem, DeerFlow, CowAgent) demonstrate production patterns for agent memory — but no unified, portable engine exists that combines short-term context management, long-term semantic memory, tiered consolidation, and hybrid retrieval. This change builds that engine by extracting and adapting the best patterns from all three.
|
||||
|
||||
## What Changes
|
||||
|
||||
- **New `memory-engine/` module** in the codebase providing a unified memory & context API
|
||||
- **Short-term context summarization** — token-budget-aware conversation windowing (LangMem pattern)
|
||||
- **Long-term semantic memory** — LLM-extracted facts stored with optional vector embeddings (LangMem/DeerFlow hybrid)
|
||||
- **Tiered memory architecture** — Context tier (ephemeral session) → Daily tier (summarized records) → Core tier (distilled long-term) (CowAgent pattern)
|
||||
- **Hybrid search** — Keyword (FTS5) + Vector (cosine similarity on embeddings) with weighted merge (CowAgent pattern)
|
||||
- **Background consolidation** — Debounced, async memory extraction pipeline (DeerFlow queue + LangMem ReflectionExecutor)
|
||||
- **Deep Dream distillation** — Periodic overnight LLM consolidation of daily records into core memory (CowAgent pattern)
|
||||
- **Memory tools for agents** — `manage_memory` and `search_memory` tool interfaces (LangMem pattern)
|
||||
|
||||
## Capabilities
|
||||
|
||||
### New Capabilities
|
||||
- `short-term-context`: Token-budget window management, conversation summarization, and context trimming for LLM interactions
|
||||
- `long-term-memory`: Persistent fact extraction, storage, and retrieval with Pydantic-typed schemas
|
||||
- `tiered-consolidation`: Three-tier memory pipeline (context→daily→core) with promotion rules and Deep Dream distillation
|
||||
- `hybrid-search`: Combined keyword (FTS5) + vector (embedding cosine similarity) search with weighted scoring and temporal decay
|
||||
- `memory-tools`: `manage_memory` (CRUD) and `search_memory` (semantic query) tools for agent integration
|
||||
- `background-processing`: Debounced async memory update queue with thread-pool execution
|
||||
|
||||
### Modified Capabilities
|
||||
<!-- No existing specs to modify — this is a greenfield module -->
|
||||
|
||||
## Impact
|
||||
|
||||
- New `memory-engine/` directory tree (no existing code modified)
|
||||
- Dependencies: `sqlite3` (stdlib), `numpy` (optional, for vector search), `pydantic` (schemas), `tiktoken` (token counting)
|
||||
- LLM provider integration via abstract `ChatModel` interface (not coupled to any provider)
|
||||
- Embedding provider integration via abstract `EmbeddingProvider` interface (supports OpenAI, local models)
|
||||
- Agent integration via simple tool interface (not coupled to any agent framework)
|
||||
@@ -0,0 +1,58 @@
|
||||
## ADDED Requirements
|
||||
|
||||
### Requirement: Debounced memory update queue
|
||||
The system SHALL collect memory update requests into a queue and process them after a configurable debounce period.
|
||||
|
||||
#### Scenario: Items enqueued per (thread, user, agent) key
|
||||
- **WHEN** a conversation context is added to the queue
|
||||
- **THEN** it SHALL be keyed by `(thread_id, user_id, agent_name)` for deduplication
|
||||
- **WHEN** a second context arrives for the same key before processing
|
||||
- **THEN** the previous context SHALL be replaced with the newer one
|
||||
|
||||
#### Scenario: Debounce timer resets on each enqueue
|
||||
- **WHEN** a new item is enqueued
|
||||
- **THEN** the debounce timer SHALL reset to the configured `debounce_seconds`
|
||||
- **WHEN** no new items arrive within the debounce window
|
||||
- **THEN** the queue SHALL be processed
|
||||
|
||||
#### Scenario: Immediate processing option
|
||||
- **WHEN** `add_nowait()` is called instead of `add()`
|
||||
- **THEN** the queue SHALL start processing immediately in a background thread
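
The three queue scenarios above can be sketched with a keyed dict and a resettable `threading.Timer`. This is a minimal illustration, not the DeerFlow implementation: the class name `DebouncedQueue`, the `process` callback, and the 30-second default are assumptions made for the example.

```python
import threading
from typing import Any, Callable, Dict, List, Optional, Tuple

Key = Tuple[str, str, str]  # (thread_id, user_id, agent_name)


class DebouncedQueue:
    """Hypothetical keyed queue that flushes after a quiet period."""

    def __init__(
        self,
        process: Callable[[List[Tuple[Key, Any]]], None],
        debounce_seconds: float = 30.0,  # assumed default, not from the spec
    ) -> None:
        self._process = process
        self._debounce = debounce_seconds
        self._pending: Dict[Key, Any] = {}
        self._lock = threading.Lock()
        self._timer: Optional[threading.Timer] = None

    def add(self, key: Key, context: Any) -> None:
        """Enqueue and restart the debounce window."""
        with self._lock:
            self._pending[key] = context  # a newer context replaces the older one
            self._restart_timer(self._debounce)

    def add_nowait(self, key: Key, context: Any) -> None:
        """Enqueue and start processing immediately in a background thread."""
        with self._lock:
            self._pending[key] = context
            self._restart_timer(0.0)

    def _restart_timer(self, delay: float) -> None:
        # Caller must hold the lock.
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(delay, self._flush)
        self._timer.daemon = True
        self._timer.start()

    def _flush(self) -> None:
        with self._lock:
            batch = list(self._pending.items())
            self._pending.clear()
        if batch:
            self._process(batch)
```

Keying the pending dict by `(thread_id, user_id, agent_name)` gives the replace-on-duplicate behaviour for free, and running `process` outside the lock keeps a slow LLM call from blocking new `add()` calls.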
|
||||
|
||||
### Requirement: Background thread execution for memory updates
|
||||
The system SHALL execute memory updates (LLM extraction + persistence) in a background thread to avoid blocking the agent loop.
|
||||
|
||||
#### Scenario: Async flush via threading.Thread
|
||||
- **WHEN** conversation messages are flushed to memory
|
||||
- **THEN** the flush SHALL run in a `threading.Thread` (daemon=True)
|
||||
- **THEN** the main agent SHALL NOT wait for the flush to complete
|
||||
|
||||
#### Scenario: Thread pool for sync LLM calls
|
||||
- **WHEN** a memory update requires a synchronous LLM call
|
||||
- **THEN** the call SHALL be offloaded to a `ThreadPoolExecutor` (max_workers=4)
|
||||
- **THEN** this SHALL prevent blocking the main event loop
|
||||
|
||||
### Requirement: Content deduplication for flush
|
||||
The system SHALL deduplicate message content before flushing to avoid redundant summarization.
|
||||
|
||||
#### Scenario: MD5 content hash dedup
|
||||
- **WHEN** messages are about to be flushed
|
||||
- **THEN** each message content SHALL be MD5-hashed
|
||||
- **WHEN** a hash matches a previously flushed message
|
||||
- **THEN** that message SHALL be skipped
|
||||
|
||||
#### Scenario: Scheduler pair stripping
|
||||
- **WHEN** messages contain scheduler-injected pairs (marked with `[SCHEDULED]` prefix)
|
||||
- **THEN** the scheduler user message and its paired assistant response SHALL be stripped before flushing
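
A rough sketch of the two flush filters described above, assuming messages are plain dicts with `role` and `content` keys (the spec does not fix a message shape). The function names are hypothetical.

```python
import hashlib
from typing import Dict, List, Set

Message = Dict[str, str]  # assumed shape: {"role": ..., "content": ...}


def strip_scheduler_pairs(messages: List[Message]) -> List[Message]:
    """Drop each [SCHEDULED] user message and the assistant reply that follows it."""
    kept: List[Message] = []
    drop_next_assistant = False
    for msg in messages:
        if msg["role"] == "user" and msg["content"].startswith("[SCHEDULED]"):
            drop_next_assistant = True
            continue
        if drop_next_assistant and msg["role"] == "assistant":
            drop_next_assistant = False
            continue
        drop_next_assistant = False
        kept.append(msg)
    return kept


def dedup_messages(messages: List[Message], seen_hashes: Set[str]) -> List[Message]:
    """Skip messages whose MD5 content hash was already flushed."""
    fresh: List[Message] = []
    for msg in messages:
        digest = hashlib.md5(msg["content"].encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        fresh.append(msg)
    return fresh
```

MD5 is used here only as a cheap content fingerprint, not for security; `seen_hashes` would need to live as long as the flush manager for the dedup to hold across flushes.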
|
||||
|
||||
### Requirement: Configuration-driven memory processing
|
||||
The system SHALL support configuration to enable/disable background memory processing.
|
||||
|
||||
#### Scenario: Memory processing disabled
|
||||
- **WHEN** `memory_config.enabled` is `False`
|
||||
- **THEN** no memory updates SHALL be queued or processed
|
||||
- **THEN** queue `add()` calls SHALL be no-ops
|
||||
|
||||
#### Scenario: Rate limiting between updates
|
||||
- **WHEN** processing multiple queued memory updates
|
||||
- **THEN** a 0.5 second delay SHALL be inserted between updates to avoid LLM API rate limits
|
||||
@@ -0,0 +1,73 @@
|
||||
## ADDED Requirements
|
||||
|
||||
### Requirement: Hybrid search with vector + keyword fusion
|
||||
The system SHALL combine vector similarity search and keyword search into unified ranked results.
|
||||
|
||||
#### Scenario: Vector search runs when embedding provider available
|
||||
- **WHEN** an embedding provider is configured
|
||||
- **THEN** the system SHALL compute a query embedding and perform cosine similarity search
|
||||
- **WHEN** no embedding provider is configured
|
||||
- **THEN** the system SHALL gracefully degrade to keyword-only search
|
||||
|
||||
#### Scenario: Keyword search always runs
|
||||
- **WHEN** a search query is submitted
|
||||
- **THEN** the system SHALL always perform keyword search regardless of embedding provider availability
|
||||
|
||||
#### Scenario: Weighted score merging
|
||||
- **WHEN** both vector and keyword results are available
|
||||
- **THEN** the final score SHALL be: `vector_weight * vector_score + keyword_weight * keyword_score`
|
||||
- **THEN** default weights SHALL be `vector_weight=0.7`, `keyword_weight=0.3`
|
||||
- **THEN** weights SHALL be configurable
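
The weighted merge can be written in a few lines. In this sketch, scores are assumed to arrive as `{chunk_id: score}` dicts already scaled to a comparable range, and a chunk found by only one search is assumed to score 0 in the other; neither detail is stated in the spec.

```python
from typing import Dict


def merge_scores(
    vector_scores: Dict[str, float],
    keyword_scores: Dict[str, float],
    vector_weight: float = 0.7,
    keyword_weight: float = 0.3,
) -> Dict[str, float]:
    """Combine per-chunk scores as vector_weight * v + keyword_weight * k."""
    merged: Dict[str, float] = {}
    for chunk_id in set(vector_scores) | set(keyword_scores):
        v = vector_scores.get(chunk_id, 0.0)  # assumption: missing counts as 0
        k = keyword_scores.get(chunk_id, 0.0)
        merged[chunk_id] = vector_weight * v + keyword_weight * k
    return merged
```

For example, a chunk with vector score 1.0 and keyword score 0.5 gets `0.7 * 1.0 + 0.3 * 0.5 = 0.85` under the default weights.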
|
||||
|
||||
### Requirement: Vector search via numpy cosine similarity
|
||||
The system SHALL perform vector search using numpy-vectorized cosine similarity for performance.
|
||||
|
||||
#### Scenario: Vectorized cosine similarity
|
||||
- **WHEN** numpy is available
|
||||
- **THEN** all chunk embeddings SHALL be loaded into a numpy matrix `(N, D)`
|
||||
- **THEN** cosine similarity SHALL be computed as `matrix @ query_vector` on L2-normalized embeddings (BLAS matrix-vector multiply)
|
||||
- **THEN** top-K results SHALL be selected via `argpartition` (O(N) average)
|
||||
|
||||
#### Scenario: Pure-Python fallback
|
||||
- **WHEN** numpy is unavailable
|
||||
- **THEN** cosine similarity SHALL be computed per-row with pure Python
|
||||
- **THEN** results SHALL be sorted and the top K returned
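
The sketch below shows one way to satisfy both scenarios with a single function. It normalizes rows and the query before the matrix-vector product, since a plain dot product equals cosine similarity only for unit-length vectors. The function name `top_k_cosine` and its return shape are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

try:
    import numpy as np
except ImportError:  # numpy is optional; fall back to pure Python
    np = None


def top_k_cosine(
    embeddings: Sequence[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[int, float]]:
    """Return up to k (row_index, cosine_similarity) pairs, best first."""
    if not embeddings or k <= 0:
        return []
    if np is not None:
        matrix = np.asarray(embeddings, dtype=np.float64)  # shape (N, D)
        q = np.asarray(query, dtype=np.float64)
        # Normalize so that the dot product is the cosine similarity.
        norms = np.linalg.norm(matrix, axis=1, keepdims=True)
        matrix = matrix / np.maximum(norms, 1e-12)
        q = q / max(float(np.linalg.norm(q)), 1e-12)
        scores = matrix @ q
        k = min(k, len(scores))
        # argpartition finds the top k in O(N) on average; sort only those k.
        idx = np.argpartition(-scores, k - 1)[:k]
        idx = idx[np.argsort(-scores[idx])]
        return [(int(i), float(scores[i])) for i in idx]
    # Pure-Python fallback: score every row, sort, and slice.
    q_norm = math.sqrt(sum(x * x for x in query)) or 1e-12
    scored = []
    for i, row in enumerate(embeddings):
        r_norm = math.sqrt(sum(x * x for x in row)) or 1e-12
        dot = sum(a * b for a, b in zip(row, query))
        scored.append((i, dot / (r_norm * q_norm)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

If embeddings are normalized once at write time, the per-query normalization of the matrix can be dropped and the numpy path reduces to the single `matrix @ q` product.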
|
||||
|
||||
### Requirement: Three-tier keyword search (FTS5 → trigram → LIKE)
|
||||
The system SHALL provide a cascading keyword search strategy for multi-language support.
|
||||
|
||||
#### Scenario: Standard FTS5 for ASCII queries
|
||||
- **WHEN** the query contains only ASCII characters
|
||||
- **THEN** the system SHALL use SQLite FTS5 with the unicode61 tokenizer
|
||||
- **THEN** BM25 ranking SHALL be converted to a `[0, 1)` score
|
||||
|
||||
#### Scenario: Trigram FTS5 for CJK queries
|
||||
- **WHEN** the query contains CJK (Chinese, Japanese, Korean) characters
|
||||
- **THEN** the system SHALL use SQLite FTS5 with the trigram tokenizer
|
||||
- **THEN** CJK character sequences and ASCII words SHALL be extracted and joined with AND
|
||||
|
||||
#### Scenario: LIKE fallback for edge cases
|
||||
- **WHEN** FTS5 is unavailable or returns empty results
|
||||
- **THEN** the system SHALL fall back to LIKE-based search
|
||||
- **THEN** CJK runs (1+ chars) and ASCII words (3+ chars) SHALL be matched independently
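
The LIKE fallback is the easiest tier to illustrate because it needs nothing beyond the standard library. The sketch assumes a `chunks` table with `id` and `text` columns (the column names are guesses) and treats "matched independently" as an OR over terms, which is one possible reading.

```python
import re
import sqlite3
from typing import List

# Common CJK ranges: unified ideographs, kana, and Hangul syllables.
_CJK = r"[\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af]"
_TERM_RE = re.compile(rf"{_CJK}+|[A-Za-z0-9]{{3,}}")


def contains_cjk(text: str) -> bool:
    """True if the text has at least one CJK character (selects the trigram path)."""
    return re.search(_CJK, text) is not None


def extract_terms(query: str) -> List[str]:
    """CJK runs of 1+ characters and ASCII words of 3+ characters."""
    return _TERM_RE.findall(query)


def like_search(conn: sqlite3.Connection, query: str, limit: int = 10) -> List[int]:
    """Fallback search: return ids of chunks containing any extracted term."""
    terms = extract_terms(query)
    if not terms:
        return []
    where = " OR ".join("text LIKE ?" for _ in terms)
    params = [f"%{term}%" for term in terms] + [limit]
    rows = conn.execute(
        f"SELECT id FROM chunks WHERE {where} LIMIT ?", params
    ).fetchall()
    return [row[0] for row in rows]
```

Because the extracted terms contain only CJK characters, letters, and digits, they cannot include the LIKE wildcards `%` or `_`, so no escaping is needed in this sketch.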
|
||||
|
||||
### Requirement: Temporal decay for dated memory files
|
||||
The system SHALL apply exponential decay to search scores for dated memory files.
|
||||
|
||||
#### Scenario: Decay applied to dated files
|
||||
- **WHEN** a memory chunk path matches `YYYY-MM-DD.md`
|
||||
- **THEN** the combined score SHALL be multiplied by `exp(-ln(2)/half_life * age_days)`
|
||||
- **THEN** the default `half_life` SHALL be 30 days
|
||||
- **WHEN** the path does not contain a date (e.g., `MEMORY.md`)
|
||||
- **THEN** no decay SHALL be applied (multiplier = 1.0)
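
As a worked example of the decay formula, a daily file that is exactly one half-life old should have its score halved. The helper below is a sketch; the function name and the choice to pass `today` explicitly are assumptions that keep it easy to test.

```python
import math
import re
from datetime import date

_DATED_FILE = re.compile(r"(\d{4})-(\d{2})-(\d{2})\.md$")


def decay_multiplier(path: str, today: date, half_life_days: float = 30.0) -> float:
    """exp(-ln(2) / half_life * age_days) for dated files, 1.0 otherwise."""
    match = _DATED_FILE.search(path)
    if match is None:
        return 1.0  # undated files such as MEMORY.md are not decayed
    try:
        file_date = date(*(int(part) for part in match.groups()))
    except ValueError:
        return 1.0  # assumption: an impossible date is treated as undated
    age_days = max((today - file_date).days, 0)
    return math.exp(-math.log(2) / half_life_days * age_days)
```

With the default 30-day half-life, `memory/2026-05-08.md` scored on 2026-06-07 is 30 days old, so its multiplier is `exp(-ln 2) = 0.5`; at 60 days it would be 0.25.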
|
||||
|
||||
### Requirement: Result filtering and limits
|
||||
The system SHALL filter search results by minimum score and maximum count.
|
||||
|
||||
#### Scenario: Min score threshold
|
||||
- **WHEN** search results are merged
|
||||
- **THEN** results with score below `min_score` (default 0.1) SHALL be discarded
|
||||
|
||||
#### Scenario: Max results limit
|
||||
- **WHEN** search results exceed `max_results`
|
||||
- **THEN** only the top `max_results` by combined score SHALL be returned
|
||||
@@ -0,0 +1,83 @@
|
||||
## ADDED Requirements
|
||||
|
||||
### Requirement: Fact extraction from conversation
|
||||
The system SHALL extract structured facts from conversations using an LLM, with confidence scoring and category classification.
|
||||
|
||||
#### Scenario: Extract facts from conversation turn
|
||||
- **WHEN** a conversation turn (user message + assistant reply) is processed
|
||||
- **THEN** the system SHALL call the configured LLM with the conversation text
|
||||
- **THEN** the LLM response SHALL be parsed as structured JSON with facts
|
||||
- **THEN** each fact SHALL contain: `content`, `category`, `confidence` (0.0-1.0)
|
||||
|
||||
#### Scenario: Fact categories
|
||||
- **WHEN** a fact is extracted
|
||||
- **THEN** its `category` SHALL be one of: `preference`, `knowledge`, `context`, `behavior`, `goal`, `correction`
|
||||
- **THEN** the system SHALL validate the category against the allowed set
|
||||
|
||||
#### Scenario: Confidence thresholds
|
||||
- **WHEN** a fact's confidence is below the configurable threshold (default 0.5)
|
||||
- **THEN** the fact SHALL NOT be persisted
|
||||
- **THEN** the system SHALL log that a low-confidence fact was skipped
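
The extraction scenarios above amount to parsing and filtering the LLM's JSON. The sketch assumes the response has the shape `{"facts": [...]}` and that a fact with an invalid category is skipped rather than raising; both are guesses, since the spec only says the category is validated.

```python
import json
import logging
from typing import Any, Dict, List

logger = logging.getLogger("memory_engine.extraction")  # hypothetical logger name

ALLOWED_CATEGORIES = {
    "preference", "knowledge", "context", "behavior", "goal", "correction",
}


def parse_facts(llm_response: str, min_confidence: float = 0.5) -> List[Dict[str, Any]]:
    """Parse the LLM's JSON output and keep only valid, confident facts."""
    payload = json.loads(llm_response)
    kept: List[Dict[str, Any]] = []
    for raw in payload.get("facts", []):  # assumed top-level key
        category = raw.get("category")
        confidence = float(raw.get("confidence", 0.0))
        if category not in ALLOWED_CATEGORIES:
            logger.info("skipped fact with invalid category: %r", category)
            continue
        if confidence < min_confidence:
            logger.info("skipped low-confidence fact (%.2f)", confidence)
            continue
        kept.append(
            {"content": raw["content"], "category": category, "confidence": confidence}
        )
    return kept
```

A real implementation would also need to handle responses that are not valid JSON, for example by catching `json.JSONDecodeError` and retrying or skipping the update.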
|
||||
|
||||
### Requirement: Fact CRUD operations
|
||||
The system SHALL support creating, reading, updating, and deleting memory facts.
|
||||
|
||||
#### Scenario: Create fact
|
||||
- **WHEN** a new fact is created
|
||||
- **THEN** it SHALL be assigned a unique ID (`fact_{uuid_hex[:8]}`)
|
||||
- **THEN** it SHALL be timestamped with ISO-8601 UTC
|
||||
- **THEN** it SHALL be persisted to the core store
|
||||
|
||||
#### Scenario: Delete fact by ID
|
||||
- **WHEN** a fact deletion is requested with a valid ID
|
||||
- **THEN** the fact SHALL be removed from the store
|
||||
- **THEN** the updated store SHALL be persisted
|
||||
|
||||
#### Scenario: Delete non-existent fact
|
||||
- **WHEN** a fact deletion is requested with an unknown ID
|
||||
- **THEN** the system SHALL raise `KeyError`
|
||||
|
||||
#### Scenario: Update fact
|
||||
- **WHEN** a fact update is requested with a valid ID
|
||||
- **THEN** the system SHALL update only the provided fields (`content`, `category`, `confidence`)
|
||||
- **THEN** the fact's `createdAt` SHALL NOT be modified
|
||||
- **THEN** the updated store SHALL be persisted
|
||||
|
||||
### Requirement: Content deduplication
|
||||
The system SHALL prevent duplicate facts by casefolded content comparison.
|
||||
|
||||
#### Scenario: Exact duplicate detected
|
||||
- **WHEN** a new fact's content (casefolded) matches an existing fact
|
||||
- **THEN** the new fact SHALL be skipped
|
||||
- **THEN** the existing fact SHALL remain unchanged
|
||||
- **THEN** the system SHALL log that a duplicate was skipped
|
||||
|
||||
#### Scenario: Near-duplicate with different casing
|
||||
- **WHEN** a new fact's content differs only in letter casing
|
||||
- **THEN** it SHALL be treated as a duplicate
|
||||
- **THEN** the new fact SHALL be skipped
|
||||
|
||||
### Requirement: Max facts limit
|
||||
The system SHALL enforce a configurable maximum number of stored facts (default 500).
|
||||
|
||||
#### Scenario: Fact count exceeds limit
|
||||
- **WHEN** adding a new fact would exceed `max_facts`
|
||||
- **THEN** the system SHALL sort existing facts by confidence (descending)
|
||||
- **THEN** the lowest-confidence fact SHALL be removed
|
||||
- **THEN** the new fact SHALL be added
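
The create, delete, dedup, and limit rules can be shown together in a small in-memory store. This is only a sketch of the rules: persistence to SQLite is omitted, and the class name `FactStore` and its method names are hypothetical.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, Optional


class FactStore:
    """In-memory illustration of dedup, ID assignment, and the max_facts cap."""

    def __init__(self, max_facts: int = 500) -> None:
        self.max_facts = max_facts
        self.facts: Dict[str, Dict[str, Any]] = {}

    def add(self, content: str, category: str, confidence: float) -> Optional[str]:
        """Add a fact and return its ID, or None if it duplicates an existing one."""
        folded = content.casefold()
        if any(f["content"].casefold() == folded for f in self.facts.values()):
            return None  # duplicate: the existing fact stays unchanged
        if len(self.facts) >= self.max_facts:
            # Rank by confidence (descending) and evict the last entry.
            ranked = sorted(
                self.facts.values(), key=lambda f: f["confidence"], reverse=True
            )
            del self.facts[ranked[-1]["id"]]
        fact_id = f"fact_{uuid.uuid4().hex[:8]}"
        self.facts[fact_id] = {
            "id": fact_id,
            "content": content,
            "category": category,
            "confidence": confidence,
            "createdAt": datetime.now(timezone.utc).isoformat(),
        }
        return fact_id

    def delete(self, fact_id: str) -> None:
        """Remove a fact; an unknown ID raises KeyError."""
        del self.facts[fact_id]
```

Note that, following the scenario literally, the lowest-confidence existing fact is evicted even when the incoming fact has lower confidence still.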
|
||||
|
||||
### Requirement: Memory formatting for context injection
|
||||
The system SHALL format memory data into a compact string for injection into LLM system prompts, respecting a token budget.
|
||||
|
||||
#### Scenario: Format with all sections
|
||||
- **WHEN** memory data contains user context, history, and facts
|
||||
- **THEN** the output SHALL include: "User Context:" with work/personal/topOfMind
|
||||
- **THEN** the output SHALL include: "History:" with recent/earlier/background
|
||||
- **THEN** the output SHALL include: "Facts:" sorted by confidence descending
|
||||
- **THEN** each fact SHALL be formatted as: `- [{category} | {confidence:.2f}] {content}`
|
||||
|
||||
#### Scenario: Token budget enforcement
|
||||
- **WHEN** the formatted output exceeds `max_tokens` (default 2000)
|
||||
- **THEN** the system SHALL trim facts from lowest confidence up
|
||||
- **THEN** if still over budget, the output SHALL be truncated at the character level
|
||||
- **THEN** `"\n..."` SHALL be appended to indicate truncation
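
A reduced sketch of the budget enforcement, covering only the "Facts:" section (the user-context and history sections are left out for brevity). Token counting uses the `len(text) // 4` estimate as a stand-in; the function names are assumptions.

```python
from typing import Any, Dict, List


def estimate_tokens(text: str) -> int:
    """Character-based token estimate used as a stand-in tokenizer."""
    return len(text) // 4


def format_facts(facts: List[Dict[str, Any]], max_tokens: int = 2000) -> str:
    """Render facts by descending confidence, trimming to fit the token budget."""
    ordered = sorted(facts, key=lambda f: f["confidence"], reverse=True)

    def render(items: List[Dict[str, Any]]) -> str:
        lines = ["Facts:"]
        lines += [
            f"- [{f['category']} | {f['confidence']:.2f}] {f['content']}"
            for f in items
        ]
        return "\n".join(lines)

    text = render(ordered)
    # First pass: drop facts starting from the lowest confidence.
    while ordered and estimate_tokens(text) > max_tokens:
        ordered.pop()
        text = render(ordered)
    # Second pass: hard character-level truncation with a marker.
    if estimate_tokens(text) > max_tokens:
        text = text[: max_tokens * 4] + "\n..."
    return text
```

Dropping whole facts before truncating characters keeps each surviving line intact, so the model never sees a half-written fact unless the budget is too small for even the header.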
|
||||
@@ -0,0 +1,64 @@
|
||||
## ADDED Requirements
|
||||
|
||||
### Requirement: manage_memory tool
|
||||
The system SHALL provide a callable tool for creating, updating, and deleting persistent facts.
|
||||
|
||||
#### Scenario: Create a new fact
|
||||
- **WHEN** `manage_memory(content="...", action="create")` is called
|
||||
- **THEN** a new fact SHALL be created with the provided content
|
||||
- **THEN** a unique ID SHALL be auto-generated
|
||||
- **THEN** the return value SHALL be `"created memory <id>"`
|
||||
|
||||
#### Scenario: Update an existing fact
|
||||
- **WHEN** `manage_memory(content="...", action="update", id="<existing-id>")` is called
|
||||
- **THEN** the fact SHALL be updated with the new content
|
||||
- **THEN** the return value SHALL be `"updated memory <id>"`
|
||||
- **WHEN** no `id` is provided for an update action
|
||||
- **THEN** a ValueError SHALL be raised
|
||||
|
||||
#### Scenario: Delete a fact
|
||||
- **WHEN** `manage_memory(action="delete", id="<existing-id>")` is called
|
||||
- **THEN** the fact SHALL be deleted
|
||||
- **THEN** the return value SHALL be `"Deleted memory <id>"`
|
||||
- **WHEN** no `id` is provided for a delete action
|
||||
- **THEN** a ValueError SHALL be raised
|
||||
|
||||
#### Scenario: Configurable permitted actions
|
||||
- **WHEN** creating the tool with `actions_permitted=("create", "update")`
|
||||
- **THEN** the delete action SHALL NOT be available
|
||||
- **THEN** attempting a delete SHALL raise a ValueError
|
||||
|
||||
#### Scenario: Custom instructions
|
||||
- **WHEN** creating the tool with custom `instructions`
|
||||
- **THEN** those instructions SHALL be included in the tool description to guide LLM usage
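
The `manage_memory` scenarios can be approximated with a closure over a store. The factory name `make_manage_memory`, the dict-backed store, and the requirement that `create` receives content are assumptions for illustration; the return strings follow the scenarios above.

```python
import uuid
from typing import Callable, Dict, Optional, Sequence


def make_manage_memory(
    store: Dict[str, str],
    actions_permitted: Sequence[str] = ("create", "update", "delete"),
    instructions: str = "",
) -> Callable[..., str]:
    """Build a manage_memory callable bound to a store (hypothetical factory)."""

    def manage_memory(
        content: Optional[str] = None,
        action: str = "create",
        id: Optional[str] = None,
    ) -> str:
        if action not in actions_permitted:
            raise ValueError(f"action {action!r} is not permitted")
        if action == "create":
            if content is None:
                raise ValueError("content is required to create a memory")
            new_id = f"fact_{uuid.uuid4().hex[:8]}"
            store[new_id] = content
            return f"created memory {new_id}"
        if id is None:
            raise ValueError(f"id is required for action {action!r}")
        if action == "delete":
            del store[id]
            return f"Deleted memory {id}"
        if id not in store:
            raise KeyError(id)
        store[id] = content or store[id]
        return f"updated memory {id}"

    # Custom instructions become part of the description shown to the LLM.
    manage_memory.__doc__ = (
        "Create, update, or delete a persistent memory. " + instructions
    ).strip()
    return manage_memory
```

Because the result is a plain callable with a docstring, it can be wrapped by whichever tool-registration mechanism the host agent framework uses.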
|
||||
|
||||
### Requirement: search_memory tool
|
||||
The system SHALL provide a callable tool for searching stored facts by semantic query.
|
||||
|
||||
#### Scenario: Text query search
|
||||
- **WHEN** `search_memory(query="preference for dark mode", limit=10)` is called
|
||||
- **THEN** the system SHALL perform hybrid search (vector + keyword)
|
||||
- **THEN** results SHALL be returned as a serialized JSON list of fact objects
|
||||
|
||||
#### Scenario: Filtered search
|
||||
- **WHEN** `search_memory(query="...", filter={"category": "preference"})` is called
|
||||
- **THEN** results SHALL be filtered to match the specified criteria
|
||||
|
||||
#### Scenario: Configurable response format
|
||||
- **WHEN** `response_format="content_and_artifact"` is configured
|
||||
- **THEN** the tool SHALL return both serialized memories and raw memory objects
|
||||
|
||||
### Requirement: Namespace isolation for multi-tenant
|
||||
The system SHALL support namespace-based isolation of memory data across users, agents, or organizations.
|
||||
|
||||
#### Scenario: Runtime namespace resolution
|
||||
- **WHEN** a memory tool is called with a configuration containing `{"user_id": "u-123"}`
|
||||
- **THEN** the namespace SHALL be resolved to `("user", "u-123")` at runtime
|
||||
- **WHEN** calling with `{"org_id": "acme", "agent_id": "alpha"}`
|
||||
- **THEN** the namespace SHALL be `("org", "acme", "alpha")`
|
||||
|
||||
#### Scenario: Namespace templating
|
||||
- **WHEN** creating memory tools with `namespace=("{user_id}", "memories")`
|
||||
- **THEN** the `{user_id}` placeholder SHALL be replaced at runtime from configuration
|
||||
- **WHEN** a required config key is missing
|
||||
- **THEN** a ConfigurationError SHALL be raised
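
Namespace templating reduces to substituting `{placeholder}` parts from a runtime config. The sketch below is an assumption-level illustration; `resolve_namespace` is a hypothetical helper, and `ConfigurationError` is defined locally because the spec names it without saying where it lives.

```python
from typing import Dict, Sequence, Tuple


class ConfigurationError(Exception):
    """Raised when a namespace placeholder has no value in the runtime config."""


def resolve_namespace(template: Sequence[str], config: Dict[str, str]) -> Tuple[str, ...]:
    """Replace parts of the form "{key}" with config[key]; keep other parts as-is."""
    resolved = []
    for part in template:
        if part.startswith("{") and part.endswith("}"):
            key = part[1:-1]
            if key not in config:
                raise ConfigurationError(f"missing config key: {key!r}")
            resolved.append(str(config[key]))
        else:
            resolved.append(part)
    return tuple(resolved)
```

For example, `resolve_namespace(("{user_id}", "memories"), {"user_id": "u-123"})` yields `("u-123", "memories")`.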
|
||||
@@ -0,0 +1,65 @@
|
||||
## ADDED Requirements
|
||||
|
||||
### Requirement: Token budget management
|
||||
The system SHALL manage LLM context window limits by tracking token usage and triggering summarization when thresholds are exceeded.
|
||||
|
||||
#### Scenario: Token threshold exceeded
|
||||
- **WHEN** cumulative message tokens exceed `max_tokens` configuration
|
||||
- **THEN** the system SHALL identify messages to summarize starting from oldest
|
||||
- **THEN** the system SHALL replace summarized messages with a `RunningSummary` object
|
||||
- **THEN** the system SHALL ensure remaining messages + summary fit within `max_tokens` budget
|
||||
|
||||
#### Scenario: Partial token budget allocation
|
||||
- **WHEN** `max_summary_tokens` is configured (default 256)
|
||||
- **THEN** the system SHALL reserve `max_summary_tokens` tokens for the summary itself
|
||||
- **THEN** remaining messages SHALL be trimmed to fit within `max_tokens - max_summary_tokens`
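
The budget split can be sketched as a walk from the newest message backwards. To stay self-contained, the example treats messages as plain strings and counts tokens as `len(text) // 4`; the function name `split_for_summary` is an assumption.

```python
from typing import Callable, List, Tuple


def split_for_summary(
    messages: List[str],
    max_tokens: int,
    max_summary_tokens: int = 256,
    count_tokens: Callable[[str], int] = lambda text: len(text) // 4,
) -> Tuple[List[str], List[str]]:
    """Return (to_summarize, to_keep); oldest messages are summarized first."""
    total = sum(count_tokens(m) for m in messages)
    if total <= max_tokens:
        return [], list(messages)  # under budget: nothing to summarize
    budget = max_tokens - max_summary_tokens  # room left once the summary is reserved
    kept: List[str] = []
    used = 0
    for msg in reversed(messages):  # keep the newest messages that still fit
        cost = count_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()
    to_summarize = messages[: len(messages) - len(kept)]
    return list(to_summarize), kept
```

With three 100-token messages, `max_tokens=250`, and `max_summary_tokens=50`, the remaining budget is 200 tokens, so the two newest messages are kept and the oldest is handed to the summarizer.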
|
||||
|
||||
### Requirement: Incremental summarization
|
||||
The system SHALL support incremental summarization across multiple turns, tracking which messages have already been summarized to avoid redundant work.
|
||||
|
||||
#### Scenario: First summarization
|
||||
- **WHEN** no existing `RunningSummary` exists and token threshold is exceeded
|
||||
- **THEN** the system SHALL call the LLM with an initial summary prompt
|
||||
- **THEN** the system SHALL return a `RunningSummary` with `summary`, `summarized_message_ids` set, and `last_summarized_message_id`
|
||||
|
||||
#### Scenario: Subsequent summarization (append)
|
||||
- **WHEN** a `RunningSummary` exists and new messages exceed threshold
|
||||
- **THEN** the system SHALL call the LLM with the existing summary plus new messages
|
||||
- **THEN** the system SHALL extend `summarized_message_ids` with newly summarized message IDs
|
||||
- **THEN** the system SHALL update `last_summarized_message_id`
|
||||
|
||||
### Requirement: Context trimming with summarization hook
|
||||
The system SHALL provide a hook that fires before messages are discarded, allowing the daily tier to capture summarized content.
|
||||
|
||||
#### Scenario: Pre-trim flush
|
||||
- **WHEN** messages are about to be discarded (summarized)
|
||||
- **THEN** the system SHALL fire a `memory_flush_hook` with the messages being summarized
|
||||
- **THEN** the hook SHALL queue the messages for async memory extraction
|
||||
- **THEN** the main thread SHALL NOT block on memory extraction
|
||||
|
||||
### Requirement: Token counting with fallback
|
||||
The system SHALL provide accurate token counting using `tiktoken` when available, with a char-based fallback.
|
||||
|
||||
#### Scenario: tiktoken available
|
||||
- **WHEN** tiktoken package is installed
|
||||
- **THEN** the system SHALL use `tiktoken.get_encoding("cl100k_base")` for token counting
|
||||
- **THEN** token counts SHALL be exact for OpenAI models that use the `cl100k_base` encoding and approximate for other providers' models
|
||||
|
||||
#### Scenario: tiktoken unavailable
|
||||
- **WHEN** tiktoken is not installed
|
||||
- **THEN** the system SHALL fall back to character-based estimation: `len(text) // 4`
|
||||
- **THEN** the system SHALL log a warning about missing tiktoken
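A minimal sketch of the two scenarios above. The function names and logger name are assumptions. The broad `except` is deliberate: besides a missing package, `tiktoken.get_encoding` can also fail when the encoding file cannot be fetched on first use, and the fallback should cover both cases.

```python
import logging

logger = logging.getLogger("memory_engine.token_counter")


def estimate_tokens(text: str) -> int:
    """Character-based estimate: roughly four characters per token."""
    return len(text) // 4


try:
    import tiktoken

    _ENCODING = tiktoken.get_encoding("cl100k_base")
except Exception:  # ImportError, or the encoding could not be loaded
    _ENCODING = None
    logger.warning("tiktoken unavailable; falling back to len(text) // 4")


def count_tokens(text: str) -> int:
    """Count tokens with tiktoken when available, else estimate."""
    if _ENCODING is not None:
        # disallowed_special=() treats strings such as "<|endoftext|>"
        # as plain text instead of raising ValueError.
        return len(_ENCODING.encode(text, disallowed_special=()))
    return estimate_tokens(text)
```

The warning is logged once at import time rather than on every call, which keeps logs quiet in long-running agents.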
|
||||
|
||||
### Requirement: Summarization node for LangGraph
|
||||
The system SHALL provide a `SummarizationNode` Runnable that integrates into LangGraph state graphs.
|
||||
|
||||
#### Scenario: Graph integration
|
||||
- **WHEN** `SummarizationNode` is added to a LangGraph workflow
|
||||
- **THEN** it SHALL read messages from `input_messages_key` (default "messages")
|
||||
- **THEN** it SHALL write updated messages to `output_messages_key` (default "summarized_messages")
|
||||
- **THEN** it SHALL store `RunningSummary` in `context.running_summary`
|
||||
|
||||
#### Scenario: Identical input and output keys
|
||||
- **WHEN** `input_messages_key` equals `output_messages_key`
|
||||
- **THEN** the node SHALL emit a `RemoveMessage(REMOVE_ALL_MESSAGES)` to clear previous state
|
||||
- **THEN** the node SHALL write the new message list including the summary
|
||||
@@ -0,0 +1,64 @@
|
||||
## ADDED Requirements
|
||||
|
||||
### Requirement: Three-tier memory architecture
|
||||
The system SHALL maintain three tiers of memory: Context (short-term/ephemeral), Daily (medium-term/file-based), and Core (long-term/distilled).
|
||||
|
||||
#### Scenario: Context tier stores active session
|
||||
- **WHEN** an agent conversation is in progress
|
||||
- **THEN** the context tier SHALL track messages, token usage, and running summary
|
||||
- **WHEN** the session ends or context is trimmed
|
||||
- **THEN** the context SHALL be flushed to the daily tier
|
||||
|
||||
#### Scenario: Daily tier persists as Markdown files
|
||||
- **WHEN** context is flushed
|
||||
- **THEN** the daily tier SHALL append summarized records to `memory/YYYY-MM-DD.md`
|
||||
- **THEN** each session block SHALL have a timestamped header (e.g., `## Trimmed Context (14:30)`)
|
||||
- **THEN** daily files SHALL be created lazily (only when first write occurs)
|
||||
|
||||
#### Scenario: Core tier stores distilled long-term knowledge
|
||||
- **WHEN** Deep Dream consolidation runs
|
||||
- **THEN** the core tier SHALL be updated by rewriting `MEMORY.md`
|
||||
- **THEN** `MEMORY.md` SHALL be formatted as Markdown with `- ` bullet items, optionally grouped under `## headings`
|
||||
|
||||
### Requirement: Daily memory file management
|
||||
The system SHALL manage daily memory files with lazy creation on first write and append-only updates thereafter.
|
||||
|
||||
#### Scenario: Lazy file creation
|
||||
- **WHEN** the first memory write occurs for a given day
|
||||
- **THEN** a file SHALL be created at `memory/YYYY-MM-DD.md` with a header `# Daily Memory: YYYY-MM-DD`
|
||||
|
||||
#### Scenario: Append-only writes
|
||||
- **WHEN** subsequent memory writes occur on the same day
|
||||
- **THEN** new entries SHALL be appended to the existing daily file
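The lazy-creation and append-only scenarios could look roughly like this. The `DailyTier.append` signature, the `label` default, and the injectable `now` parameter are assumptions made for illustration and testability.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


class DailyTier:
    """Appends timestamped blocks to <root>/YYYY-MM-DD.md."""

    def __init__(self, root):
        self.root = Path(root)

    def append(self, text: str, label: str = "Trimmed Context",
               now: Optional[datetime] = None) -> Path:
        now = now or datetime.now()
        day = now.strftime("%Y-%m-%d")
        path = self.root / f"{day}.md"
        if not path.exists():
            # Lazy creation: directory and file appear only on first write.
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(f"# Daily Memory: {day}\n", encoding="utf-8")
        # Append-only: existing content is never rewritten.
        with path.open("a", encoding="utf-8") as fh:
            fh.write(f"\n## {label} ({now:%H:%M})\n\n{text.strip()}\n")
        return path
```

For example, `DailyTier("memory").append("User prefers short answers.")` would create today's file if needed and add one timestamped block to it.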
|
||||
|
||||
### Requirement: Deep Dream consolidation
|
||||
The system SHALL periodically consolidate daily memories into the core memory using LLM-based distillation.
|
||||
|
||||
#### Scenario: Deep Dream triggered
|
||||
- **WHEN** `deep_dream(lookback_days=N)` is called
|
||||
- **THEN** the system SHALL read current `MEMORY.md` and the last N daily files
|
||||
- **THEN** the LLM SHALL receive both the current memory and daily records
|
||||
- **THEN** the LLM SHALL return `[MEMORY]` and `[DREAM]` sections
|
||||
- **THEN** `MEMORY.md` SHALL be overwritten with the `[MEMORY]` content
|
||||
- **THEN** a dream diary SHALL be written to `memory/dreams/YYYY-MM-DD.md`
|
||||
|
||||
#### Scenario: Dedup prevents redundant runs
|
||||
- **WHEN** Deep Dream is called but daily content hash matches the last processed hash
|
||||
- **THEN** the operation SHALL be skipped
|
||||
|
||||
#### Scenario: No daily content skips gracefully
|
||||
- **WHEN** Deep Dream is called but no recent daily files have content
|
||||
- **THEN** the operation SHALL be skipped and existing `MEMORY.md` SHALL be preserved
|
||||
|
||||
#### Scenario: No-fabrication constraint
|
||||
- **WHEN** the LLM produces the `[MEMORY]` section
|
||||
- **THEN** it SHALL ONLY use information present in the source materials (current MEMORY.md + daily files)
|
||||
- **THEN** it SHALL NOT fabricate, infer, or add information not present in the source
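The skip conditions and the `[MEMORY]` / `[DREAM]` split can be sketched as below. The helper names, the SHA-256 choice, and the exact section grammar (a marker alone on its own line) are assumptions; the spec only names the two sections and a content-hash check.

```python
import hashlib
import re
from typing import List, Optional, Tuple


def content_hash(daily_texts: List[str]) -> str:
    """Stable hash over the daily files, with a separator between files."""
    h = hashlib.sha256()
    for text in daily_texts:
        h.update(text.encode("utf-8"))
        h.update(b"\x00")
    return h.hexdigest()


def should_run(daily_texts: List[str], last_hash: Optional[str]) -> bool:
    """Skip when there is no content or nothing changed since the last run."""
    if not any(text.strip() for text in daily_texts):
        return False
    return content_hash(daily_texts) != last_hash


# A marker on its own line, then everything up to the next marker or the end.
_SECTION = re.compile(
    r"^\[(MEMORY|DREAM)\][ \t]*\n(.*?)(?=^\[(?:MEMORY|DREAM)\]|\Z)",
    re.S | re.M,
)


def parse_dream_output(text: str) -> Tuple[str, str]:
    """Return (memory, dream); a missing section comes back as ''."""
    sections = {name: body.strip() for name, body in _SECTION.findall(text)}
    return sections.get("MEMORY", ""), sections.get("DREAM", "")
```

A caller would overwrite `MEMORY.md` only when the returned memory string is non-empty, so that a malformed LLM reply cannot wipe the core tier. The no-fabrication constraint itself lives in the prompt and is not enforced by this code.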
|
||||
|
||||
### Requirement: Context summary injection
|
||||
The system SHALL support injecting daily summary text into the active message list for context continuity.
|
||||
|
||||
#### Scenario: Context summary callback
|
||||
- **WHEN** a daily memory flush completes
|
||||
- **THEN** an optional callback SHALL be invoked with the daily summary text
|
||||
- **THEN** the caller MAY inject the summary into the message list for continued context awareness
|
||||
@@ -0,0 +1,86 @@
|
||||
## 1. Module Scaffold & Data Schemas
|
||||
|
||||
- [x] 1.1 Create `memory-engine/` directory tree with all subdirectories and `__init__.py` files
|
||||
- [x] 1.2 Create `config.py` with `MemoryConfig` pydantic model (embedding, chunking, search, tier settings)
|
||||
- [x] 1.3 Create `core/schemas.py` with `MemoryChunk`, `SearchResult`, `Fact`, `RunningSummary`, `ExtractedMemory` data classes
|
||||
- [x] 1.4 Create `utils/token_counter.py` with tiktoken + char-fallback token counting
|
||||
- [x] 1.5 Create `utils/namespace.py` with `NamespaceTemplate` for runtime namespace resolution
|
||||
- [x] 1.6 Create `utils/chunker.py` with `TextChunker` (line-based, overlapping, configurable max_tokens)
|
||||
|
||||
## 2. Core Store: SQLite + FTS5 + Vector
|
||||
|
||||
- [x] 2.1 Create `core/store.py` with `MemoryStore` — SQLite init with WAL mode, FTS5 tables, integrity checks
|
||||
- [x] 2.2 Implement `create_chunks_table()` with embedding BLOB storage, indexes, meta table
|
||||
- [x] 2.3 Implement `create_fts5_tables()` with standard unicode61 tokenizer + trigram tokenizer for CJK
|
||||
- [x] 2.4 Implement FTS5 triggers (AFTER INSERT/UPDATE/DELETE) for auto-sync
|
||||
- [x] 2.5 Implement `save_chunk()` / `save_chunks_batch()` with SQLite UPSERT (INSERT ... ON CONFLICT DO UPDATE)
|
||||
- [x] 2.6 Implement `delete_by_path()`, `get_file_hash()`, `update_file_metadata()`
|
||||
- [x] 2.7 Implement FTS5 self-healing: `_fts5_state_inconsistent()`, `_fts5_shadow_corrupt()`, `reset_fts5()`
|
||||
- [x] 2.8 Implement embedding encode/decode (float32 BLOB via numpy, struct fallback, legacy JSON fallback)
|
||||
- [x] 2.9 Implement `get_stats()` and `close()` methods
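As an illustration of items 2.1, 2.5 and 2.8, here is a reduced sketch of WAL initialization, an UPSERT write, and the `struct`-based float32 BLOB codec. The table layout and function signatures are assumptions, FTS5 tables and triggers are omitted, and the `ON CONFLICT` clause requires SQLite 3.24 or newer.

```python
import sqlite3
import struct
from typing import List, Optional, Sequence


def encode_embedding(vec: Sequence[float]) -> bytes:
    """Pack floats as little-endian float32 (the struct fallback path)."""
    return struct.pack(f"<{len(vec)}f", *vec)


def decode_embedding(blob: bytes) -> List[float]:
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # WAL applies to file-backed databases; in-memory ones ignore it.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS chunks (
               id TEXT PRIMARY KEY,
               path TEXT NOT NULL,
               text TEXT NOT NULL,
               embedding BLOB
           )"""
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_chunks_path ON chunks(path)")
    return conn


def save_chunk(conn: sqlite3.Connection, chunk_id: str, path: str, text: str,
               embedding: Optional[Sequence[float]] = None) -> None:
    blob = encode_embedding(embedding) if embedding is not None else None
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """INSERT INTO chunks (id, path, text, embedding)
               VALUES (?, ?, ?, ?)
               ON CONFLICT(id) DO UPDATE SET
                   path = excluded.path,
                   text = excluded.text,
                   embedding = excluded.embedding""",
            (chunk_id, path, text, blob),
        )
```

Unlike `INSERT OR REPLACE`, the UPSERT form updates the existing row in place instead of deleting and reinserting it, so the write fires `UPDATE` triggers rather than a `DELETE` followed by an `INSERT`.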
|
||||
|
||||
## 3. Hybrid Search
|
||||
|
||||
- [x] 3.1 Implement `search_vector()` — numpy matrix cosine similarity with argpartition top-K (pure-Python fallback)
|
||||
- [x] 3.2 Implement FTS5 keyword search with BM25 scoring: `_search_fts5()`, `_search_fts5_trigram()`
|
||||
- [x] 3.3 Implement `_search_like()` — CJK (1+ chars) + ASCII word (3+ chars) with dynamic scoring
|
||||
- [x] 3.4 Implement `search_keyword()` — three-tier strategy (FTS5 → trigram FTS5 → LIKE)
|
||||
- [x] 3.5 Implement BM25 rank to score conversion (`0.3 + 0.69 * abs(r)/(1+abs(r))`)
|
||||
- [x] 3.6 Create `core/hybrid_search.py` with weighted merge (vector_weight, keyword_weight) + temporal decay
|
||||
- [x] 3.7 Implement `_compute_temporal_decay(path, half_life=30)` — exponential decay for dated files
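The scoring pieces in items 3.5 to 3.7 can be sketched as follows. The rank-to-score formula is taken from item 3.5. The rest is assumed for illustration: that `half_life` is measured in days, that undated files such as `MEMORY.md` are not decayed, the default weights, and the function names.

```python
import re
from datetime import date
from typing import Dict, List, Tuple


def bm25_rank_to_score(rank: float) -> float:
    """Map an FTS5 BM25 rank onto [0.3, 0.99) per item 3.5."""
    r = abs(rank)
    return 0.3 + 0.69 * r / (1 + r)


_DATED = re.compile(r"(\d{4})-(\d{2})-(\d{2})\.md$")


def temporal_decay(path: str, today: date, half_life: float = 30) -> float:
    """Exponential decay by file age in days; undated paths return 1.0."""
    match = _DATED.search(path)
    if not match:
        return 1.0
    try:
        file_day = date(*map(int, match.groups()))
    except ValueError:  # looks like a date but is not a valid one
        return 1.0
    age_days = max((today - file_day).days, 0)
    return 0.5 ** (age_days / half_life)


def merge_results(vector_hits: Dict[str, float], keyword_hits: Dict[str, float],
                  today: date, vector_weight: float = 0.7,
                  keyword_weight: float = 0.3) -> List[Tuple[str, float]]:
    """Weighted sum of both scores per path, scaled by temporal decay."""
    merged = {}
    for path in set(vector_hits) | set(keyword_hits):
        base = (vector_weight * vector_hits.get(path, 0.0)
                + keyword_weight * keyword_hits.get(path, 0.0))
        merged[path] = base * temporal_decay(path, today)
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)
```

With a 30-day half-life, a daily file from 30 days ago contributes half of its undecayed score and one from 60 days ago a quarter.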
|
||||
|
||||
## 4. LLM Memory Extraction
|
||||
|
||||
- [x] 4.1 Create `extraction/prompts.py` with memory update system prompt (structured JSON output)
|
||||
- [x] 4.2 Create `extraction/manager.py` with `MemoryUpdater` — LLM fact extraction from conversation
|
||||
- [x] 4.3 Implement `_prepare_update_prompt()` — loads current memory, formats conversation, builds prompt
|
||||
- [x] 4.4 Implement `_parse_memory_update_response()` — JSON extraction from LLM response (handles fences/thinking)
|
||||
- [x] 4.5 Implement `_apply_updates()` — update user/history sections, add/remove facts, enforce max_facts
|
||||
- [x] 4.6 Implement `create_fact()`, `update_fact()`, `delete_memory_fact()` CRUD operations
|
||||
- [x] 4.7 Implement content deduplication (casefold comparison) and confidence threshold filtering
|
||||
- [x] 4.8 Implement upload-mention scrubbing from memory data
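Items 4.4 and 4.7 lend themselves to a short sketch. The details here are assumptions: that "thinking" arrives in `<think>...</think>` tags, that facts are dicts with `content` and `confidence` keys, and the `0.7` threshold. The fence marker is built with string multiplication only so that this example can sit inside a fenced block.

```python
import json
import re
from typing import Any, Dict, List, Optional

FENCE = "`" * 3  # a literal Markdown code-fence marker
_THINK = re.compile(r"<think>.*?</think>", re.S)
_FENCED = re.compile(
    re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE), re.S
)


def parse_memory_update_response(raw: str) -> Optional[Dict[str, Any]]:
    """Extract the first JSON object from an LLM reply, or return None."""
    text = _THINK.sub("", raw)      # drop reasoning blocks first
    fenced = _FENCED.search(text)
    if fenced:
        text = fenced.group(1)      # prefer the fenced payload
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None


def dedupe_facts(facts: List[Dict[str, Any]],
                 min_confidence: float = 0.7) -> List[Dict[str, Any]]:
    """Drop low-confidence facts and case-insensitive duplicates."""
    seen, kept = set(), []
    for fact in facts:
        key = fact["content"].strip().casefold()
        if fact.get("confidence", 0.0) < min_confidence or key in seen:
            continue
        seen.add(key)
        kept.append(fact)
    return kept
```

`str.casefold()` is a more aggressive normalization than `lower()`, which matters for non-ASCII text such as the German "ß".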
|
||||
|
||||
## 5. Tiered Consolidation
|
||||
|
||||
- [x] 5.1 Create `tiers/daily.py` with `DailyTier` — lazy file creation, append-only writes with timestamped headers
|
||||
- [x] 5.2 Create `tiers/context.py` with `ContextTier` — short-term context window management with RunningSummary
|
||||
- [x] 5.3 Create `tiers/core.py` with `CoreTier` — wraps MemoryStore, manages MEMORY.md file
|
||||
- [x] 5.4 Create `tiers/__init__.py` with `flush_messages()` — context summarization + daily file append
|
||||
- [x] 5.5 Implement incremental summarization (initial summary, extend existing, RunningSummary tracking)
|
||||
- [x] 5.6 Create `background/deep_dream.py` with `DeepDream` — LLM-based MEMORY.md consolidation
|
||||
- [x] 5.7 Implement Deep Dream dedup (content-hash check), dream diary writing, empty-output guard
|
||||
|
||||
## 6. Background Processing Queue
|
||||
|
||||
- [x] 6.1 Create `background/queue.py` with `MemoryUpdateQueue` — thread-safe, debounced, keyed by (thread, user, agent)
|
||||
- [x] 6.2 Implement `add()` with debounce timer reset, `add_nowait()` for immediate processing
|
||||
- [x] 6.3 Implement timer-triggered processing with rate limiting between updates
|
||||
- [x] 6.4 Implement signal detection: `detect_correction()`, `detect_reinforcement()` with pattern matching
|
||||
- [x] 6.5 Create `background/__init__.py` with `flush_messages()` — dedup + background thread LLM summarization
|
||||
- [x] 6.6 Support `context_summary_callback` for in-context injection of summaries
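The debounce behaviour in items 6.1 and 6.2 could be approximated with one `threading.Timer` per key. This is a sketch under assumptions: `DebouncedQueue` and the `process(key, batch)` callback are hypothetical, and the rate limiting from item 6.3 is not implemented.

```python
import threading


class DebouncedQueue:
    """Batches messages per key and processes them after a quiet period."""

    def __init__(self, process, delay=2.0):
        self._process = process   # hypothetical callable: (key, batch) -> None
        self._delay = delay       # quiet period in seconds
        self._lock = threading.Lock()
        self._pending = {}        # key -> accumulated messages
        self._timers = {}         # key -> most recent timer

    def _enqueue(self, key, messages):
        # Caller must hold the lock.
        self._pending.setdefault(key, []).extend(messages)
        old = self._timers.pop(key, None)
        if old is not None:
            old.cancel()          # cancelling a finished timer is harmless

    def add(self, key, messages):
        """Queue messages and restart the debounce timer for this key."""
        with self._lock:
            self._enqueue(key, messages)
            timer = threading.Timer(self._delay, self._fire, args=(key,))
            timer.daemon = True
            self._timers[key] = timer
            timer.start()

    def add_nowait(self, key, messages):
        """Queue messages and process this key's batch immediately."""
        with self._lock:
            self._enqueue(key, messages)
        self._fire(key)

    def _fire(self, key):
        with self._lock:
            batch = self._pending.pop(key, None)
        if batch:
            self._process(key, batch)  # run outside the lock
```

A key is any hashable value, so the `(thread, user, agent)` tuple from item 6.1 works directly: two `add()` calls for the same key within the delay produce a single `process` call with both batches.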
|
||||
|
||||
## 7. Agent Tools & Public API
|
||||
|
||||
- [x] 7.1 Create `tools/manage.py` with `manage_memory()` — create/update/delete facts with namespace isolation
|
||||
- [x] 7.2 Create `tools/search.py` with `search_memory()` — hybrid search with query/filter/limit/offset
|
||||
- [x] 7.3 Implement `__init__.py` with `MemoryEngine` unified class: `manage()`, `search()`, `flush()`, `dream()`, `format_for_injection()`
|
||||
- [x] 7.4 Implement `format_for_injection()` — token-budgeted memory string for system prompts
|
||||
- [x] 7.5 Thread-safe singleton pattern for `MemoryUpdateQueue` and `MemoryStore`
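Item 7.4 describes a token-budgeted string. One possible shape is shown below. The `## Memory` header, the ordering by confidence, the fact dict layout, and the default estimator are assumptions rather than the module's actual format.

```python
from typing import Any, Callable, Dict, List


def format_for_injection(
    facts: List[Dict[str, Any]],
    max_tokens: int = 200,
    count_tokens: Callable[[str], int] = lambda s: len(s) // 4,
) -> str:
    """Render facts as Markdown bullets without exceeding max_tokens."""
    header = "## Memory"
    lines, used = [header], count_tokens(header)
    # Highest-confidence facts first, so they survive a tight budget.
    for fact in sorted(facts, key=lambda f: f.get("confidence", 0.0),
                       reverse=True):
        line = f"- {fact['content']}"
        cost = count_tokens(line) + 1  # +1 approximates the newline
        if used + cost > max_tokens:
            break
        lines.append(line)
        used += cost
    # Return an empty string rather than a bare header.
    return "\n".join(lines) if len(lines) > 1 else ""
```

Passing the real token counter as `count_tokens` keeps the budget consistent with whatever the context tier uses.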
|
||||
|
||||
## 8. Embedding Provider Interface
|
||||
|
||||
- [x] 8.1 Create `embedding/base.py` with `EmbeddingProvider` ABC — `embed_query()`, `embed_batch()`
|
||||
- [x] 8.2 Create `embedding/openai.py` with `OpenAIEmbeddingProvider` implementation
|
||||
- [x] 8.3 Implement `EmbeddingCache` — per-session cache keyed by (provider, model, text_hash)
|
||||
- [x] 8.4 Create `embedding/__init__.py` with `create_embedding_provider()` factory
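The provider interface and cache from items 8.1 and 8.3 might be laid out like this. Only the `embed_query()` / `embed_batch()` method names and the `(provider, model, text_hash)` key come from the task list. The `name` and `model` attributes, the SHA-256 hash, and `ToyProvider` are assumptions added so the example runs without network access.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class EmbeddingProvider(ABC):
    name = "base"      # provider identifier used in cache keys
    model = "unknown"  # model identifier used in cache keys

    @abstractmethod
    def embed_query(self, text: str) -> List[float]: ...

    @abstractmethod
    def embed_batch(self, texts: List[str]) -> List[List[float]]: ...


class EmbeddingCache:
    """In-memory cache keyed by (provider, model, text_hash)."""

    def __init__(self, provider: EmbeddingProvider):
        self._provider = provider
        self._cache: Dict[Tuple[str, str, str], List[float]] = {}

    def _key(self, text: str) -> Tuple[str, str, str]:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return (self._provider.name, self._provider.model, digest)

    def embed_query(self, text: str) -> List[float]:
        key = self._key(text)
        if key not in self._cache:
            self._cache[key] = self._provider.embed_query(text)
        return self._cache[key]


class ToyProvider(EmbeddingProvider):
    """Stand-in provider for demonstration; counts upstream calls."""

    name, model = "toy", "length-v0"

    def __init__(self):
        self.calls = 0

    def embed_query(self, text: str) -> List[float]:
        self.calls += 1
        return [float(len(text))]

    def embed_batch(self, texts: List[str]) -> List[List[float]]:
        return [self.embed_query(t) for t in texts]
```

Including the provider and model in the key prevents vectors from one embedding model being served for another after a configuration change.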
|
||||
|
||||
## 9. Integration Tests
|
||||
|
||||
- [x] 9.1 Test short-term context summarization with token budget enforcement
|
||||
- [x] 9.2 Test long-term fact extraction with LLM mock
|
||||
- [x] 9.3 Test hybrid search: vector-only, keyword-only, and combined
|
||||
- [x] 9.4 Test tiered consolidation: flush → daily file → Deep Dream → MEMORY.md rewrite
|
||||
- [x] 9.5 Test background queue: debounce, dedup, async execution
|
||||
- [x] 9.6 Test namespace isolation: scoped searches across tenants
|
||||
- [x] 9.7 Test graceful degradation: no embeddings → keyword-only, no numpy → Python fallback
|
||||
- [x] 9.8 Test memory tools: create/update/delete/search round-trip
|
||||