boocode/openspec/changes/archived/2026-06-07-memory-context-engineering/proposal.md

## Why

Current AI agents lack structured, durable memory beyond the immediate context window: conversations are stateless, user preferences are forgotten between sessions, and nothing learned in one session carries over to the next. Three open-source projects (LangMem, DeerFlow, CowAgent) demonstrate production patterns for agent memory, but none of them offers a single portable engine that combines short-term context management, long-term semantic memory, tiered consolidation, and hybrid retrieval. This change builds that engine by extracting and adapting the best patterns from all three.

## What Changes

- **New `memory-engine/` module** in the codebase providing a unified memory & context API
- **Short-term context summarization** — token-budget-aware conversation windowing (LangMem pattern)
- **Long-term semantic memory** — LLM-extracted facts stored with optional vector embeddings (LangMem/DeerFlow hybrid)
- **Tiered memory architecture** — Context tier (ephemeral session) → Daily tier (summarized records) → Core tier (distilled long-term) (CowAgent pattern)
- **Hybrid search** — Keyword (FTS5) + Vector (cosine similarity on embeddings) with weighted merge (CowAgent pattern)
- **Background consolidation** — Debounced, async memory extraction pipeline (DeerFlow queue + LangMem ReflectionExecutor)
- **Deep Dream distillation** — Periodic overnight LLM consolidation of daily records into core memory (CowAgent pattern)
- **Memory tools for agents** — `manage_memory` and `search_memory` tool interfaces (LangMem pattern)
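
As a rough illustration of the token-budget-aware windowing listed above, the sketch below keeps the newest messages that fit a budget and returns the older overflow for summarization. The names `window_messages` and `approx_tokens` are hypothetical, and the whitespace-based counter is only a stand-in for a real tokenizer such as `tiktoken`; the actual module may be structured differently.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content)


def approx_tokens(text: str) -> int:
    # Assumption: crude whitespace count standing in for a real tokenizer.
    return len(text.split())


def window_messages(
    messages: List[Message],
    budget: int,
    count_tokens: Callable[[str], int] = approx_tokens,
) -> Tuple[List[Message], List[Message]]:
    """Split history into (overflow, kept).

    `kept` is the longest suffix of `messages` whose total token count
    fits within `budget`; `overflow` is everything older, which a caller
    could hand to a summarizer.
    """
    kept: List[Message] = []
    used = 0
    # Walk from newest to oldest so recent turns are preserved first.
    for role, content in reversed(messages):
        cost = count_tokens(content)
        if used + cost > budget:
            break
        kept.append((role, content))
        used += cost
    kept.reverse()  # restore chronological order
    overflow = messages[: len(messages) - len(kept)]
    return overflow, kept
```

A caller would typically replace `overflow` with a single summary message and prepend it to `kept` before the next LLM call.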

## Capabilities

### New Capabilities
- `short-term-context`: Token-budget window management, conversation summarization, and context trimming for LLM interactions
- `long-term-memory`: Persistent fact extraction, storage, and retrieval with Pydantic-typed schemas
- `tiered-consolidation`: Three-tier memory pipeline (context→daily→core) with promotion rules and Deep Dream distillation
- `hybrid-search`: Combined keyword (FTS5) + vector (embedding cosine similarity) search with weighted scoring and temporal decay
- `memory-tools`: `manage_memory` (CRUD) and `search_memory` (semantic query) tools for agent integration
- `background-processing`: Debounced async memory update queue with thread-pool execution
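
The weighted scoring and temporal decay named under `hybrid-search` could look roughly like the following. This is a minimal sketch, not the module's implementation: the function names, the 0.4/0.6 weights, and the 30-day half-life are illustrative assumptions, and keyword scores are assumed to be already normalized to the range 0 to 1 (raw FTS5 ranks would need converting first).

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def temporal_decay(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: a memory loses half its weight every half-life."""
    return 0.5 ** (age_days / half_life_days)


def hybrid_scores(
    keyword_scores: Dict[str, float],  # assumed normalized to [0, 1]
    vector_scores: Dict[str, float],   # cosine similarities per memory id
    ages_days: Dict[str, float],
    keyword_weight: float = 0.4,       # hypothetical default
    vector_weight: float = 0.6,        # hypothetical default
    half_life_days: float = 30.0,      # hypothetical default
) -> Dict[str, float]:
    """Merge both result sets; an id missing from one side scores 0 there."""
    merged: Dict[str, float] = {}
    for memory_id in set(keyword_scores) | set(vector_scores):
        base = (
            keyword_weight * keyword_scores.get(memory_id, 0.0)
            + vector_weight * vector_scores.get(memory_id, 0.0)
        )
        age = ages_days.get(memory_id, 0.0)
        merged[memory_id] = base * temporal_decay(age, half_life_days)
    return merged
```

Sorting the returned dictionary by value in descending order gives the final ranking; entries found by only one retriever still appear, just with a lower base score.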

### Modified Capabilities
<!-- No existing specs to modify — this is a greenfield module -->

## Impact

- New `memory-engine/` directory tree (no existing code modified)
- Dependencies: `sqlite3` (stdlib; keyword search requires an underlying SQLite build with FTS5 enabled), `numpy` (optional, for vector search), `pydantic` (schemas), `tiktoken` (token counting)
- LLM provider integration via abstract `ChatModel` interface (not coupled to any provider)
- Embedding provider integration via abstract `EmbeddingProvider` interface (supports OpenAI, local models)
- Agent integration via simple tool interface (not coupled to any agent framework)
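
To make the provider decoupling above concrete, here is one possible shape for the two abstract interfaces, sketched with `typing.Protocol`. Only the names `ChatModel` and `EmbeddingProvider` come from this proposal; the method names `complete` and `embed`, their signatures, and the two fake providers are assumptions for illustration.

```python
from typing import List, Protocol, Sequence


class ChatModel(Protocol):
    # Hypothetical method name and signature; the proposal does not fix one.
    def complete(self, prompt: str) -> str: ...


class EmbeddingProvider(Protocol):
    # Hypothetical method name and signature as well.
    def embed(self, texts: Sequence[str]) -> List[List[float]]: ...


class EchoChatModel:
    """Fake provider for tests: returns a truncated echo of the prompt."""

    def complete(self, prompt: str) -> str:
        return f"summary: {prompt[:20]}"


class CharCountEmbedder:
    """Fake provider for tests: two-dimensional 'embedding' from text stats."""

    def embed(self, texts: Sequence[str]) -> List[List[float]]:
        return [[float(len(t)), float(t.count(" "))] for t in texts]


def summarize(model: ChatModel, text: str) -> str:
    # Engine code depends only on the protocol, never on a concrete provider.
    return model.complete(text)
```

Because protocols are structural, a real OpenAI-backed or local-model class would satisfy these interfaces without inheriting from them, which keeps the engine free of provider imports.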