chore(openspec): drop 9 superseded proposals + 11 stub archive files

Drop 9 batch proposals that are superseded by the boocode-lift-analysis (boocontext-audit, conductor upgrades, self-healing/verify-gate skills): add-3tier-memory, import-llm-evaluator, import-pregel-engine, plugin-platform, conductor-evolution, code-intelligence-upgrade, dev-workflow, ui-overhaul, agent-reliability. Delete 11 stub archive files (49-66B each, 'Status: Shipped. Archived.' only) that provide zero documentation value over the existing CHANGELOG.md + git tags.
2026-06-07 22:15:38 +00:00
parent 0d6e9a2413
commit c935687725
119 changed files with 4897 additions and 45 deletions
--- a/openspec/changes/archived/2026-06-07-memory-context-engineering/specs/background-processing/spec.md
+++ b/openspec/changes/archived/2026-06-07-memory-context-engineering/specs/background-processing/spec.md
@@ -0,0 +1,58 @@
+## ADDED Requirements
+
+### Requirement: Debounced memory update queue
+The system SHALL collect memory update requests into a queue and process them after a configurable debounce period.
+
+#### Scenario: Items enqueued per (thread, user, agent) key
+- **WHEN** a conversation context is added to the queue
+- **THEN** it SHALL be keyed by `(thread_id, user_id, agent_name)` for deduplication
+- **WHEN** a second context arrives for the same key before processing
+- **THEN** the previous context SHALL be replaced with the newer one
+
+#### Scenario: Debounce timer resets on each enqueue
+- **WHEN** a new item is enqueued
+- **THEN** the debounce timer SHALL reset to the configured `debounce_seconds`
+- **WHEN** no new items arrive within the debounce window
+- **THEN** the queue SHALL be processed
+
+#### Scenario: Immediate processing option
+- **WHEN** `add_nowait()` is called instead of `add()`
+- **THEN** the queue SHALL start processing immediately in a background thread
+
+### Requirement: Background thread execution for memory updates
+The system SHALL execute memory updates (LLM extraction + persistence) in a background thread to avoid blocking the agent loop.
+
+#### Scenario: Async flush via threading.Thread
+- **WHEN** conversation messages are flushed to memory
+- **THEN** the flush SHALL run in a `threading.Thread` (daemon=True)
+- **THEN** the main agent SHALL NOT wait for the flush to complete
+
+#### Scenario: Thread pool for sync LLM calls
+- **WHEN** a memory update requires a synchronous LLM call
+- **THEN** the call SHALL be offloaded to a `ThreadPoolExecutor` (max_workers=4)
+- **THEN** this SHALL prevent blocking the main event loop
+
+### Requirement: Content deduplication for flush
+The system SHALL deduplicate message content before flushing to avoid redundant summarization.
+
+#### Scenario: MD5 content hash dedup
+- **WHEN** messages are about to be flushed
+- **THEN** each message content SHALL be MD5-hashed
+- **WHEN** a hash matches a previously flushed message
+- **THEN** that message SHALL be skipped
+
+#### Scenario: Scheduler pair stripping
+- **WHEN** messages contain scheduler-injected pairs (marked with `[SCHEDULED]` prefix)
+- **THEN** the scheduler user message and its paired assistant response SHALL be stripped before flushing
+
+### Requirement: Configuration-driven memory processing
+The system SHALL support configuration to enable/disable background memory processing.
+
+#### Scenario: Memory processing disabled
+- **WHEN** `memory_config.enabled` is `False`
+- **THEN** no memory updates SHALL be queued or processed
+- **THEN** queue `add()` calls SHALL be no-ops
+
+#### Scenario: Rate limiting between updates
+- **WHEN** processing multiple queued memory updates
+- **THEN** a 0.5 second delay SHALL be inserted between updates to avoid LLM API rate limits
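
A minimal sketch of the debounced queue specified above, assuming a `threading.Timer` drives the debounce window. The class name `DebouncedMemoryQueue`, the `process` callback, the `update_delay` parameter, and the 30-second default are hypothetical; the spec only fixes the key shape, the replace-on-same-key rule, the timer reset, the no-op behavior when disabled, and the 0.5 second pause between updates.

```python
import threading
import time
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (thread_id, user_id, agent_name)


class DebouncedMemoryQueue:
    """Hypothetical illustration; not the project's actual class."""

    def __init__(
        self,
        process: Callable[[Key, object], None],
        debounce_seconds: float = 30.0,  # assumed default, not from the spec
        update_delay: float = 0.5,       # pause between updates (rate limiting)
        enabled: bool = True,
    ) -> None:
        self._process = process
        self._debounce_seconds = debounce_seconds
        self._update_delay = update_delay
        self._enabled = enabled
        self._pending: Dict[Key, object] = {}
        self._timer: Optional[threading.Timer] = None
        self._lock = threading.Lock()

    def add(self, key: Key, context: object) -> None:
        if not self._enabled:
            return  # memory processing disabled: add() is a no-op
        with self._lock:
            self._pending[key] = context  # a newer context replaces the older one
            if self._timer is not None:
                self._timer.cancel()      # reset the debounce window
            self._timer = threading.Timer(self._debounce_seconds, self.flush)
            self._timer.daemon = True
            self._timer.start()

    def flush(self) -> None:
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
            items = list(self._pending.items())
            self._pending.clear()
        for index, (key, context) in enumerate(items):
            if index > 0:
                time.sleep(self._update_delay)  # spacing to respect API rate limits
            self._process(key, context)
```

Calling `flush()` directly corresponds roughly to the immediate-processing path; a real `add_nowait()` would presumably run it in a background thread rather than inline.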
--- a/openspec/changes/archived/2026-06-07-memory-context-engineering/specs/hybrid-search/spec.md
+++ b/openspec/changes/archived/2026-06-07-memory-context-engineering/specs/hybrid-search/spec.md
@@ -0,0 +1,73 @@
+## ADDED Requirements
+
+### Requirement: Hybrid search with vector + keyword fusion
+The system SHALL combine vector similarity search and keyword search into unified ranked results.
+
+#### Scenario: Vector search runs when embedding provider available
+- **WHEN** an embedding provider is configured
+- **THEN** the system SHALL compute a query embedding and perform cosine similarity search
+- **WHEN** no embedding provider is configured
+- **THEN** the system SHALL gracefully degrade to keyword-only search
+
+#### Scenario: Keyword search always runs
+- **WHEN** a search query is submitted
+- **THEN** the system SHALL always perform keyword search regardless of embedding provider availability
+
+#### Scenario: Weighted score merging
+- **WHEN** both vector and keyword results are available
+- **THEN** the final score SHALL be: `vector_weight * vector_score + keyword_weight * keyword_score`
+- **THEN** default weights SHALL be `vector_weight=0.7`, `keyword_weight=0.3`
+- **THEN** weights SHALL be configurable
+
+### Requirement: Vector search via numpy cosine similarity
+The system SHALL perform vector search using numpy-vectorized cosine similarity for performance.
+
+#### Scenario: Vectorized cosine similarity
+- **WHEN** numpy is available
+- **THEN** all chunk embeddings SHALL be loaded into a numpy matrix `(N, D)`
+- **THEN** cosine similarity SHALL be computed as `matrix @ query_vector` over L2-normalized rows and query (BLAS matrix-vector multiply)
+- **THEN** top-K results SHALL be selected via `argpartition` (O(N) average)
+
+#### Scenario: Pure-Python fallback
+- **WHEN** numpy is unavailable
+- **THEN** cosine similarity SHALL be computed per-row with pure Python
+- **THEN** results SHALL be sorted and the top K returned
+
+### Requirement: Three-tier keyword search (FTS5 → trigram → LIKE)
+The system SHALL provide a cascading keyword search strategy for multi-language support.
+
+#### Scenario: Standard FTS5 for ASCII queries
+- **WHEN** the query contains only ASCII characters
+- **THEN** the system SHALL use SQLite FTS5 with the unicode61 tokenizer
+- **THEN** BM25 ranking SHALL be converted to a `[0, 1)` score
+
+#### Scenario: Trigram FTS5 for CJK queries
+- **WHEN** the query contains CJK (Chinese, Japanese, Korean) characters
+- **THEN** the system SHALL use SQLite FTS5 with the trigram tokenizer
+- **THEN** CJK character sequences and ASCII words SHALL be extracted and joined with AND
+
+#### Scenario: LIKE fallback for edge cases
+- **WHEN** FTS5 is unavailable or returns empty results
+- **THEN** the system SHALL fall back to LIKE-based search
+- **THEN** CJK runs (1+ chars) and ASCII words (3+ chars) SHALL be matched independently
+
+### Requirement: Temporal decay for dated memory files
+The system SHALL apply exponential decay to search scores for dated memory files.
+
+#### Scenario: Decay applied to dated files
+- **WHEN** a memory chunk path matches `YYYY-MM-DD.md`
+- **THEN** the combined score SHALL be multiplied by `exp(-ln(2)/half_life * age_days)`
+- **THEN** the default `half_life` SHALL be 30 days
+- **WHEN** the path does not contain a date (e.g., `MEMORY.md`)
+- **THEN** no decay SHALL be applied (multiplier = 1.0)
+
+### Requirement: Result filtering and limits
+The system SHALL filter search results by minimum score and maximum count.
+
+#### Scenario: Min score threshold
+- **WHEN** search results are merged
+- **THEN** results with score below `min_score` (default 0.1) SHALL be discarded
+
+#### Scenario: Max results limit
+- **WHEN** search results exceed `max_results`
+- **THEN** only the top `max_results` by combined score SHALL be returned
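
The scoring rules above (weighted merge, temporal decay, minimum score, result cap) can be sketched in a few lines of pure Python. The function names `decay_multiplier` and `merge_scores`, the dict-of-scores inputs, and the `max_results=10` default are assumptions made for illustration; the weights, half-life, and `min_score` defaults come from the spec.

```python
import math
import re
from datetime import date
from typing import Dict, List, Tuple

# Matches dated memory files such as memory/2026-05-08.md
_DATED = re.compile(r"(\d{4})-(\d{2})-(\d{2})\.md$")


def decay_multiplier(path: str, today: date, half_life: float = 30.0) -> float:
    """exp(-ln(2)/half_life * age_days) for dated files, 1.0 otherwise."""
    match = _DATED.search(path)
    if match is None:
        return 1.0
    try:
        file_date = date(*(int(part) for part in match.groups()))
    except ValueError:
        return 1.0  # not a real calendar date: treat as undated
    age_days = max((today - file_date).days, 0)
    return math.exp(-math.log(2) / half_life * age_days)


def merge_scores(
    vector: Dict[str, float],
    keyword: Dict[str, float],
    today: date,
    vector_weight: float = 0.7,
    keyword_weight: float = 0.3,
    min_score: float = 0.1,
    max_results: int = 10,  # assumed default, not stated in the spec
) -> List[Tuple[str, float]]:
    merged = []
    for path in set(vector) | set(keyword):
        score = (vector_weight * vector.get(path, 0.0)
                 + keyword_weight * keyword.get(path, 0.0))
        score *= decay_multiplier(path, today)
        if score >= min_score:
            merged.append((path, score))
    merged.sort(key=lambda item: item[1], reverse=True)
    return merged[:max_results]
```

With a 30-day half-life, a chunk from a file dated 30 days ago keeps half of its combined score, while `MEMORY.md` is never decayed.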
--- a/openspec/changes/archived/2026-06-07-memory-context-engineering/specs/long-term-memory/spec.md
+++ b/openspec/changes/archived/2026-06-07-memory-context-engineering/specs/long-term-memory/spec.md
@@ -0,0 +1,83 @@
+## ADDED Requirements
+
+### Requirement: Fact extraction from conversation
+The system SHALL extract structured facts from conversations using an LLM, with confidence scoring and category classification.
+
+#### Scenario: Extract facts from conversation turn
+- **WHEN** a conversation turn (user message + assistant reply) is processed
+- **THEN** the system SHALL call the configured LLM with the conversation text
+- **THEN** the LLM response SHALL be parsed as structured JSON with facts
+- **THEN** each fact SHALL contain: `content`, `category`, `confidence` (0.0-1.0)
+
+#### Scenario: Fact categories
+- **WHEN** a fact is extracted
+- **THEN** its `category` SHALL be one of: `preference`, `knowledge`, `context`, `behavior`, `goal`, `correction`
+- **THEN** the system SHALL validate the category against the allowed set
+
+#### Scenario: Confidence thresholds
+- **WHEN** a fact's confidence is below the configurable threshold (default 0.5)
+- **THEN** the fact SHALL NOT be persisted
+- **THEN** the system SHALL log that a low-confidence fact was skipped
+
+### Requirement: Fact CRUD operations
+The system SHALL support creating, reading, updating, and deleting memory facts.
+
+#### Scenario: Create fact
+- **WHEN** a new fact is created
+- **THEN** it SHALL be assigned a unique ID (`fact_{uuid_hex[:8]}`)
+- **THEN** it SHALL be timestamped with ISO-8601 UTC
+- **THEN** it SHALL be persisted to the core store
+
+#### Scenario: Delete fact by ID
+- **WHEN** a fact deletion is requested with a valid ID
+- **THEN** the fact SHALL be removed from the store
+- **THEN** the updated store SHALL be persisted
+
+#### Scenario: Delete non-existent fact
+- **WHEN** a fact deletion is requested with an unknown ID
+- **THEN** the system SHALL raise `KeyError`
+
+#### Scenario: Update fact
+- **WHEN** a fact update is requested with a valid ID
+- **THEN** the system SHALL update only the provided fields (`content`, `category`, `confidence`)
+- **THEN** the fact's `createdAt` SHALL NOT be modified
+- **THEN** the updated store SHALL be persisted
+
+### Requirement: Content deduplication
+The system SHALL prevent duplicate facts by casefolded content comparison.
+
+#### Scenario: Exact duplicate detected
+- **WHEN** a new fact's content (casefolded) matches an existing fact
+- **THEN** the new fact SHALL be skipped
+- **THEN** the existing fact SHALL remain unchanged
+- **THEN** the system SHALL log that a duplicate was skipped
+
+#### Scenario: Near-duplicate with different casing
+- **WHEN** a new fact's content differs only in letter casing
+- **THEN** it SHALL be treated as a duplicate
+- **THEN** the new fact SHALL be skipped
+
+### Requirement: Max facts limit
+The system SHALL enforce a configurable maximum number of stored facts (default 500).
+
+#### Scenario: Fact count exceeds limit
+- **WHEN** adding a new fact would exceed `max_facts`
+- **THEN** the system SHALL sort existing facts by confidence (descending)
+- **THEN** the lowest-confidence fact SHALL be removed
+- **THEN** the new fact SHALL be added
+
+### Requirement: Memory formatting for context injection
+The system SHALL format memory data into a compact string for injection into LLM system prompts, respecting a token budget.
+
+#### Scenario: Format with all sections
+- **WHEN** memory data contains user context, history, and facts
+- **THEN** the output SHALL include: "User Context:" with work/personal/topOfMind
+- **THEN** the output SHALL include: "History:" with recent/earlier/background
+- **THEN** the output SHALL include: "Facts:" sorted by confidence descending
+- **THEN** each fact SHALL be formatted as: `- [{category} | {confidence:.2f}] {content}`
+
+#### Scenario: Token budget enforcement
+- **WHEN** the formatted output exceeds `max_tokens` (default 2000)
+- **THEN** the system SHALL trim facts from lowest confidence up
+- **THEN** if still over budget, the output SHALL be truncated at the character level
+- **THEN** `"\n..."` SHALL be appended to indicate truncation
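
A minimal in-memory sketch of the fact rules above: category validation, the confidence threshold, casefolded deduplication, `fact_{uuid_hex[:8]}` IDs, and lowest-confidence eviction at `max_facts`. The `FactStore` class and its method names are hypothetical, and persistence and logging are deliberately left out.

```python
import uuid
from datetime import datetime, timezone
from typing import Dict, List, Optional

ALLOWED_CATEGORIES = {
    "preference", "knowledge", "context", "behavior", "goal", "correction",
}


class FactStore:
    """Hypothetical in-memory store; the real one also persists to disk."""

    def __init__(self, max_facts: int = 500, min_confidence: float = 0.5) -> None:
        self.max_facts = max_facts
        self.min_confidence = min_confidence
        self.facts: List[Dict[str, object]] = []

    def add(self, content: str, category: str, confidence: float) -> Optional[dict]:
        if category not in ALLOWED_CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        if confidence < self.min_confidence:
            return None  # low-confidence facts are not persisted
        folded = content.casefold()
        if any(str(f["content"]).casefold() == folded for f in self.facts):
            return None  # duplicate (ignoring case): keep the existing fact
        fact = {
            "id": f"fact_{uuid.uuid4().hex[:8]}",
            "content": content,
            "category": category,
            "confidence": confidence,
            "createdAt": datetime.now(timezone.utc).isoformat(),
        }
        if len(self.facts) >= self.max_facts:
            # Sort by confidence (descending) and drop the weakest fact.
            self.facts.sort(key=lambda f: f["confidence"], reverse=True)
            self.facts.pop()
        self.facts.append(fact)
        return fact

    def delete(self, fact_id: str) -> None:
        for index, fact in enumerate(self.facts):
            if fact["id"] == fact_id:
                del self.facts[index]
                return
        raise KeyError(fact_id)
```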
--- a/openspec/changes/archived/2026-06-07-memory-context-engineering/specs/memory-tools/spec.md
+++ b/openspec/changes/archived/2026-06-07-memory-context-engineering/specs/memory-tools/spec.md
@@ -0,0 +1,64 @@
+## ADDED Requirements
+
+### Requirement: manage_memory tool
+The system SHALL provide a callable tool for creating, updating, and deleting persistent facts.
+
+#### Scenario: Create a new fact
+- **WHEN** `manage_memory(content="...", action="create")` is called
+- **THEN** a new fact SHALL be created with the provided content
+- **THEN** a unique ID SHALL be auto-generated
+- **THEN** the return value SHALL be `"created memory <id>"`
+
+#### Scenario: Update an existing fact
+- **WHEN** `manage_memory(content="...", action="update", id="<existing-id>")` is called
+- **THEN** the fact SHALL be updated with the new content
+- **THEN** the return value SHALL be `"updated memory <id>"`
+- **WHEN** no `id` is provided for an update action
+- **THEN** a ValueError SHALL be raised
+
+#### Scenario: Delete a fact
+- **WHEN** `manage_memory(action="delete", id="<existing-id>")` is called
+- **THEN** the fact SHALL be deleted
+- **THEN** the return value SHALL be `"Deleted memory <id>"`
+- **WHEN** no `id` is provided for a delete action
+- **THEN** a ValueError SHALL be raised
+
+#### Scenario: Configurable permitted actions
+- **WHEN** creating the tool with `actions_permitted=("create", "update")`
+- **THEN** the delete action SHALL NOT be available
+- **THEN** attempting a delete SHALL raise a ValueError
+
+#### Scenario: Custom instructions
+- **WHEN** creating the tool with custom `instructions`
+- **THEN** those instructions SHALL be included in the tool description to guide LLM usage
+
+### Requirement: search_memory tool
+The system SHALL provide a callable tool for searching stored facts by semantic query.
+
+#### Scenario: Text query search
+- **WHEN** `search_memory(query="preference for dark mode", limit=10)` is called
+- **THEN** the system SHALL perform hybrid search (vector + keyword)
+- **THEN** results SHALL be returned as a serialized JSON list of fact objects
+
+#### Scenario: Filtered search
+- **WHEN** `search_memory(query="...", filter={"category": "preference"})` is called
+- **THEN** results SHALL be filtered to match the specified criteria
+
+#### Scenario: Configurable response format
+- **WHEN** `response_format="content_and_artifact"` is configured
+- **THEN** the tool SHALL return both serialized memories and raw memory objects
+
+### Requirement: Namespace isolation for multi-tenant
+The system SHALL support namespace-based isolation of memory data across users, agents, or organizations.
+
+#### Scenario: Runtime namespace resolution
+- **WHEN** a memory tool is called with a configuration containing `{"user_id": "u-123"}`
+- **THEN** the namespace SHALL be resolved to `("user", "u-123")` at runtime
+- **WHEN** calling with `{"org_id": "acme", "agent_id": "alpha"}`
+- **THEN** the namespace SHALL be `("org", "acme", "alpha")`
+
+#### Scenario: Namespace templating
+- **WHEN** creating memory tools with `namespace=("{user_id}", "memories")`
+- **THEN** the `{user_id}` placeholder SHALL be replaced at runtime from configuration
+- **WHEN** a required config key is missing
+- **THEN** a ConfigurationError SHALL be raised
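
The namespace templating scenario above could be implemented along these lines. This is a sketch under assumptions: `resolve_namespace` is a hypothetical helper, and `ConfigurationError` is defined locally here because the spec names the exception without saying where it lives.

```python
import re
from typing import Mapping, Tuple


class ConfigurationError(Exception):
    """Local stand-in for the exception the spec names."""


# A template part is a placeholder only if it is exactly "{name}".
_PLACEHOLDER = re.compile(r"^\{(\w+)\}$")


def resolve_namespace(
    template: Tuple[str, ...], config: Mapping[str, object]
) -> Tuple[str, ...]:
    resolved = []
    for part in template:
        match = _PLACEHOLDER.match(part)
        if match is None:
            resolved.append(part)  # literal segment, kept as-is
            continue
        key = match.group(1)
        if key not in config:
            raise ConfigurationError(f"missing required config key: {key}")
        resolved.append(str(config[key]))
    return tuple(resolved)
```

For example, `resolve_namespace(("{user_id}", "memories"), {"user_id": "u-123"})` yields `("u-123", "memories")`, so two users never share a namespace.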
--- a/openspec/changes/archived/2026-06-07-memory-context-engineering/specs/short-term-context/spec.md
+++ b/openspec/changes/archived/2026-06-07-memory-context-engineering/specs/short-term-context/spec.md
@@ -0,0 +1,65 @@
+## ADDED Requirements
+
+### Requirement: Token budget management
+The system SHALL manage LLM context window limits by tracking token usage and triggering summarization when thresholds are exceeded.
+
+#### Scenario: Token threshold exceeded
+- **WHEN** cumulative message tokens exceed `max_tokens` configuration
+- **THEN** the system SHALL identify messages to summarize starting from oldest
+- **THEN** the system SHALL replace summarized messages with a `RunningSummary` object
+- **THEN** the system SHALL ensure remaining messages + summary fit within `max_tokens` budget
+
+#### Scenario: Partial token budget allocation
+- **WHEN** `max_summary_tokens` is configured (default 256)
+- **THEN** the system SHALL reserve `max_summary_tokens` tokens for the summary itself
+- **THEN** remaining messages SHALL be trimmed to fit within `max_tokens - max_summary_tokens`
+
+### Requirement: Incremental summarization
+The system SHALL support incremental summarization across multiple turns, tracking which messages have already been summarized to avoid redundant work.
+
+#### Scenario: First summarization
+- **WHEN** no existing `RunningSummary` exists and token threshold is exceeded
+- **THEN** the system SHALL call the LLM with an initial summary prompt
+- **THEN** the system SHALL return a `RunningSummary` with `summary`, `summarized_message_ids` set, and `last_summarized_message_id`
+
+#### Scenario: Subsequent summarization (append)
+- **WHEN** a `RunningSummary` exists and new messages exceed threshold
+- **THEN** the system SHALL call the LLM with the existing summary plus new messages
+- **THEN** the system SHALL extend `summarized_message_ids` with newly summarized message IDs
+- **THEN** the system SHALL update `last_summarized_message_id`
+
+### Requirement: Context trimming with summarization hook
+The system SHALL provide a hook that fires before messages are discarded, allowing the daily tier to capture summarized content.
+
+#### Scenario: Pre-trim flush
+- **WHEN** messages are about to be discarded (summarized)
+- **THEN** the system SHALL fire a `memory_flush_hook` with the messages being summarized
+- **THEN** the hook SHALL queue the messages for async memory extraction
+- **THEN** the main thread SHALL NOT block on memory extraction
+
+### Requirement: Token counting with fallback
+The system SHALL provide accurate token counting using `tiktoken` when available, with a char-based fallback.
+
+#### Scenario: tiktoken available
+- **WHEN** tiktoken package is installed
+- **THEN** the system SHALL use `tiktoken.get_encoding("cl100k_base")` for token counting
+- **THEN** token counts SHALL match OpenAI `cl100k_base` tokenization (counts for other providers' models, e.g. Anthropic, are approximate)
+
+#### Scenario: tiktoken unavailable
+- **WHEN** tiktoken is not installed
+- **THEN** the system SHALL fall back to character-based estimation: `len(text) // 4`
+- **THEN** the system SHALL log a warning about missing tiktoken
+
+### Requirement: Summarization node for LangGraph
+The system SHALL provide a `SummarizationNode` Runnable that integrates into LangGraph state graphs.
+
+#### Scenario: Graph integration
+- **WHEN** `SummarizationNode` is added to a LangGraph workflow
+- **THEN** it SHALL read messages from `input_messages_key` (default "messages")
+- **THEN** it SHALL write updated messages to `output_messages_key` (default "summarized_messages")
+- **THEN** it SHALL store `RunningSummary` in `context.running_summary`
+
+#### Scenario: Equality of input/output keys
+- **WHEN** `input_messages_key` equals `output_messages_key`
+- **THEN** the node SHALL emit a `RemoveMessage(REMOVE_ALL_MESSAGES)` to clear previous state
+- **THEN** the node SHALL write the new message list including the summary
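
A sketch of the token counting fallback and the budget split described above, assuming messages are plain strings. `estimate_tokens`, `count_tokens`, and `split_for_summary` are hypothetical names; the `cl100k_base` encoding, the `len(text) // 4` fallback, and the 256-token summary reserve come from the spec.

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger(__name__)
_encoding = None
_tiktoken_checked = False


def estimate_tokens(text: str) -> int:
    """Character-based fallback: roughly four characters per token."""
    return len(text) // 4


def count_tokens(text: str) -> int:
    """Use tiktoken when it can be loaded, otherwise the fallback."""
    global _encoding, _tiktoken_checked
    if not _tiktoken_checked:
        _tiktoken_checked = True
        try:
            import tiktoken
            _encoding = tiktoken.get_encoding("cl100k_base")
        except Exception:  # not installed, or the encoding could not be loaded
            logger.warning("tiktoken unavailable; using len(text) // 4")
    if _encoding is not None:
        return len(_encoding.encode(text))
    return estimate_tokens(text)


def split_for_summary(
    messages: List[str],
    max_tokens: int,
    max_summary_tokens: int = 256,
    count: Callable[[str], int] = count_tokens,
) -> Tuple[List[str], List[str]]:
    """Return (to_summarize, to_keep), summarizing from the oldest message."""
    if sum(count(m) for m in messages) <= max_tokens:
        return [], list(messages)  # under budget: nothing to summarize
    budget = max_tokens - max_summary_tokens  # reserve room for the summary
    kept: List[str] = []
    used = 0
    for message in reversed(messages):  # keep the newest messages first
        cost = count(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    return messages[: len(messages) - len(kept)], kept
```

Passing the counter as a parameter keeps the split logic independent of whether tiktoken is present.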
--- a/openspec/changes/archived/2026-06-07-memory-context-engineering/specs/tiered-consolidation/spec.md
+++ b/openspec/changes/archived/2026-06-07-memory-context-engineering/specs/tiered-consolidation/spec.md
@@ -0,0 +1,64 @@
+## ADDED Requirements
+
+### Requirement: Three-tier memory architecture
+The system SHALL maintain three tiers of memory: Context (short-term/ephemeral), Daily (medium-term/file-based), and Core (long-term/distilled).
+
+#### Scenario: Context tier stores active session
+- **WHEN** an agent conversation is in progress
+- **THEN** the context tier SHALL track messages, token usage, and running summary
+- **WHEN** the session ends or context is trimmed
+- **THEN** the context SHALL be flushed to the daily tier
+
+#### Scenario: Daily tier persists as Markdown files
+- **WHEN** context is flushed
+- **THEN** the daily tier SHALL append summarized records to `memory/YYYY-MM-DD.md`
+- **THEN** each session block SHALL have a timestamped header (e.g., `## Trimmed Context (14:30)`)
+- **THEN** daily files SHALL be created lazily (only when first write occurs)
+
+#### Scenario: Core tier stores distilled long-term knowledge
+- **WHEN** Deep Dream consolidation runs
+- **THEN** the core tier SHALL be updated by rewriting `MEMORY.md`
+- **THEN** `MEMORY.md` SHALL be formatted as Markdown with `- ` bullet items, optionally grouped under `## headings`
+
+### Requirement: Daily memory file management
+The system SHALL manage daily memory files with automatic creation and lazy initialization.
+
+#### Scenario: Lazy file creation
+- **WHEN** the first memory write occurs for a given day
+- **THEN** a file SHALL be created at `memory/YYYY-MM-DD.md` with a header `# Daily Memory: YYYY-MM-DD`
+
+#### Scenario: Append-only writes
+- **WHEN** subsequent memory writes occur on the same day
+- **THEN** new entries SHALL be appended to the existing daily file
+
+### Requirement: Deep Dream consolidation
+The system SHALL periodically consolidate daily memories into the core memory using LLM-based distillation.
+
+#### Scenario: Deep Dream triggered
+- **WHEN** `deep_dream(lookback_days=N)` is called
+- **THEN** the system SHALL read current `MEMORY.md` and the last N daily files
+- **THEN** the LLM SHALL receive both the current memory and daily records
+- **THEN** the LLM SHALL return `[MEMORY]` and `[DREAM]` sections
+- **THEN** `MEMORY.md` SHALL be overwritten with the `[MEMORY]` content
+- **THEN** a dream diary SHALL be written to `memory/dreams/YYYY-MM-DD.md`
+
+#### Scenario: Dedup prevents redundant runs
+- **WHEN** Deep Dream is called but daily content hash matches the last processed hash
+- **THEN** the operation SHALL be skipped
+
+#### Scenario: No daily content skips gracefully
+- **WHEN** Deep Dream is called but no recent daily files have content
+- **THEN** the operation SHALL be skipped and existing `MEMORY.md` SHALL be preserved
+
+#### Scenario: No-fabrication constraint
+- **WHEN** the LLM produces the `[MEMORY]` section
+- **THEN** it SHALL ONLY use information present in the source materials (current MEMORY.md + daily files)
+- **THEN** it SHALL NOT fabricate, infer, or add information not present in the source
+
+### Requirement: Context summary injection
+The system SHALL support injecting daily summary text into the active message list for context continuity.
+
+#### Scenario: Context summary callback
+- **WHEN** a daily memory flush completes
+- **THEN** an optional callback SHALL be invoked with the daily summary text
+- **THEN** the caller MAY inject the summary into the message list for continued context awareness
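
The daily tier's lazy creation and append-only behavior could look roughly like this. `append_daily` and its `root` argument are hypothetical; the file path, the `# Daily Memory:` header, and the `## Trimmed Context (HH:MM)` block header follow the spec.

```python
from datetime import datetime
from pathlib import Path


def append_daily(root: Path, text: str, now: datetime) -> Path:
    """Append one session block to memory/YYYY-MM-DD.md under root."""
    day = now.strftime("%Y-%m-%d")
    path = root / "memory" / f"{day}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        # Lazy creation: the header is written only on the first write of the day.
        path.write_text(f"# Daily Memory: {day}\n", encoding="utf-8")
    with path.open("a", encoding="utf-8") as handle:
        # Append-only: earlier entries are never rewritten.
        handle.write(f"\n## Trimmed Context ({now:%H:%M})\n{text}\n")
    return path
```

Because nothing is written until the first flush, days without activity leave no empty files behind for Deep Dream to read.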