## ADDED Requirements

### Requirement: Hybrid search with vector + keyword fusion
The system SHALL combine vector similarity search and keyword search into a single ranked result list.

#### Scenario: Vector search runs when embedding provider available
- **WHEN** an embedding provider is configured
- **THEN** the system SHALL compute a query embedding and perform cosine similarity search
- **WHEN** no embedding provider is configured
- **THEN** the system SHALL gracefully degrade to keyword-only search

#### Scenario: Keyword search always runs
- **WHEN** a search query is submitted
- **THEN** the system SHALL always perform keyword search, regardless of whether an embedding provider is available

#### Scenario: Weighted score merging
- **WHEN** both vector and keyword results are available
- **THEN** the final score SHALL be `vector_weight * vector_score + keyword_weight * keyword_score`
- **THEN** the default weights SHALL be `vector_weight=0.7` and `keyword_weight=0.3`
- **THEN** the weights SHALL be configurable

### Requirement: Vector search via numpy cosine similarity
The system SHALL perform vector search using numpy-vectorized cosine similarity for performance.

#### Scenario: Vectorized cosine similarity
- **WHEN** numpy is available
- **THEN** all chunk embeddings SHALL be loaded into a numpy matrix of shape `(N, D)`
- **THEN** cosine similarity SHALL be computed as `matrix @ query_vector` (a single BLAS matrix-vector multiply), with the matrix rows and the query vector L2-normalized so that the dot product equals cosine similarity
- **THEN** the top-K results SHALL be selected via `argpartition` (O(N) average) and then sorted by score

#### Scenario: Pure-Python fallback
- **WHEN** numpy is unavailable
- **THEN** cosine similarity SHALL be computed row by row in pure Python
- **THEN** the results SHALL be sorted by score and the top K returned

### Requirement: Three-tier keyword search (FTS5 → trigram → LIKE)
The system SHALL provide a cascading keyword search strategy for multi-language support.
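The cascade in this requirement can be sketched with Python's standard-library `sqlite3` module. This is a minimal illustration under stated assumptions, not the system's implementation: the schema (a `chunks` table mirrored into two hypothetical FTS5 tables, `chunks_fts` and `chunks_fts_tri`), all helper names, the Unicode ranges treated as CJK, the `r / (1 + r)` mapping of BM25 into `[0, 1)`, and the LIKE score (fraction of query terms matched) are assumptions. Queries that are neither ASCII-only nor CJK go straight to LIKE here, a case the scenarios do not cover. Because any failing FTS5 call falls through to LIKE, the sketch also runs on SQLite builds without FTS5 or without the trigram tokenizer (which needs SQLite 3.34 or later).

```python
import re
import sqlite3

# Assumed CJK ranges: kana, CJK Ext. A, CJK Unified Ideographs, Hangul syllables.
CJK_RUN = re.compile(r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]+")
ASCII_WORD = re.compile(r"[A-Za-z0-9]+")


def build_index(conn, texts):
    """Hypothetical schema: chunks(id, text) mirrored into two FTS5 tables."""
    conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT)")
    conn.executemany("INSERT INTO chunks (id, text) VALUES (?, ?)",
                     list(enumerate(texts, start=1)))
    for table, tokenizer in (("chunks_fts", "unicode61"), ("chunks_fts_tri", "trigram")):
        try:
            conn.execute(f"CREATE VIRTUAL TABLE {table} USING fts5(text, tokenize='{tokenizer}')")
            conn.execute(f"INSERT INTO {table} (rowid, text) SELECT id, text FROM chunks")
        except sqlite3.OperationalError:
            pass  # this build lacks FTS5 or the tokenizer; LIKE still works


def bm25_to_unit(rank):
    """SQLite's bm25() is <= 0 (more negative = better); map it into [0, 1)."""
    r = max(0.0, -rank)
    return r / (1.0 + r)


def fts_search(conn, table, match, limit):
    # FTS5's rank column defaults to bm25(); ascending order puts the best first.
    rows = conn.execute(
        f"SELECT rowid, rank FROM {table} WHERE {table} MATCH ? ORDER BY rank LIMIT ?",
        (match, limit),
    ).fetchall()
    return {rowid: bm25_to_unit(rank) for rowid, rank in rows}


def like_search(conn, query, limit):
    # Each CJK run (1+ chars) and ASCII word (3+ chars) is matched on its own.
    terms = CJK_RUN.findall(query) + [w for w in ASCII_WORD.findall(query) if len(w) >= 3]
    terms = list(dict.fromkeys(terms))  # drop duplicate terms, keep order
    scores = {}
    for term in terms:
        for (rowid,) in conn.execute("SELECT id FROM chunks WHERE text LIKE ?", (f"%{term}%",)):
            scores[rowid] = scores.get(rowid, 0.0) + 1.0 / len(terms)
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:limit])


def keyword_search(conn, query, limit=10):
    """Tier 1: FTS5 + unicode61; tier 2: FTS5 + trigram; tier 3: LIKE."""
    if query.isascii():
        table, terms, joiner = "chunks_fts", ASCII_WORD.findall(query), " "  # implicit AND
    elif CJK_RUN.search(query):
        table, joiner = "chunks_fts_tri", " AND "
        terms = CJK_RUN.findall(query) + ASCII_WORD.findall(query)
    else:
        table, terms, joiner = None, [], ""
    if table and terms:
        # Quote each term so FTS5 treats it as a literal string, not query syntax.
        match = joiner.join(f'"{t}"' for t in terms)
        try:
            hits = fts_search(conn, table, match, limit)
            if hits:
                return hits
        except sqlite3.OperationalError:
            pass  # FTS5 table missing or unusable: fall through to LIKE
    return like_search(conn, query, limit)
```

One consequence visible in the sketch: the trigram tokenizer cannot match terms shorter than three characters, so a two-character CJK query such as `記憶` gets nothing from FTS5 and is answered by the LIKE tier, which is consistent with that tier accepting CJK runs of one or more characters.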
#### Scenario: Standard FTS5 for ASCII queries
- **WHEN** the query contains only ASCII characters
- **THEN** the system SHALL use SQLite FTS5 with the `unicode61` tokenizer
- **THEN** the BM25 rank SHALL be converted to a score in `[0, 1)`

#### Scenario: Trigram FTS5 for CJK queries
- **WHEN** the query contains CJK (Chinese, Japanese, Korean) characters
- **THEN** the system SHALL use SQLite FTS5 with the `trigram` tokenizer
- **THEN** CJK character runs and ASCII words SHALL be extracted from the query and joined with `AND`

#### Scenario: LIKE fallback for edge cases
- **WHEN** FTS5 is unavailable or returns no results
- **THEN** the system SHALL fall back to `LIKE`-based search
- **THEN** CJK runs (1+ characters) and ASCII words (3+ characters) SHALL be matched independently

### Requirement: Temporal decay for dated memory files
The system SHALL apply exponential decay to the search scores of dated memory files.

#### Scenario: Decay applied to dated files
- **WHEN** a memory chunk's file name matches `YYYY-MM-DD.md`
- **THEN** the combined score SHALL be multiplied by `exp(-ln(2) / half_life * age_days)`, where `age_days` is the number of days elapsed since the date in the file name
- **THEN** the default `half_life` SHALL be 30 days
- **WHEN** the path does not contain a date (e.g., `MEMORY.md`)
- **THEN** no decay SHALL be applied (multiplier = 1.0)

### Requirement: Result filtering and limits
The system SHALL filter search results by a minimum score and cap them at a maximum count.

#### Scenario: Min score threshold
- **WHEN** search results are merged
- **THEN** results with a score below `min_score` (default 0.1) SHALL be discarded

#### Scenario: Max results limit
- **WHEN** the number of search results exceeds `max_results`
- **THEN** only the top `max_results` results by combined score SHALL be returned
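The scoring rules in this document (numpy top-K with a pure-Python fallback, weighted fusion, temporal decay, `min_score`, and `max_results`) can be composed into one post-processing step. The Python sketch below shows one way to do it; it is illustrative, not the system's code, and every function name in it is hypothetical. Assumptions not fixed by the requirements: chunks are identified by integer ids with a `paths` mapping, a chunk missing from one result set contributes 0 for that component, keyword-only mode (no embedding provider, signalled by `vector_hits=None`) uses the keyword score unweighted, decay is applied before the `min_score` cut, a dated file is one whose name is exactly `YYYY-MM-DD.md`, and future dates get no boost. Embeddings are normalized at query time here for self-containment; a real index would more likely store them pre-normalized.

```python
import math
import re
from datetime import date

try:
    import numpy as np
except ImportError:
    np = None  # selects the pure-Python fallback by default

# Hypothetical rule: a dated file is one whose name is exactly YYYY-MM-DD.md.
DATED_FILE = re.compile(r"(?:^|[/\\])(\d{4})-(\d{2})-(\d{2})\.md$")


def vector_top_k(embeddings, query, k, use_numpy=np is not None):
    """Return [(row_index, cosine_similarity)] for the k rows most similar to query."""
    if k <= 0 or len(embeddings) == 0:
        return []
    if use_numpy:
        matrix = np.asarray(embeddings, dtype=np.float64)  # shape (N, D)
        q = np.asarray(query, dtype=np.float64)
        # L2-normalize so the dot product below equals cosine similarity.
        matrix = matrix / np.maximum(np.linalg.norm(matrix, axis=1, keepdims=True), 1e-12)
        q = q / max(float(np.linalg.norm(q)), 1e-12)
        sims = matrix @ q  # one BLAS matrix-vector multiply
        k = min(k, sims.shape[0])
        top = np.argpartition(-sims, k - 1)[:k]  # unordered top k, O(N) average
        top = top[np.argsort(-sims[top])]  # sort only the k winners
        return [(int(i), float(sims[i])) for i in top]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    scored = [(i, cosine(row, query)) for i, row in enumerate(embeddings)]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:k]


def decay_multiplier(path, today, half_life_days=30.0):
    """exp(-ln(2) / half_life * age_days) for dated files, 1.0 otherwise."""
    m = DATED_FILE.search(path)
    if m is None:
        return 1.0  # e.g. MEMORY.md is never decayed
    try:
        file_date = date(*(int(g) for g in m.groups()))
    except ValueError:
        return 1.0  # matches the pattern but is not a real calendar date
    age_days = max(0, (today - file_date).days)  # future dates are not boosted
    return math.exp(-math.log(2) / half_life_days * age_days)


def hybrid_rank(vector_hits, keyword_hits, paths, today, vector_weight=0.7,
                keyword_weight=0.3, half_life_days=30.0, min_score=0.1, max_results=10):
    """vector_hits / keyword_hits map chunk id -> score; vector_hits is None without a provider."""
    if vector_hits is None:
        vector_hits, vector_weight, keyword_weight = {}, 0.0, 1.0  # keyword-only mode
    results = []
    for cid in set(vector_hits) | set(keyword_hits):
        score = (vector_weight * vector_hits.get(cid, 0.0)
                 + keyword_weight * keyword_hits.get(cid, 0.0))
        score *= decay_multiplier(paths[cid], today, half_life_days)
        if score >= min_score:  # min_score is checked after decay in this sketch
            results.append((cid, score))
    results.sort(key=lambda t: t[1], reverse=True)
    return results[:max_results]
```

With the default 30-day `half_life`, a 30-day-old dated chunk keeps half its fused score, so a perfect vector-only match in such a file (0.7 × 0.5 = 0.35) still clears the default `min_score` of 0.1. The unweighted keyword-only mode is a deliberate choice in this sketch: applying `keyword_weight` there would cap every keyword-only score at 0.3.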