ADDED Requirements
Requirement: Hybrid search with vector + keyword fusion
The system SHALL combine vector similarity search and keyword search into unified ranked results.
Scenario: Vector search runs when embedding provider available
- WHEN an embedding provider is configured
- THEN the system SHALL compute a query embedding and perform cosine similarity search
- WHEN no embedding provider is configured
- THEN the system SHALL gracefully degrade to keyword-only search
Scenario: Keyword search always runs
- WHEN a search query is submitted
- THEN the system SHALL always perform keyword search regardless of embedding provider availability
Scenario: Weighted score merging
- WHEN both vector and keyword results are available
- THEN the final score SHALL be: vector_weight * vector_score + keyword_weight * keyword_score
- THEN default weights SHALL be vector_weight=0.7, keyword_weight=0.3
- THEN weights SHALL be configurable
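The weighted merge above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name `merge_scores`, the dict-of-scores input shape, and the choice to treat a chunk missing from one side as scoring 0 on that side are all assumptions.

```python
from typing import Dict


def merge_scores(
    vector_hits: Dict[str, float],
    keyword_hits: Dict[str, float],
    vector_weight: float = 0.7,
    keyword_weight: float = 0.3,
) -> Dict[str, float]:
    """Combine per-chunk scores as vector_weight * v + keyword_weight * k."""
    merged: Dict[str, float] = {}
    for chunk_id in vector_hits.keys() | keyword_hits.keys():
        # Assumption: a chunk found by only one search scores 0.0 on the other.
        v = vector_hits.get(chunk_id, 0.0)
        k = keyword_hits.get(chunk_id, 0.0)
        merged[chunk_id] = vector_weight * v + keyword_weight * k
    return merged
```

Passing different `vector_weight` and `keyword_weight` values is one way to satisfy the configurability scenario.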
Requirement: Vector search via numpy cosine similarity
The system SHALL perform vector search using numpy-vectorized cosine similarity for performance.
Scenario: Vectorized cosine similarity
- WHEN numpy is available
- THEN all chunk embeddings SHALL be loaded into a numpy matrix (N, D)
- THEN cosine similarity SHALL be computed as matrix @ query_vector (BLAS matrix-vector multiply)
- THEN top-K results SHALL be selected via argpartition (O(N) average)
Scenario: Pure-Python fallback
- WHEN numpy is unavailable
- THEN cosine similarity SHALL be computed per-row with pure Python
- THEN results SHALL be sorted and the top K returned
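A rough sketch of both scenarios in one function, assuming embeddings arrive as a list of equal-length float lists. The name `top_k_cosine` is hypothetical. The sketch normalises rows and the query before the matrix-vector product, because `matrix @ query_vector` only equals cosine similarity for unit-length vectors; whether the real system stores pre-normalised embeddings is not stated here.

```python
import math
from typing import List, Sequence, Tuple

try:
    import numpy as np
except ImportError:  # numpy is optional; the pure-Python path is used instead
    np = None


def top_k_cosine(
    query: Sequence[float], embeddings: List[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Return up to k (row_index, cosine_similarity) pairs, best first."""
    if len(embeddings) == 0 or k <= 0:
        return []
    if np is not None:
        matrix = np.asarray(embeddings, dtype=np.float64)  # shape (N, D)
        q = np.asarray(query, dtype=np.float64)
        row_norms = np.linalg.norm(matrix, axis=1)
        row_norms[row_norms == 0.0] = 1.0  # avoid division by zero
        q_norm = float(np.linalg.norm(q)) or 1.0
        sims = (matrix / row_norms[:, None]) @ (q / q_norm)
        n = sims.shape[0]
        k = min(k, n)
        if k < n:
            # argpartition puts the k largest similarities first, unordered.
            idx = np.argpartition(-sims, k - 1)[:k]
        else:
            idx = np.arange(n)
        idx = idx[np.argsort(-sims[idx])]  # order only the k survivors
        return [(int(i), float(sims[i])) for i in idx]
    # Pure-Python fallback: score every row, sort, and keep the top k.
    q_norm = math.sqrt(sum(x * x for x in query)) or 1.0
    scored = []
    for i, row in enumerate(embeddings):
        r_norm = math.sqrt(sum(x * x for x in row)) or 1.0
        dot = sum(a * b for a, b in zip(row, query))
        scored.append((i, dot / (r_norm * q_norm)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Sorting only the k selected rows keeps the numpy path close to O(N) on average, while the fallback pays O(N log N) for the full sort.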
Requirement: Three-tier keyword search (FTS5 → trigram → LIKE)
The system SHALL provide a cascading keyword search strategy for multi-language support.
Scenario: Standard FTS5 for ASCII queries
- WHEN the query contains only ASCII characters
- THEN the system SHALL use SQLite FTS5 with the unicode61 tokenizer
- THEN BM25 ranking SHALL be converted to a [0, 1) score
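The requirement does not state how the conversion is done. One possible mapping, shown purely as an assumption, relies on the fact that the FTS5 `bm25()` function returns lower (typically negative) values for better matches: negate the value and squash it with x / (1 + x). The name `bm25_to_score` is hypothetical.

```python
def bm25_to_score(rank: float) -> float:
    """Map an FTS5 bm25() value to [0, 1); higher means a better match."""
    # FTS5 bm25() is lower-is-better and usually negative, so negate it.
    # Assumption: non-negative raw values are treated as "no relevance".
    relevance = max(0.0, -rank)
    return relevance / (1.0 + relevance)
```

The result never reaches 1.0, which matches the half-open interval in the scenario, and it preserves the ordering that BM25 produced.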
Scenario: Trigram FTS5 for CJK queries
- WHEN the query contains CJK (Chinese, Japanese, Korean) characters
- THEN the system SHALL use SQLite FTS5 with the trigram tokenizer
- THEN CJK character sequences and ASCII words SHALL be extracted and joined with AND
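A sketch of how the MATCH expression for the trigram index might be built. The Unicode ranges chosen for "CJK", the double-quoting of each token, and the name `build_trigram_query` are assumptions made for illustration.

```python
import re

# Assumed CJK ranges: kana, CJK Extension A, CJK Unified Ideographs, Hangul.
_CJK = "\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af"
_TOKEN_RE = re.compile(f"[{_CJK}]+|[A-Za-z0-9]+")


def build_trigram_query(query: str) -> str:
    """Extract CJK runs and ASCII words and join them with AND."""
    tokens = _TOKEN_RE.findall(query)
    # Double quotes make each token an FTS5 string literal.
    return " AND ".join('"' + token + '"' for token in tokens)
```

The FTS5 trigram tokenizer matches substrings of three or more characters, so a shorter token such as a two-character CJK word would be expected to return nothing here and rely on the LIKE fallback.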
Scenario: LIKE fallback for edge cases
- WHEN FTS5 is unavailable or returns empty results
- THEN the system SHALL fall back to LIKE-based search
- THEN CJK runs (1+ chars) and ASCII words (3+ chars) SHALL be matched independently
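The fallback could look roughly like this. The table and column names (`chunks`, `id`, `text`), the scoring rule (fraction of terms matched), and the function names are all hypothetical; the requirement only fixes the term-extraction thresholds.

```python
import re
import sqlite3
from typing import List, Tuple

# Assumed CJK ranges: kana, CJK Extension A, CJK Unified Ideographs, Hangul.
_CJK = "\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af"
# CJK runs of 1+ characters, ASCII words of 3+ characters.
_TERM_RE = re.compile(f"[{_CJK}]+|[A-Za-z0-9]{{3,}}")


def like_terms(query: str) -> List[str]:
    """Return unique search terms in order of first appearance."""
    return list(dict.fromkeys(_TERM_RE.findall(query)))


def like_search(
    conn: sqlite3.Connection, query: str, limit: int = 10
) -> List[Tuple[int, float]]:
    """Match each term independently with LIKE; score = fraction of terms hit."""
    terms = like_terms(query)
    if not terms:
        return []
    hits = {}
    for term in terms:
        # Terms contain only letters, digits, and CJK, so no LIKE escaping is needed.
        rows = conn.execute(
            "SELECT id FROM chunks WHERE text LIKE ?", (f"%{term}%",)
        )
        for (chunk_id,) in rows:
            hits[chunk_id] = hits.get(chunk_id, 0) + 1
    ranked = sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(chunk_id, count / len(terms)) for chunk_id, count in ranked[:limit]]
```

Running one LIKE per term keeps the terms independent, so a chunk that contains only some of them still surfaces with a lower score.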
Requirement: Temporal decay for dated memory files
The system SHALL apply exponential decay to search scores for dated memory files.
Scenario: Decay applied to dated files
- WHEN a memory chunk path matches YYYY-MM-DD.md
- THEN the combined score SHALL be multiplied by exp(-ln(2)/half_life * age_days)
- THEN the default half_life SHALL be 30 days
- WHEN the path does not contain a date (e.g., MEMORY.md)
- THEN no decay SHALL be applied (multiplier = 1.0)
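The decay multiplier can be sketched as below. The function name `decay_multiplier`, the regular expression used to detect dated files, and the handling of future or invalid dates are assumptions that the requirement does not specify.

```python
import math
import re
from datetime import date

# Matches a file name ending in YYYY-MM-DD.md.
_DATED_FILE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})\.md$")


def decay_multiplier(path: str, today: date, half_life_days: float = 30.0) -> float:
    """Return exp(-ln(2)/half_life * age_days) for dated files, else 1.0."""
    match = _DATED_FILE_RE.search(path)
    if match is None:
        return 1.0  # undated file such as MEMORY.md: no decay
    try:
        file_date = date(*(int(part) for part in match.groups()))
    except ValueError:
        return 1.0  # assumption: an impossible date is treated as undated
    # Assumption: files dated in the future are treated as age 0.
    age_days = max(0, (today - file_date).days)
    return math.exp(-math.log(2) / half_life_days * age_days)
```

With the default half-life, a chunk from a file dated 30 days ago keeps half of its combined score and one from 60 days ago keeps a quarter.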
Requirement: Result filtering and limits
The system SHALL filter search results by minimum score and maximum count.
Scenario: Min score threshold
- WHEN search results are merged
- THEN results with score below min_score (default 0.1) SHALL be discarded
Scenario: Max results limit
- WHEN search results exceed max_results
- THEN only the top max_results by combined score SHALL be returned
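Both filtering scenarios fit in a few lines. This is an illustrative sketch: the name `filter_results` and the list-of-tuples result shape are assumptions, and `max_results` is a required argument because no default is given for it.

```python
from typing import List, Tuple


def filter_results(
    results: List[Tuple[str, float]], max_results: int, min_score: float = 0.1
) -> List[Tuple[str, float]]:
    """Drop results scoring below min_score, then keep the top max_results."""
    kept = [(chunk_id, score) for chunk_id, score in results if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:max_results]
```

A result scoring exactly `min_score` is kept, since only scores below the threshold are discarded.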