boocode/openspec/changes/archived/2026-06-07-memory-context-engineering/specs/hybrid-search/spec.md at c4ee377dbc2edd411ecc03779dae3aa49b14e7b6


indifferentketchup c935687725 chore(openspec): drop 9 superseded proposals + 11 stub archive files

Drop 9 batch proposals that are superseded by the boocode-lift-analysis
(boocontext-audit, conductor upgrades, self-healing/verify-gate skills):
add-3tier-memory, import-llm-evaluator, import-pregel-engine, plugin-platform,
conductor-evolution, code-intelligence-upgrade, dev-workflow, ui-overhaul,
agent-reliability.

Delete 11 stub archive files (49-66B each, 'Status: Shipped. Archived.' only)
that provide zero documentation value over the existing CHANGELOG.md + git tags.

2026-06-07 22:15:38 +00:00

3.5 KiB


ADDED Requirements

Requirement: Hybrid search with vector + keyword fusion

The system SHALL combine vector similarity search and keyword search into unified ranked results.

Scenario: Vector search runs when embedding provider available

WHEN an embedding provider is configured
THEN the system SHALL compute a query embedding and perform cosine similarity search
WHEN no embedding provider is configured
THEN the system SHALL gracefully degrade to keyword-only search

Scenario: Keyword search always runs

WHEN a search query is submitted
THEN the system SHALL always perform keyword search regardless of embedding provider availability

Scenario: Weighted score merging

WHEN both vector and keyword results are available
THEN the final score SHALL be: vector_weight * vector_score + keyword_weight * keyword_score
THEN default weights SHALL be vector_weight=0.7, keyword_weight=0.3
THEN weights SHALL be configurable
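
A minimal sketch of the weighted merge, for illustration only. The names `HybridWeights` and `merge_scores` are hypothetical, and treating a chunk that is missing from one result set as scoring 0.0 for that component is an assumption the scenario does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridWeights:
    """Hypothetical config holder; the defaults come from the scenario above."""
    vector_weight: float = 0.7
    keyword_weight: float = 0.3


def merge_scores(vector_scores, keyword_scores, weights=HybridWeights()):
    """Fuse per-chunk scores from both searches into one ranked list.

    Both inputs map chunk id -> score. Assumption: a chunk absent from one
    result set contributes 0.0 for that component.
    """
    merged = {}
    for chunk_id in set(vector_scores) | set(keyword_scores):
        v = vector_scores.get(chunk_id, 0.0)
        k = keyword_scores.get(chunk_id, 0.0)
        merged[chunk_id] = weights.vector_weight * v + weights.keyword_weight * k
    # Highest combined score first.
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)
```

Passing a different `HybridWeights` instance is one way to satisfy the configurability clause.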

Requirement: Vector search via numpy cosine similarity

The system SHALL perform vector search using numpy-vectorized cosine similarity for performance.

Scenario: Vectorized cosine similarity

WHEN numpy is available
THEN all chunk embeddings SHALL be loaded into a numpy matrix (N, D)
THEN cosine similarity SHALL be computed as matrix @ query_vector (a BLAS matrix-vector multiply), which equals cosine similarity only when the matrix rows and query_vector are L2-normalized
THEN top-K results SHALL be selected via argpartition (O(N) average)

Scenario: Pure-Python fallback

WHEN numpy is unavailable
THEN cosine similarity SHALL be computed per-row with pure Python
THEN results SHALL be sorted and the top K returned
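
The two scenarios above could be sketched as follows. Function names are hypothetical, and the explicit L2 normalization is an assumption added so that the matrix-vector product is a true cosine similarity; the spec does not say where normalization happens.

```python
import math

try:
    import numpy as np
except ImportError:  # numpy is optional, per the fallback scenario
    np = None


def top_k_numpy(embeddings, query, k):
    """Return [(row_index, cosine)] for the k most similar rows, best first."""
    if len(embeddings) == 0 or k <= 0:
        return []
    matrix = np.asarray(embeddings, dtype=np.float64)   # shape (N, D)
    q = np.asarray(query, dtype=np.float64)             # shape (D,)
    # Normalize rows and query; zero vectors are left as zeros.
    row_norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    matrix = matrix / np.where(row_norms == 0.0, 1.0, row_norms)
    q_norm = np.linalg.norm(q)
    q = q / (q_norm if q_norm else 1.0)
    sims = matrix @ q                                   # shape (N,)
    k = min(k, sims.shape[0])
    # argpartition moves the k largest values into the last k slots, unordered.
    idx = np.argpartition(sims, -k)[-k:]
    idx = idx[np.argsort(sims[idx])[::-1]]              # sort only those k
    return [(int(i), float(sims[i])) for i in idx]


def top_k_python(embeddings, query, k):
    """Pure-Python fallback: per-row cosine, full sort, then slice."""
    q_norm = math.sqrt(sum(x * x for x in query))
    scored = []
    for i, row in enumerate(embeddings):
        dot = sum(a * b for a, b in zip(row, query))
        denom = math.sqrt(sum(a * a for a in row)) * q_norm
        scored.append((i, dot / denom if denom else 0.0))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:max(k, 0)]


def top_k(embeddings, query, k):
    """Dispatch to numpy when it imported successfully, else fall back."""
    if np is not None:
        return top_k_numpy(embeddings, query, k)
    return top_k_python(embeddings, query, k)
```

The partition step keeps selection linear in N on average; only the K survivors are fully sorted.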

Requirement: Three-tier keyword search (FTS5 → trigram → LIKE)

The system SHALL provide a cascading keyword search strategy for multi-language support.

Scenario: Standard FTS5 for ASCII queries

WHEN the query contains only ASCII characters
THEN the system SHALL use SQLite FTS5 with the unicode61 tokenizer
THEN BM25 ranking SHALL be converted to a [0, 1) score
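
The spec does not say how BM25 is mapped into [0, 1). One possible mapping is sketched below, relying on the fact that SQLite's FTS5 `bm25()` returns lower (typically negative) values for better matches. The formula `r / (1 + r)`, the table name `chunks_fts`, and both function names are assumptions.

```python
import sqlite3


def bm25_to_score(rank):
    """Map an FTS5 bm25() value to [0, 1).

    FTS5 reports better matches as more negative numbers, so the sign is
    flipped first. Assumed mapping: r / (1 + r), which rises with relevance
    and never reaches 1.
    """
    relevance = max(0.0, -float(rank))
    return relevance / (1.0 + relevance)


def search_fts5(conn, match_expr, limit=10):
    """Query a hypothetical FTS5 table named chunks_fts.

    Assumes the table was created with something like:
    CREATE VIRTUAL TABLE chunks_fts USING fts5(text, tokenize='unicode61')
    """
    rows = conn.execute(
        "SELECT rowid, bm25(chunks_fts) FROM chunks_fts "
        "WHERE chunks_fts MATCH ? ORDER BY bm25(chunks_fts) LIMIT ?",
        (match_expr, limit),
    ).fetchall()
    return [(rowid, bm25_to_score(rank)) for rowid, rank in rows]
```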

Scenario: Trigram FTS5 for CJK queries

WHEN the query contains CJK (Chinese, Japanese, Korean) characters
THEN the system SHALL use SQLite FTS5 with the trigram tokenizer
THEN CJK character sequences and ASCII words SHALL be extracted and joined with AND

Scenario: LIKE fallback for edge cases

WHEN FTS5 is unavailable or returns empty results
THEN the system SHALL fall back to LIKE-based search
THEN CJK runs (1+ chars) and ASCII words (3+ chars) SHALL be matched independently
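
A sketch of the LIKE fallback against a hypothetical `chunks(id, text)` table. The spec does not define how independent matches are scored; scoring a chunk by the fraction of terms it contains is an assumption, as are the CJK ranges and function names.

```python
import re
import sqlite3

# Assumed CJK ranges: ideographs (incl. Extension A), kana, Hangul syllables.
_CJK_RUN = r"[\u3400-\u4dbf\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af]+"
# CJK runs of 1+ chars, ASCII words of 3+ chars.
_TERM_RE = re.compile(_CJK_RUN + r"|[A-Za-z0-9_]{3,}")


def like_terms(query):
    """Extract search terms, dropping duplicates but keeping order."""
    return list(dict.fromkeys(_TERM_RE.findall(query)))


def _escape_like(term):
    """Escape LIKE wildcards so terms such as foo_bar match literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def like_search(conn, query, limit=10):
    """Match each term independently; score = matched terms / total terms."""
    terms = like_terms(query)
    if not terms:
        return []
    hits = {}
    for term in terms:
        pattern = "%" + _escape_like(term) + "%"
        for (chunk_id,) in conn.execute(
            "SELECT id FROM chunks WHERE text LIKE ? ESCAPE '\\'", (pattern,)
        ):
            hits[chunk_id] = hits.get(chunk_id, 0) + 1
    scored = [(cid, count / len(terms)) for cid, count in hits.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:limit]
```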

Requirement: Temporal decay for dated memory files

The system SHALL apply exponential decay to search scores for dated memory files.

Scenario: Decay applied to dated files

WHEN a memory chunk path matches YYYY-MM-DD.md
THEN the combined score SHALL be multiplied by exp(-ln(2)/half_life * age_days)
THEN the default half_life SHALL be 30 days
WHEN the path does not contain a date (e.g., MEMORY.md)
THEN no decay SHALL be applied (multiplier = 1.0)
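
The decay multiplier could be computed as in the sketch below. The function name is hypothetical; clamping future-dated files to an age of zero and treating an invalid calendar date as undated are assumptions not covered by the scenario.

```python
import math
import re
from datetime import date

# Matches a path whose file name ends in YYYY-MM-DD.md
_DATED = re.compile(r"(\d{4})-(\d{2})-(\d{2})\.md$")


def decay_multiplier(path, today, half_life_days=30.0):
    """Return exp(-ln(2) / half_life * age_days), or 1.0 for undated paths."""
    match = _DATED.search(path)
    if match is None:
        return 1.0  # e.g. MEMORY.md: no decay
    try:
        file_date = date(int(match[1]), int(match[2]), int(match[3]))
    except ValueError:
        return 1.0  # assumption: an invalid date such as 2026-13-40 is undated
    # Assumption: files dated in the future are treated as age 0.
    age_days = max((today - file_date).days, 0)
    return math.exp(-math.log(2) / half_life_days * age_days)
```

With the default half-life, a file dated 30 days before `today` gets a multiplier of 0.5 and one dated 60 days before gets 0.25.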

Requirement: Result filtering and limits

The system SHALL filter search results by minimum score and maximum count.

Scenario: Min score threshold

WHEN search results are merged
THEN results with score below min_score (default 0.1) SHALL be discarded

Scenario: Max results limit

WHEN search results exceed max_results
THEN only the top max_results by combined score SHALL be returned
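
Both filters can be applied in one pass, as sketched below. The function name is hypothetical, and the default of 10 for `max_results` is a placeholder since the spec gives no default. Scores exactly equal to `min_score` are kept, because only scores below the threshold are discarded.

```python
import heapq


def filter_and_limit(scored, min_score=0.1, max_results=10):
    """Drop results scoring below min_score, then keep the top max_results.

    `scored` is an iterable of (chunk_id, combined_score) pairs.
    The result is ordered best first.
    """
    kept = [(cid, score) for cid, score in scored if score >= min_score]
    return heapq.nlargest(max_results, kept, key=lambda pair: pair[1])
```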
