argenis de la rosa
|
ce4f36a3ab
|
test: 130 edge case tests + fix NaN/Infinity bug in cosine_similarity
Edge cases found 2 real bugs:
- cosine_similarity(NaN, ...) returned NaN instead of 0.0
- cosine_similarity(Infinity, ...) returned NaN instead of 0.0
Fix: added is_finite() guards on the denominator (denom) and the raw ratio, so both cases now return 0.0.
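A minimal sketch of what such guards could look like. The signature, the f64 accumulators, the length check, and the [-1, 1] clamp range are assumptions made for illustration and may differ from the code in vector.rs.

```rust
/// Cosine similarity with guards against non-finite values.
/// Hypothetical sketch: signature and clamp range are assumptions.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    // Mismatched or empty inputs have no meaningful similarity.
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }

    let (mut dot, mut norm_a, mut norm_b) = (0.0_f64, 0.0_f64, 0.0_f64);
    for (x, y) in a.iter().zip(b) {
        let (x, y) = (f64::from(*x), f64::from(*y));
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }

    // Guard on the denominator: zero vectors, NaN, and Infinity all end up here.
    let denom = norm_a.sqrt() * norm_b.sqrt();
    if !denom.is_finite() || denom == 0.0 {
        return 0.0;
    }

    // Defensive guard on the raw ratio before clamping.
    let raw = dot / denom;
    if !raw.is_finite() {
        return 0.0;
    }

    // Assumed clamp range; rounding can push the ratio slightly past 1.0.
    raw.clamp(-1.0, 1.0) as f32
}
```

With this sketch, a vector containing NaN or Infinity yields 0.0 rather than propagating NaN to the caller, which matches the behavior the two bug reports ask for.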
New edge case tests by module:
- vector.rs (18): NaN, Infinity, negative vectors, opposite vectors clamped,
  high-dimensional (1536), single element, both-zero, non-aligned bytes,
  3-byte input, special float values, NaN roundtrip, limit=0, zero weights,
  negative BM25 scores, duplicate IDs, large normalization, single item
- embeddings.rs (8): noop embed_one error, empty batch, multiple texts,
  empty/unknown provider, custom empty URL, no API key, trailing slash, dims
- chunker.rs (11): headings-only, deeply nested ####, long single line,
  whitespace-only, max_tokens=0, max_tokens=1, unicode/emoji, FTS5 special
  chars, multiple blank lines, trailing heading, no content loss
- sqlite.rs (23): FTS5 quotes/asterisks/parens, SQL injection, empty
  content/key, 100KB content, unicode+emoji, newlines+tabs, single char
  query, limit=0/1, key matching, unicode query, schema idempotency,
  triple open, ghost results after forget, forget+re-store cycle,
  reindex empty/twice, content_hash empty/unicode/long, category
  roundtrip with spaces/empty, list custom category, list empty DB
869 tests passing, 0 clippy warnings, cargo-deny clean
|
2026-02-14 00:28:55 -05:00 |
|
argenis de la rosa
|
0e7f501fd6
|
feat: full-stack search engine — FTS5, vector search, hybrid merge, embedding cache, chunker
The Full Stack (All Custom):
- Vector DB: embeddings stored as BLOB, cosine similarity in pure Rust
- Keyword Search: FTS5 virtual tables with BM25 scoring + auto-sync triggers
- Hybrid Merge: weighted fusion of vector + keyword results (configurable weights)
- Embeddings: provider abstraction (OpenAI, custom URL, noop fallback)
- Chunking: line-based markdown chunker with heading preservation
- Caching: embedding_cache table with LRU eviction
- Safe Reindex: rebuild FTS5 + re-embed missing vectors
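The weighted fusion in the Hybrid Merge item could be sketched roughly as follows. Everything here is an assumption for illustration: the function signature, the fields of ScoredResult, the max-based normalization of keyword scores, and the tie-breaking by ID are not taken from the repository.

```rust
use std::collections::HashMap;

/// Hypothetical result type; the real ScoredResult may carry more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct ScoredResult {
    pub id: String,
    pub score: f32,
}

/// Fuse vector and keyword hits into one ranked list (illustrative sketch).
pub fn hybrid_merge(
    vector_hits: &[(String, f32)],
    keyword_hits: &[(String, f32)],
    vector_weight: f32,
    keyword_weight: f32,
    limit: usize,
) -> Vec<ScoredResult> {
    if limit == 0 {
        return Vec::new();
    }

    // Scale keyword scores by the largest positive score so they fall in [0, 1].
    let max_kw = keyword_hits.iter().map(|(_, s)| *s).fold(0.0_f32, f32::max);

    // Per ID: (best vector score, best normalized keyword score).
    let mut combined: HashMap<String, (f32, f32)> = HashMap::new();
    for (id, score) in vector_hits {
        let entry = combined.entry(id.clone()).or_insert((0.0, 0.0));
        entry.0 = entry.0.max(*score);
    }
    for (id, score) in keyword_hits {
        let normalized = if max_kw > 0.0 { *score / max_kw } else { 0.0 };
        let entry = combined.entry(id.clone()).or_insert((0.0, 0.0));
        // Negative keyword scores never lower the stored value below 0.0.
        entry.1 = entry.1.max(normalized);
    }

    let mut merged: Vec<ScoredResult> = combined
        .into_iter()
        .map(|(id, (v, k))| ScoredResult {
            id,
            score: vector_weight * v + keyword_weight * k,
        })
        .collect();

    // Highest score first; ties broken by ID so the order is deterministic.
    merged.sort_by(|a, b| b.score.total_cmp(&a.score).then_with(|| a.id.cmp(&b.id)));
    merged.truncate(limit);
    merged
}
```

Keeping one entry per ID means a document found by both searches is ranked once, with contributions from both scores, instead of appearing twice in the output.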
New modules:
- src/memory/embeddings.rs — EmbeddingProvider trait + OpenAI + Noop + factory
- src/memory/vector.rs — cosine similarity, vec↔bytes, ScoredResult, hybrid_merge
- src/memory/chunker.rs — markdown-aware document splitting
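As an illustration of the vec↔bytes conversion listed for vector.rs, a sketch of storing f32 embeddings as a BLOB might look like this. The function names, the little-endian layout, and the choice to drop trailing bytes that do not form a full 4-byte value are assumptions, not confirmed details of the module.

```rust
/// Serialize an embedding as little-endian f32 bytes (assumed BLOB layout).
pub fn vec_to_bytes(values: &[f32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Deserialize a BLOB back into f32 values.
/// Trailing bytes that do not fill a complete 4-byte chunk are ignored.
pub fn bytes_to_vec(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(4)
        .map(|chunk| f32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
        .collect()
}
```

Under these assumptions a 3-byte input decodes to an empty vector and a 5-byte input decodes to a single value, so malformed BLOBs degrade to shorter vectors instead of panicking.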
Upgraded:
- src/memory/sqlite.rs — FTS5 schema, embedding column, hybrid recall, cache, reindex
- src/config/schema.rs — MemoryConfig expanded with embedding/search settings
- All callers updated to pass api_key for embedding provider
739 tests passing, 0 clippy warnings (Rust 1.93.1), cargo-deny clean
|
2026-02-14 00:00:23 -05:00 |
|