memory

dan/memory

mirror of https://github.com/mruwnik/memory.git synced 2026-01-03 01:32:57 +01:00

Author	SHA1	Message	Date
mruwnik	5b997cc397	Fix search bugs: query terms, index validation, chunk loss	2025-12-21 15:01:03 +00:00
    - Include 2-letter terms (AI, ML) in query term extraction (was > 2, now >= 2)
    - Add guard for empty data before accessing data[0].data[0] in scorer
    - Preserve chunks without content in reranking instead of silently dropping
    - Remove legacy wrapper functions (apply_title_boost, apply_popularity_boost)
    - Update tests to use apply_source_boosts directly
    🤖 Generated with [Claude Code](https://claude.com/claude-code)
    Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
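A minimal sketch of the query-term rule in the first bullet. It assumes a simple lowercase word tokenizer and a small stand-in stopword set; the repository's actual STOPWORDS and tokenization are not shown here, and the `MIN_TERM_LENGTH` name is hypothetical. It illustrates why the length test moved from `> 2` to `>= 2`: two-letter acronyms such as "AI" and "ML" now survive filtering.

```python
import re

# Hypothetical stand-in stopword set; the repository keeps its own STOPWORDS in constants.py.
STOPWORDS = {"the", "a", "an", "of", "on", "in", "to", "is", "and", "or", "about"}

# Hypothetical constant name: ">= 2" keeps acronyms such as "ai" and "ml".
MIN_TERM_LENGTH = 2


def extract_query_terms(query: str) -> list[str]:
    """Lowercase the query, split it into alphanumeric tokens, drop stopwords and 1-letter tokens."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return [t for t in tokens if len(t) >= MIN_TERM_LENGTH and t not in STOPWORDS]
```

For example, `extract_query_terms("An essay on AI and ML alignment")` returns `["essay", "ai", "ml", "alignment"]`; with the old `> 2` test, "ai" and "ml" would have been dropped.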
mruwnik	782b56939f	Refactor search: add LLM query analysis, extract constants	2025-12-21 14:43:17 +00:00
    - Add query_analysis.py for LLM-based query preprocessing
      - Detects modalities from natural language ("on lesswrong" -> forum)
      - Cleans meta-language ("I remember reading..." -> core query)
      - Generates query variants for better recall
      - Dynamically discovers modalities and domains from database
    - Extract constants to constants.py
      - STOPWORDS, RRF_K, boost values, etc.
      - Cleaner separation of configuration from logic
    - Refactor search_chunks into focused helper functions
      - _run_llm_analysis: parallel query analysis + HyDE
      - _apply_query_analysis: apply analysis results
      - _build_search_data: construct search data with variants
      - _run_searches: embedding + BM25 with RRF fusion
      - _fetch_chunks: database retrieval with scoring
      - _apply_boosts: title, popularity, recency boosts
      - _apply_reranking: cross-encoder reranking
    - Remove redundant regex-based modality detection
    - Remove static QUERY_EXPANSIONS (LLM handles this better)
    - Add comprehensive tests for query_analysis module
    🤖 Generated with [Claude Code](https://claude.com/claude-code)
    Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
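The `_run_searches` step fuses embedding and BM25 results with Reciprocal Rank Fusion (RRF). Below is a sketch of standard RRF, in which every result list adds 1 / (k + rank) to each item it contains. The function name echoes `fuse_scores_rrf` from the test commit, but the signature used here (ranked ID lists in, fused scores out) is an assumption, and `RRF_K = 60` is the common default from the RRF literature rather than the value actually stored in constants.py.

```python
from collections import defaultdict

# Common RRF default; assumed here, the repository's RRF_K may differ.
RRF_K = 60


def fuse_scores_rrf(*rankings: list[str], k: int = RRF_K) -> dict[str, float]:
    """Fuse several best-first lists of chunk IDs with Reciprocal Rank Fusion.

    An ID at 1-based rank r in any list contributes 1 / (k + r) to its fused score.
    Hypothetical signature: the real function may take score dicts instead.
    """
    fused: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            fused[chunk_id] += 1.0 / (k + rank)
    return dict(fused)


embedding_hits = ["a", "b", "c"]
bm25_hits = ["b", "d", "a"]
fused = fuse_scores_rrf(embedding_hits, bm25_hits)
ordered = sorted(fused, key=fused.get, reverse=True)
```

Here `ordered` is `["b", "a", "d", "c"]`: items found by both searches outrank those found by only one, and "b" edges out "a" because its combined ranks are better. Because RRF uses only ranks, the two searches' raw scores never need to be put on a common scale.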
mruwnik	d9fcfe3878	more search improvements	2025-12-21 12:29:44 +00:00
mruwnik	f3d8b6602b	Add popularity boosting to search based on karma	2025-12-20 22:44:06 +00:00
    - Add `popularity` property to SourceItem base class (default 1.0)
    - Override in ForumPost with karma-based calculation:
      - Uses KARMA_REFERENCES dict mapping URL patterns to reference values
      - LessWrong: 100 (90th percentile from actual data)
      - Reference karma gives popularity=2.0, caps at 2.5
    - Add apply_popularity_boost() to search pipeline
      - POPULARITY_BOOST = 0.02 (2% score adjustment per popularity unit)
    - Add comprehensive tests for popularity boost
    🤖 Generated with [Claude Code](https://claude.com/claude-code)
    Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
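The commit gives the anchor points of the karma mapping (default 1.0, reference karma -> 2.0, cap at 2.5) and the boost size (2% per popularity unit) but not the exact formula. The sketch below assumes a linear mapping and measures the boost from the 1.0 baseline, so items on sites without a reference value are left unchanged. `karma_popularity`, `boost_by_popularity`, the `"lesswrong.com"` substring pattern and the flooring of negative karma at 0 are all hypothetical.

```python
# Reference karma per site. LessWrong's value of 100 comes from the commit message;
# the "lesswrong.com" substring key is a hypothetical URL pattern.
KARMA_REFERENCES = {"lesswrong.com": 100}
POPULARITY_BOOST = 0.02  # 2% score adjustment per popularity unit
MAX_POPULARITY = 2.5


def karma_popularity(url: str, karma: int) -> float:
    """Map karma to a popularity value; 1.0 when no reference pattern matches the URL."""
    for pattern, reference in KARMA_REFERENCES.items():
        if pattern in url:
            # Assumed linear mapping: 0 karma -> 1.0, reference karma -> 2.0, capped at 2.5.
            # Negative karma is floored at 0 (an assumption, not stated in the commit).
            return min(1.0 + max(karma, 0) / reference, MAX_POPULARITY)
    return 1.0


def boost_by_popularity(score: float, popularity: float) -> float:
    """Scale a relevance score by 2% per popularity unit above the 1.0 baseline (assumed)."""
    return score * (1.0 + POPULARITY_BOOST * (popularity - 1.0))
```

Under these assumptions a LessWrong post with 100 karma gets popularity 2.0 and its score is multiplied by 1.02, while one with 1000 karma hits the 2.5 cap (a factor of 1.03). Keeping the boost this small lets karma break ties between similarly relevant chunks without overriding relevance.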
mruwnik	09215adf9a	Add comprehensive tests for search improvements	2025-12-20 22:26:16 +00:00
    - Add tests for extract_query_terms (stopword filtering, short words)
    - Add tests for apply_query_term_boost (boost calculations, edge cases)
    - Add tests for deduplicate_by_source (keeps highest per source)
    - Add tests for apply_title_boost (title matching with mocked DB)
    - Add tests for fuse_scores_rrf (RRF score fusion, ranking behavior)
    - Add tests for rerank module (VoyageAI reranker mocking)
    Uses pytest.mark.parametrize for concise, data-driven tests.
    77 tests total covering all new search functionality.
    🤖 Generated with [Claude Code](https://claude.com/claude-code)
    Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
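One of the tested behaviors, `deduplicate_by_source` keeping only the highest-scoring chunk per source, can be sketched as follows. The `ScoredChunk` dataclass is a hypothetical stand-in for the repository's chunk objects, and returning the survivors sorted by descending score is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ScoredChunk:
    """Hypothetical stand-in for a search hit; each chunk belongs to one source item."""
    chunk_id: str
    source_id: int
    score: float


def deduplicate_by_source(chunks: list[ScoredChunk]) -> list[ScoredChunk]:
    """Keep the highest-scoring chunk for each source, ordered by descending score (assumed)."""
    best: dict[int, ScoredChunk] = {}
    for chunk in chunks:
        current = best.get(chunk.source_id)
        if current is None or chunk.score > current.score:
            best[chunk.source_id] = chunk
    return sorted(best.values(), key=lambda c: c.score, reverse=True)
```

Given chunks `a1` (source 1, 0.4), `a2` (source 1, 0.9) and `b1` (source 2, 0.7), the result is `[a2, b1]`, so a single long document cannot crowd other sources out of the results.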

5 Commits