dan/memory

Mirror of https://github.com/mruwnik/memory.git, last synced 2026-01-02 17:22:58 +01:00

Author	SHA1	Message	Date
mruwnik	f042f9aed8	proactive stuff	2025-12-29 14:07:12 +00:00
mruwnik	47180e1e71	fixes	2025-12-24 14:52:12 +00:00
mruwnik	5d79fa349e	synch people	2025-12-24 14:38:14 +00:00
mruwnik	47629fc5fb	add PRs and People	2025-12-24 13:25:34 +00:00
mruwnik	526bfa5f6b	more github ingesting	2025-12-23 20:02:10 +00:00
mruwnik	5b997cc397	Fix search bugs: query terms, index validation, chunk loss - Include 2-letter terms (AI, ML) in query term extraction (was > 2, now >= 2) - Add guard for empty data before accessing data[0].data[0] in scorer - Preserve chunks without content in reranking instead of silently dropping - Remove legacy wrapper functions (apply_title_boost, apply_popularity_boost) - Update tests to use apply_source_boosts directly 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-21 15:01:03 +00:00
mruwnik	782b56939f	Refactor search: add LLM query analysis, extract constants - Add query_analysis.py for LLM-based query preprocessing - Detects modalities from natural language ("on lesswrong" -> forum) - Cleans meta-language ("I remember reading..." -> core query) - Generates query variants for better recall - Dynamically discovers modalities and domains from database - Extract constants to constants.py - STOPWORDS, RRF_K, boost values, etc. - Cleaner separation of configuration from logic - Refactor search_chunks into focused helper functions - _run_llm_analysis: parallel query analysis + HyDE - _apply_query_analysis: apply analysis results - _build_search_data: construct search data with variants - _run_searches: embedding + BM25 with RRF fusion - _fetch_chunks: database retrieval with scoring - _apply_boosts: title, popularity, recency boosts - _apply_reranking: cross-encoder reranking - Remove redundant regex-based modality detection - Remove static QUERY_EXPANSIONS (LLM handles this better) - Add comprehensive tests for query_analysis module 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-21 14:43:17 +00:00
mruwnik	d9fcfe3878	more search improvements	2025-12-21 12:29:44 +00:00
mruwnik	f3d8b6602b	Add popularity boosting to search based on karma - Add `popularity` property to SourceItem base class (default 1.0) - Override in ForumPost with karma-based calculation: - Uses KARMA_REFERENCES dict mapping URL patterns to reference values - LessWrong: 100 (90th percentile from actual data) - Reference karma gives popularity=2.0, caps at 2.5 - Add apply_popularity_boost() to search pipeline - POPULARITY_BOOST = 0.02 (2% score adjustment per popularity unit) - Add comprehensive tests for popularity boost 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-20 22:44:06 +00:00
mruwnik	09215adf9a	Add comprehensive tests for search improvements - Add tests for extract_query_terms (stopword filtering, short words) - Add tests for apply_query_term_boost (boost calculations, edge cases) - Add tests for deduplicate_by_source (keeps highest per source) - Add tests for apply_title_boost (title matching with mocked DB) - Add tests for fuse_scores_rrf (RRF score fusion, ranking behavior) - Add tests for rerank module (VoyageAI reranker mocking) Uses pytest.mark.parametrize for concise, data-driven tests. 77 tests total covering all new search functionality. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-20 22:26:16 +00:00
mruwnik	d644281b26	Fix 5 security and quality bugs BUG-030: Add rate limiting via slowapi middleware - Added slowapi to requirements - Configurable limits: 100/min default, 30/min search, 10/min auth - Rate limit settings in settings.py BUG-028: Fix filter validation in embeddings.py - Unknown filter keys now logged and ignored instead of passed through - Prevents potential filter injection BUG-034: Fix timezone handling in oauth_provider.py - Now uses timezone-aware UTC comparison for refresh tokens BUG-050: Fix SQL injection in test database handling - Added validate_db_identifier() function - Validates database names contain only safe characters Also: - Updated tests for bcrypt password format - Updated test for filter validation behavior - Updated INVESTIGATION.md with fix status 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-19 21:41:16 +00:00
Daniel O'Connell	28bc10df92	Fix break_chunk() appending wrong object (BUG-007) The function was appending the entire DataChunk object instead of the individual item when processing non-string data (e.g., images). Bug: `result.append(chunk)` should have been `result.append(c)` This caused: - Type mismatches (returning DataChunk instead of MulitmodalChunk) - Potential circular references - Embedding failures for mixed content Fixed by appending the individual item `c` instead of the parent `chunk`. Updated existing test and added new test to verify behavior. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-19 18:27:13 +01:00
Daniel O'Connell	21dedbeb61	Fix search score aggregation to use mean instead of sum BUG-004: Score aggregation was broken - documents with more chunks would always rank higher regardless of relevance because scores were summed instead of averaged. Changes: - Changed score calculation from sum() to mean() - Added comprehensive tests for SearchResult.from_source_item() - Added tests for elide_content helper This ensures search results are ranked by actual relevance rather than by the number of chunks in the document. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-19 18:25:15 +01:00
Daniel O'Connell	93b77a16d6	Add pytest markers for fast/slow test separation - Add --run-slow flag to optionally include slow tests - Auto-detect tests that use db_session, test_db, db_engine, or qdrant fixtures - Skip slow tests by default for faster development iteration - Usage: pytest (fast only) or pytest --run-slow (all tests) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-19 18:21:41 +01:00
mruwnik	56ed7b7d8f	fix scheduler	2025-11-04 12:46:38 +00:00
Daniel O'Connell	ad6510bd17	add a bunch of tests	2025-11-03 23:23:41 +01:00
mruwnik	a5bc53326d	backups	2025-11-02 00:01:35 +00:00
Daniel O'Connell	814090dccb	use db bots	2025-11-01 18:52:37 +01:00
Daniel O'Connell	9639fa3dd7	use usage tracker	2025-11-01 18:49:06 +01:00
Daniel O'Connell	8af07f0dac	add slash commands for discord	2025-11-01 18:04:38 +01:00
Daniel O'Connell	07852f9ee7	Base usage tracker	2025-11-01 16:22:40 +01:00
Daniel O'Connell	1a3cf9c931	add tests	2025-10-20 21:10:39 +02:00
Daniel O'Connell	1606348d8b	discord integration	2025-10-20 03:47:13 +02:00
Daniel O'Connell	99d3843f47	move to general LLM providers	2025-10-13 03:23:20 +02:00
Daniel O'Connell	f454aa9afa	change schedule call signature	2025-10-12 10:17:22 +02:00
Daniel O'Connell	a3544222e7	add scheduled calls	2025-08-12 23:37:54 +00:00
Daniel O'Connell	b68e15d3ab	add blogs	2025-08-09 02:07:49 +02:00
Daniel O'Connell	beb94375da	fix tests	2025-07-24 23:34:10 +02:00
Daniel O'Connell	50601ad930	proper notes path	2025-07-06 13:53:29 +02:00
Daniel O'Connell	288c2995e5	synch notes	2025-07-05 23:58:47 +02:00
Daniel O'Connell	8eb6374cac	second pass in search	2025-06-28 20:59:15 +02:00
Daniel O'Connell	01ccea2733	add missing tests	2025-06-28 02:30:54 +02:00
Daniel O'Connell	a3daea883b	fix tests	2025-06-26 14:12:42 +02:00
Daniel O'Connell	0e574542d5	fix tests	2025-06-10 15:32:34 +02:00
Daniel O'Connell	3e4e5872d1	search filters	2025-06-10 12:16:54 +02:00
Daniel O'Connell	780e27ba04	better emails embedding + format search results	2025-06-09 13:51:58 +02:00
Daniel O'Connell	4d057d1ec6	discord notification on error	2025-06-05 02:21:52 +02:00
Daniel O'Connell	e5da3714de	multiple dimensions for confidence values	2025-06-03 12:18:20 +02:00
Daniel O'Connell	a40e0b50fa	editable notes	2025-06-02 22:24:19 +02:00
Daniel O'Connell	ac3b48a04c	notes and observations triggered as jobs	2025-06-02 14:34:39 +02:00
Daniel O'Connell	29b8ce6860	Fix search + proper integration tests	2025-06-02 02:53:32 +02:00
Daniel O'Connell	1dd93929c1	Add embedding for observations	2025-05-31 16:51:55 +02:00
Daniel O'Connell	004bd39987	Add observations model	2025-05-31 16:15:30 +02:00
Daniel O'Connell	e505f9b53c	summarize before chunking	2025-05-29 01:26:10 +02:00
Daniel O'Connell	ed8033bdd3	Add less wrong tasks + reindexer	2025-05-28 03:14:27 +02:00
Daniel O'Connell	ab87bced81	fix linting	2025-05-27 23:19:28 +02:00
Daniel O'Connell	1291ca9d08	better handling of errors	2025-05-27 22:39:24 +02:00
Daniel O'Connell	f5c3e458d7	move parsers	2025-05-27 21:53:31 +02:00
Daniel O'Connell	0f15e4e410	Check all feeds work	2025-05-27 01:42:22 +02:00
Daniel O'Connell	876fa87725	Add archives fetcher	2025-05-27 01:24:57 +02:00


78 Commits
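Commit 5b997cc397 changes query term extraction so that two-letter terms such as "AI" and "ML" are kept, using a length check of `>= 2` instead of `> 2`. The following is a hypothetical illustration of that filter and is not the project's `extract_query_terms`. The function name `extract_terms`, the tokenizing regex, and the small stopword set are all assumptions. The project keeps its own `STOPWORDS` in `constants.py`.

```python
import re

# Hypothetical stopword set for illustration only; the project defines its own STOPWORDS.
STOPWORDS = frozenset({"a", "an", "and", "about", "for", "in", "is", "of", "on", "or", "the", "to"})


def extract_terms(query: str, min_len: int = 2) -> list[str]:
    """Return lowercase alphanumeric terms of at least `min_len` chars, minus stopwords.

    Hypothetical helper: min_len=2 mirrors the `>= 2` rule described in the commit,
    so two-letter acronyms like "AI" and "ML" survive.
    """
    words = re.findall(r"[a-z0-9]+", query.lower())
    return [w for w in words if len(w) >= min_len and w not in STOPWORDS]


print(extract_terms("Papers about AI and ML safety"))             # ['papers', 'ai', 'ml', 'safety']
print(extract_terms("Papers about AI and ML safety", min_len=3))  # ['papers', 'safety']
```

With the old `> 2` rule, which corresponds to `min_len=3` here, the acronyms disappear from the query entirely. That was the bug the commit describes.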
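Commit 782b56939f describes `_run_searches` as running embedding and BM25 searches and combining them with Reciprocal Rank Fusion (RRF). The constant `RRF_K` lives in `constants.py`. In standard RRF, each ranked list adds 1 / (k + rank) for every document it returns. The following is a minimal sketch of that technique. The name `rrf_fuse`, the input shape (best-first lists of document IDs), and the default k = 60 (the value commonly used in the RRF literature) are assumptions. The project's `fuse_scores_rrf` may differ.

```python
from collections import defaultdict

# Assumed constant; the commit names RRF_K but does not give its value.
RRF_K = 60


def rrf_fuse(*rankings: list[str], k: int = RRF_K) -> dict[str, float]:
    """Fuse best-first ranked lists of document IDs with Reciprocal Rank Fusion.

    A document at 1-based rank r in one list gains 1 / (k + r) from that list;
    contributions from all lists are summed.
    """
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order because sorted() is stable.
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))


embedding_hits = ["a", "b", "c"]
bm25_hits = ["b", "c", "d"]
print(list(rrf_fuse(embedding_hits, bm25_hits)))  # ['b', 'c', 'a', 'd']
```

Documents found by both retrievers (`b`, `c`) rise above those found by only one. Because RRF uses only ranks, BM25 scores never have to be calibrated against cosine similarities.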
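Commit 21dedbeb61 (BUG-004) explains why summing chunk scores was wrong. Under a sum, a document split into many weakly matching chunks could outrank a short, highly relevant one. The following compares the two aggregations on made-up scores. The function name `aggregate_scores`, its `how` parameter, and the dict of per-chunk scores are hypothetical. Where the project actually performs this calculation is not shown in the log.

```python
from statistics import mean

# Hypothetical per-chunk relevance scores, keyed by source document.
chunk_scores = {
    "long_doc": [0.3, 0.3, 0.3, 0.3, 0.3],  # many weak matches
    "short_doc": [0.9],                     # one strong match
}


def aggregate_scores(scores_by_doc: dict[str, list[float]], how: str = "mean") -> dict[str, float]:
    """Collapse chunk scores to one score per document with sum() or mean()."""
    agg = sum if how == "sum" else mean
    # Skip documents with no scored chunks; mean() would raise on an empty list.
    return {doc: agg(scores) for doc, scores in scores_by_doc.items() if scores}


by_sum = aggregate_scores(chunk_scores, how="sum")
by_mean = aggregate_scores(chunk_scores, how="mean")
print(max(by_sum, key=by_sum.get))    # long_doc  (about 1.5 vs 0.9)
print(max(by_mean, key=by_mean.get))  # short_doc (0.3 vs 0.9)
```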
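Commit d644281b26 (BUG-050) adds `validate_db_identifier()`, which checks that test database names contain only safe characters. Identifiers such as database names cannot be sent as bound query parameters, so an allow-list check before string interpolation is the usual defence. The following is a hypothetical stand-in named `ensure_safe_identifier` and assumes a PostgreSQL backend. Its rule is also an assumption, not the project's actual check: a letter or underscore, then letters, digits, or underscores, at most 63 characters (PostgreSQL's default identifier length limit).

```python
import re

# Assumed rule: start with a letter or underscore, then letters, digits, or underscores,
# 63 characters at most (PostgreSQL's default identifier length limit).
_SAFE_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,62}")


def ensure_safe_identifier(name: str) -> str:
    """Return `name` unchanged if it is a safe identifier, else raise ValueError."""
    # fullmatch() requires the entire string to match, so trailing junk such as
    # "; DROP DATABASE prod" or a newline is rejected.
    if not _SAFE_IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe database identifier: {name!r}")
    return name


db_name = ensure_safe_identifier("memory_test_1")
statement = f'CREATE DATABASE "{db_name}"'  # interpolate only after validation
print(statement)  # CREATE DATABASE "memory_test_1"
```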