dan/memory

mirror of https://github.com/mruwnik/memory.git synced 2026-01-02 17:22:58 +01:00

Author	SHA1	Message	Date
mruwnik	60e6e18284	Add modality detection and family term expansion for search - Add useModalityDetection config flag to detect content type hints from natural language queries (e.g., "on lesswrong" → forum, "comic about" → comic) - Strip meta-language noise from queries ("there was something about") - Add family term expansion (father ↔ son, parent ↔ child, etc.) - Modality detection is off by default, configurable per-request TODO: Replace regex-based detection with LLM-based query analysis (Haiku) that can run in parallel with HyDE for better accuracy. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-21 13:32:10 +00:00
mruwnik	a10f93cb3c	Add per-request configuration for search enhancements Allow callers to enable/disable BM25, HyDE, reranking, and query expansion on a per-request basis via SearchConfig. When not specified, falls back to global settings from environment variables. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-21 12:38:01 +00:00
mruwnik	d9fcfe3878	more search improvements	2025-12-21 12:29:44 +00:00
mruwnik	f3d8b6602b	Add popularity boosting to search based on karma - Add `popularity` property to SourceItem base class (default 1.0) - Override in ForumPost with karma-based calculation: - Uses KARMA_REFERENCES dict mapping URL patterns to reference values - LessWrong: 100 (90th percentile from actual data) - Reference karma gives popularity=2.0, caps at 2.5 - Add apply_popularity_boost() to search pipeline - POPULARITY_BOOST = 0.02 (2% score adjustment per popularity unit) - Add comprehensive tests for popularity boost 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-20 22:44:06 +00:00
mruwnik	09215adf9a	Add comprehensive tests for search improvements - Add tests for extract_query_terms (stopword filtering, short words) - Add tests for apply_query_term_boost (boost calculations, edge cases) - Add tests for deduplicate_by_source (keeps highest per source) - Add tests for apply_title_boost (title matching with mocked DB) - Add tests for fuse_scores_rrf (RRF score fusion, ranking behavior) - Add tests for rerank module (VoyageAI reranker mocking) Uses pytest.mark.parametrize for concise, data-driven tests. 77 tests total covering all new search functionality. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-20 22:26:16 +00:00
mruwnik	c6cb793cdf	Add HyDE (Hypothetical Document Embeddings) for query expansion HyDE generates a hypothetical document passage that would answer the user's query, then embeds that alongside the original query. This bridges the gap between how users describe what they're looking for and the actual document terminology. Changes: - Add hyde.py with expand_query_hyde() function - Integrate HyDE into search_chunks() pipeline - Add ENABLE_HYDE_EXPANSION and HYDE_TIMEOUT settings - Only expand queries with 4+ words (short queries are specific enough) - Simple in-memory cache to avoid re-generating for repeated queries Example: - Query: "saying what you mean not using specific words" - HyDE generates: "Clear communication requires expressing your thoughts directly and honestly, even when you lack the precise terminology..." - This finds articles about word meaning and clear communication 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-20 16:01:03 +00:00
mruwnik	e414c3311c	Improve RAG search quality with PostgreSQL FTS and hybrid scoring Major changes: - Replace OOM-causing in-memory BM25 with PostgreSQL full-text search - Add tsvector column and GIN index for fast keyword search - Implement hybrid score fusion (70% embedding + 30% FTS + 15% bonus) - Add CANDIDATE_MULTIPLIER (5x) to search more candidates before fusion - Add stopword filtering to FTS queries for less strict matching - Make search limit configurable (default 20, max 100) - Propagate relevance scores through the search pipeline Search improvements: - "clowns iconoclasts" → finds target at rank 1 (score 0.815) - "replacing words with definitions" → finds target at rank 1 - Vague queries now find results with limit=30 that were previously missed 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-20 15:54:30 +00:00
mruwnik	f2161e09f3	Fix 11 high-priority bugs from third deep dive - Add IMAP connection cleanup on logout failure (email.py) - Handle IntegrityError for concurrent email processing (tasks/email.py) - Recover stale scheduled calls stuck in "executing" state (scheduled_calls.py) - Move git operations outside DB transaction in notes sync (notes.py) - Add null checks for recipient_user/from_user in Discord (discord.py) - Add OAuth state and session cleanup tasks (maintenance.py) - Add distributed lock for backup tasks (backup.py) - Add /tmp storage warning in settings (settings.py) - Fix health check error exposure (app.py) - Remove sensitive data from logs (auth.py) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-19 22:15:25 +00:00
mruwnik	b9d6ff8745	Fix 8 security and code quality issues from deep dive Security fixes: - Issue #1: Improved path traversal validation using pathlib.relative_to() - Issue #4: Added timing attack prevention for user enumeration - Issue #5: Added constant-time API key comparison using secrets.compare_digest() Performance fixes: - Issue #20: Cache database engine and session factory for proper connection pooling Code quality fixes: - Issue #28: Fixed string literal without effect (now proper comment) - Issue #29: Removed duplicate db_session.add() call - Issue #30: Fixed incorrect docstring parameter name - Issue #31: Added parentheses for clear operator precedence in set operations 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-19 21:55:59 +00:00
mruwnik	d644281b26	Fix 5 security and quality bugs BUG-030: Add rate limiting via slowapi middleware - Added slowapi to requirements - Configurable limits: 100/min default, 30/min search, 10/min auth - Rate limit settings in settings.py BUG-028: Fix filter validation in embeddings.py - Unknown filter keys now logged and ignored instead of passed through - Prevents potential filter injection BUG-034: Fix timezone handling in oauth_provider.py - Now uses timezone-aware UTC comparison for refresh tokens BUG-050: Fix SQL injection in test database handling - Added validate_db_identifier() function - Validates database names contain only safe characters Also: - Updated tests for bcrypt password format - Updated test for filter validation behavior - Updated INVESTIGATION.md with fix status 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-19 21:41:16 +00:00
mruwnik	9a0f226972	Fix BUG-002: BookSection chunks now use correct 'book' modality Root cause: BookSection._chunk_contents() called extract_text() without specifying modality, which defaults to "text". This caused 9,370 book chunks to be stored in the 'text' collection instead of 'book'. Fix: Added modality="book" to all DataChunk creation in BookSection: - extract_text() call for single-page sections - Direct DataChunk creation for multi-page sections Note: The original investigation reported 1,338 mail items, but current analysis shows those are actually email attachments which correctly go to text/doc/photo collections based on their content type. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-19 21:04:19 +00:00
mruwnik	1c43f1ae62	Fix 7 critical security and code quality bugs (BUG-061 to BUG-068) Security Fixes: - BUG-061: Replace insecure SHA-256 password hashing with bcrypt - BUG-065: Add constant-time comparison for password verification - BUG-062: Remove full OAuth token logging - BUG-064: Remove shell=True from subprocess calls Code Quality: - BUG-063: Update 24+ deprecated SQLAlchemy .get() calls Infrastructure: - BUG-067: Add resource limits to Docker services - BUG-068: Enable Redis persistence (AOF) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-19 20:22:46 +00:00
Daniel O'Connell	52274f82a6	Fix 19 bugs from investigation Critical/High severity fixes: - BUG-001: Path traversal vulnerabilities (3 endpoints) - BUG-003: BM25 filters now apply size/observation_types - BUG-006: Remove API key from log messages - BUG-008: Chunk size validation before yielding - BUG-009: Race condition fix with FOR UPDATE SKIP LOCKED - BUG-010: Add mcp_servers property to MessageProcessor - BUG-011: Fix user_id type (BigInteger→Integer) - BUG-012: Swap inverted score thresholds - BUG-013: Add retry logic to embedding pipeline - BUG-014: Fix CORS to use specific origin - BUG-015: Add Celery retry/timeout defaults - BUG-016: Re-raise exceptions for Celery retries Medium severity fixes: - BUG-017: Add collection_name index on Chunk - BUG-031: Add SearchConfig limits (max 1000/300s) - BUG-033: Replace debug prints with logger calls - BUG-037: Clarify timezone handling in scheduler - BUG-043: Health check now validates DB + Qdrant - BUG-055: collection_model returns None not "unknown" 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-19 19:07:10 +01:00
Daniel O'Connell	116d0362a2	Fix REGISTER_ENABLED always evaluating to True (BUG-005) Logic error: `boolean_env(...) or True` always evaluates to True, making the environment variable useless. Fixed by removing `or True`. Note: This setting is currently unused in the codebase but the fix ensures correct behavior when it's used. Also updates DISCORD_MODEL default to claude-haiku-4-5 for faster and cheaper responses. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-19 18:33:05 +01:00
Daniel O'Connell	28bc10df92	Fix break_chunk() appending wrong object (BUG-007) The function was appending the entire DataChunk object instead of the individual item when processing non-string data (e.g., images). Bug: `result.append(chunk)` should have been `result.append(c)` This caused: - Type mismatches (returning DataChunk instead of MulitmodalChunk) - Potential circular references - Embedding failures for mixed content Fixed by appending the individual item `c` instead of the parent `chunk`. Updated existing test and added new test to verify behavior. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-19 18:27:13 +01:00
Daniel O'Connell	21dedbeb61	Fix search score aggregation to use mean instead of sum BUG-004: Score aggregation was broken - documents with more chunks would always rank higher regardless of relevance because scores were summed instead of averaged. Changes: - Changed score calculation from sum() to mean() - Added comprehensive tests for SearchResult.from_source_item() - Added tests for elide_content helper This ensures search results are ranked by actual relevance rather than by the number of chunks in the document. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-19 18:25:15 +01:00
mruwnik	56ed7b7d8f	fix scheduler	2025-11-04 12:46:38 +00:00
Daniel O'Connell	ad6510bd17	add a bunch of tests	2025-11-03 23:23:41 +01:00
mruwnik	56c0df9761	tweaks	2025-11-03 19:42:13 +00:00
mruwnik	b568222e88	list discord command	2025-11-03 19:19:55 +00:00
mruwnik	8893018af1	multiple mcp servers	2025-11-03 16:41:26 +00:00
mruwnik	2d3dc06fdf	tool to set up discord bot	2025-11-03 11:11:19 +00:00
mruwnik	2944a0bce1	properly handle mcp redirects	2025-11-03 00:00:02 +00:00
Daniel O'Connell	0d9f8beec3	handle mcp servers in discord	2025-11-02 23:49:50 +01:00
Daniel O'Connell	64bb926eba	mcp servers for discord bots	2025-11-02 23:49:44 +01:00
Daniel O'Connell	6250586d1f	prompt from bot user	2025-11-02 23:49:35 +01:00
mruwnik	9182f15c45	properly handle bot prompts	2025-11-02 15:51:30 +00:00
Daniel O'Connell	afdff1708b	prompt from bot user	2025-11-02 16:46:26 +01:00
Daniel O'Connell	64e84b1c89	basic tools	2025-11-02 16:34:38 +01:00
mruwnik	798b4779da	unify discord callers	2025-11-02 14:46:43 +00:00
mruwnik	69192f834a	handle discord threads	2025-11-02 11:23:31 +00:00
Daniel O'Connell	6bd7df8ee3	properly handle images by anthropic	2025-11-02 12:08:46 +01:00
mruwnik	a4f42e656a	save images	2025-11-02 10:25:23 +00:00
mruwnik	e95a082147	allow discord tools	2025-11-02 00:50:12 +00:00
mruwnik	a5bc53326d	backups	2025-11-02 00:01:35 +00:00
mruwnik	131427255a	fix typing indicator	2025-11-01 20:27:57 +00:00
Daniel O'Connell	ff3ca4f109	show typing	2025-11-01 21:13:39 +01:00
mruwnik	3b216953ab	better docker compose	2025-11-01 19:51:41 +00:00
Daniel O'Connell	d7e403fb83	optional chattiness	2025-11-01 20:39:15 +01:00
mruwnik	57145ac7b4	fix bugs	2025-11-01 19:35:20 +00:00
Daniel O'Connell	814090dccb	use db bots	2025-11-01 18:52:37 +01:00
Daniel O'Connell	9639fa3dd7	use usage tracker	2025-11-01 18:49:06 +01:00
Daniel O'Connell	8af07f0dac	add slash commands for discord	2025-11-01 18:04:38 +01:00
Daniel O'Connell	c296f3b533	extract usage	2025-11-01 17:56:20 +01:00
Daniel O'Connell	07852f9ee7	Base usage tracker	2025-11-01 16:22:40 +01:00
Daniel O'Connell	bcb470db9b	use redis for celery backend	2025-11-01 15:55:59 +01:00
EC2 Default User	4fedd8fe04	fix admin	2025-10-20 22:09:06 +00:00
Daniel O'Connell	aaa0c2c3cd	better discord integration	2025-10-20 23:08:34 +02:00
Daniel O'Connell	1a3cf9c931	add tests	2025-10-20 21:10:39 +02:00
Daniel O'Connell	1606348d8b	discord integration	2025-10-20 03:47:13 +02:00
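
Commit 21dedbeb61 (BUG-004) in the log above says that summing chunk scores made documents with more chunks outrank more relevant ones, and that the fix was to average instead. The sketch below illustrates that effect only. It assumes chunk scores are plain floats, and `aggregate_sum` and `aggregate_mean` are hypothetical names, not functions from the repository.

```python
from statistics import mean


def aggregate_sum(chunk_scores):
    """Old behaviour as described in the commit: total of all chunk scores."""
    return sum(chunk_scores)


def aggregate_mean(chunk_scores):
    """Fixed behaviour as described in the commit: average chunk score.

    Returning 0.0 for an empty list is an assumption made for this sketch.
    """
    return mean(chunk_scores) if chunk_scores else 0.0


# Hypothetical data: a long, weakly matching document and a short, strong one.
long_doc = [0.3, 0.3, 0.3, 0.3]
short_doc = [0.9]

# With sum(), the long document wins purely because it has more chunks.
print(aggregate_sum(long_doc) > aggregate_sum(short_doc))

# With mean(), the more relevant short document ranks first.
print(aggregate_mean(short_doc) > aggregate_mean(long_doc))
```

Averaging removes the length bias, at the cost of letting one weak chunk pull down an otherwise strong document; whether the repository compensates for that is not stated in the log.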

137 Commits
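
Commit 116d0362a2 (BUG-005) in the log above describes a setting written as `boolean_env(...) or True`, which always evaluates to True. The sketch below reproduces that logic error in isolation. The `boolean_env` helper here is a hypothetical stand-in written for this example (its accepted values and its `environ` parameter are assumptions), not the repository's implementation.

```python
import os


def boolean_env(name, default=False, environ=os.environ):
    """Hypothetical helper: read an environment variable as a boolean."""
    value = environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")


# Simulated environment where registration is explicitly disabled.
env = {"REGISTER_ENABLED": "false"}

# Buggy form: `x or True` is True for every x, so the variable is ignored.
buggy = boolean_env("REGISTER_ENABLED", False, environ=env) or True

# Fixed form: drop the `or True` and express the fallback via `default`.
fixed = boolean_env("REGISTER_ENABLED", False, environ=env)

print(buggy, fixed)
```

If the intent was "enabled unless turned off", passing `default=True` expresses that without overriding an explicit `false`.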