151 Commits

Author SHA1 Message Date
d644281b26 Fix 5 security and quality bugs
BUG-030: Add rate limiting via slowapi middleware
- Added slowapi to requirements
- Configurable limits: 100/min default, 30/min search, 10/min auth
- Rate limit settings in settings.py

BUG-028: Fix filter validation in embeddings.py
- Unknown filter keys now logged and ignored instead of passed through
- Prevents potential filter injection

BUG-034: Fix timezone handling in oauth_provider.py
- Now uses timezone-aware UTC comparison for refresh tokens

BUG-050: Fix SQL injection in test database handling
- Added validate_db_identifier() function
- Validates database names contain only safe characters

Also:
- Updated tests for bcrypt password format
- Updated test for filter validation behavior
- Updated INVESTIGATION.md with fix status

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 21:41:16 +00:00
d2164a49eb Add TODO note for re-indexing existing book chunks
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 21:05:29 +00:00
9a0f226972 Fix BUG-002: BookSection chunks now use correct 'book' modality
Root cause: BookSection._chunk_contents() called extract_text() without
specifying modality, which defaults to "text". This caused 9,370 book
chunks to be stored in the 'text' collection instead of 'book'.

Fix: Added modality="book" to all DataChunk creation in BookSection:
- extract_text() call for single-page sections
- Direct DataChunk creation for multi-page sections

Note: The original investigation reported 1,338 mail items, but current
analysis shows those are actually email attachments which correctly go
to text/doc/photo collections based on their content type.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 21:04:19 +00:00
f0195464c8 Complete bug investigation and fix unused appuser in Dockerfile
Investigation complete - verified 35+ bugs as fixed or non-issues:

Medium severity verified:
- BUG-018: N/A - intentional TODO comments for future features
- BUG-022: Low priority - sync_book properly chunks, only extract_ebook affected
- BUG-025: Acceptable - 4 chars/token is common approximation
- BUG-029: N/A - intentional score thresholds documented
- BUG-036: Acceptable - IntegrityError handling correct
- BUG-038: N/A - standard single beat process practice

Low severity fixed:
- BUG-056: Removed unused appuser from Dockerfile

Remaining valid issues documented for future work:
- BUG-002: Collection mismatch (needs data verification)
- BUG-026: BM25 scores discarded
- BUG-030: Rate limiting
- BUG-032: CSRF protection

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 20:53:53 +00:00
1932931221 Additional bug verification in INVESTIGATION.md
More bugs verified:
- BUG-021: Chunk validation exists via yield_spans 
- BUG-027: N/A - defaulting to 0.0 is reasonable fallback
- BUG-055: collection_model now returns None instead of "unknown" 

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 20:47:43 +00:00
74e26e1892 Continue bug verification in INVESTIGATION.md
Additional bugs verified as fixed:
- BUG-017: collection_name index exists 
- BUG-020: server_id index exists 
- BUG-039: Email folder errors handled gracefully 
- BUG-041: N/A - reasonable behavior (backup disabled without key)

Updated bug count to 30+ confirmed fixed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 20:45:43 +00:00
c2be97aad5 Verify and document fixed bugs in INVESTIGATION.md
Updated bug statuses after verification:
- BUG-014: CORS now uses settings.SERVER_URL 
- BUG-015: Celery has global retry config 
- BUG-016: safe_task_execution re-raises exceptions 
- BUG-019: embed_status properly set to STORED 
- BUG-031: SearchConfig enforces limits 
- BUG-033: No print statements in src/memory 
- BUG-035: Task time limits configured 
- BUG-037: Timezone handling fixed 
- BUG-040: Resource limits added (via BUG-067) 
- BUG-043: Health check validates dependencies 
- BUG-054: OAuthToken is intentional mixin design
- BUG-060: Print changed to logger.debug 

Updated executive summary with fix count (25+)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 20:40:55 +00:00
0da4f03656 Mark BUG-003, 008, 009, 013 as already fixed
Verified in code review:
- BUG-003: BM25 applies all SearchFilters
- BUG-008: yield_spans() guarantees token limits
- BUG-009: Uses FOR UPDATE SKIP LOCKED
- BUG-013: Has retry logic with exponential backoff

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-19 20:28:34 +00:00
2e3371ec4e Update INVESTIGATION.md with verified bug fixes
- Mark BUG-010 (MCP servers) as already fixed
- Mark BUG-011 (User ID type) as already fixed
- Document BUG-061 to BUG-068 fixes from commit 1c43f1a

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-19 20:25:09 +00:00
1c43f1ae62 Fix 7 critical security and code quality bugs (BUG-061 to BUG-068)
Security Fixes:
- BUG-061: Replace insecure SHA-256 password hashing with bcrypt
- BUG-065: Add constant-time comparison for password verification
- BUG-062: Remove full OAuth token logging
- BUG-064: Remove shell=True from subprocess calls

Code Quality:
- BUG-063: Update 24+ deprecated SQLAlchemy .get() calls

Infrastructure:
- BUG-067: Add resource limits to Docker services
- BUG-068: Enable Redis persistence (AOF)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-19 20:22:46 +00:00
Daniel O'Connell
adff8662bb Add investigation findings documentation
Documents 100+ issues found across:
- Data layer (10 issues)
- Content processing (12 issues)
- Search system (14 issues)
- API layer (12 issues)
- Worker tasks (20 issues)
- Infrastructure (12 issues)
- Code quality (20+ issues)

Includes database statistics and improvement suggestions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 19:24:17 +01:00
Daniel O'Connell
52274f82a6 Fix 19 bugs from investigation
Critical/High severity fixes:
- BUG-001: Path traversal vulnerabilities (3 endpoints)
- BUG-003: BM25 filters now apply size/observation_types
- BUG-006: Remove API key from log messages
- BUG-008: Chunk size validation before yielding
- BUG-009: Race condition fix with FOR UPDATE SKIP LOCKED
- BUG-010: Add mcp_servers property to MessageProcessor
- BUG-011: Fix user_id type (BigInteger→Integer)
- BUG-012: Swap inverted score thresholds
- BUG-013: Add retry logic to embedding pipeline
- BUG-014: Fix CORS to use specific origin
- BUG-015: Add Celery retry/timeout defaults
- BUG-016: Re-raise exceptions for Celery retries

Medium severity fixes:
- BUG-017: Add collection_name index on Chunk
- BUG-031: Add SearchConfig limits (max 1000/300s)
- BUG-033: Replace debug prints with logger calls
- BUG-037: Clarify timezone handling in scheduler
- BUG-043: Health check now validates DB + Qdrant
- BUG-055: collection_model returns None not "unknown"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 19:07:10 +01:00
Daniel O'Connell
a1444efaac Add database and file restore tools
tools/restore_databases.sh: Script to restore PostgreSQL and Qdrant
backups from encrypted backup files.

tools/restore_files.py: Python script to restore Fernet-encrypted
file backups.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 18:38:25 +01:00
Daniel O'Connell
116d0362a2 Fix REGISTER_ENABLED always evaluating to True (BUG-005)
Logic error: `boolean_env(...) or True` always evaluates to True,
making the environment variable useless.

Fixed by removing `or True`. Note: This setting is currently unused
in the codebase but the fix ensures correct behavior when it's used.

Also updates DISCORD_MODEL default to claude-haiku-4-5 for faster
and cheaper responses.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 18:33:05 +01:00
Daniel O'Connell
28bc10df92 Fix break_chunk() appending wrong object (BUG-007)
The function was appending the entire DataChunk object instead of
the individual item when processing non-string data (e.g., images).

Bug: `result.append(chunk)` should have been `result.append(c)`

This caused:
- Type mismatches (returning DataChunk instead of MulitmodalChunk)
- Potential circular references
- Embedding failures for mixed content

Fixed by appending the individual item `c` instead of the parent `chunk`.
Updated existing test and added new test to verify behavior.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 18:27:13 +01:00
Daniel O'Connell
21dedbeb61 Fix search score aggregation to use mean instead of sum
BUG-004: Score aggregation was broken - documents with more chunks
would always rank higher regardless of relevance because scores were
summed instead of averaged.

Changes:
- Changed score calculation from sum() to mean()
- Added comprehensive tests for SearchResult.from_source_item()
- Added tests for elide_content helper

This ensures search results are ranked by actual relevance rather
than by the number of chunks in the document.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 18:25:15 +01:00
Daniel O'Connell
93b77a16d6 Add pytest markers for fast/slow test separation
- Add --run-slow flag to optionally include slow tests
- Auto-detect tests that use db_session, test_db, db_engine, or qdrant fixtures
- Skip slow tests by default for faster development iteration
- Usage: pytest (fast only) or pytest --run-slow (all tests)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 18:21:41 +01:00
56ed7b7d8f fix scheduler 2025-11-04 12:46:38 +00:00
470061bd43 fix health checks 2025-11-03 22:53:19 +00:00
Daniel O'Connell
ad6510bd17 add a bunch of tests 2025-11-03 23:23:41 +01:00
56c0df9761 tweaks 2025-11-03 19:42:13 +00:00
b568222e88 list discord command 2025-11-03 19:19:55 +00:00
8893018af1 multiple mcp servers 2025-11-03 16:41:26 +00:00
2d3dc06fdf tool to set up discord bot 2025-11-03 11:11:19 +00:00
2944a0bce1 properly handle mcp redirects 2025-11-03 00:00:02 +00:00
Daniel O'Connell
0d9f8beec3 handle mcp servers in discord 2025-11-02 23:49:50 +01:00
Daniel O'Connell
64bb926eba mcp servers for discord bots 2025-11-02 23:49:44 +01:00
Daniel O'Connell
6250586d1f prompt from bot user 2025-11-02 23:49:35 +01:00
9182f15c45 properly handle bot prompts 2025-11-02 15:51:30 +00:00
Daniel O'Connell
afdff1708b prompt from bot user 2025-11-02 16:46:26 +01:00
Daniel O'Connell
64e84b1c89 basic tools 2025-11-02 16:34:38 +01:00
798b4779da unify discord callers 2025-11-02 14:46:43 +00:00
69192f834a handle discord threads 2025-11-02 11:23:31 +00:00
Daniel O'Connell
6bd7df8ee3 properly handle images by anthropic 2025-11-02 12:08:46 +01:00
a4f42e656a save images 2025-11-02 10:25:23 +00:00
e95a082147 allow discord tools 2025-11-02 00:50:12 +00:00
c42513100b db backups 2025-11-02 00:24:35 +00:00
a5bc53326d backups 2025-11-02 00:01:35 +00:00
131427255a fix typing indicator 2025-11-01 20:27:57 +00:00
Daniel O'Connell
ff3ca4f109 show typing 2025-11-01 21:13:39 +01:00
3b216953ab better docker compise 2025-11-01 19:51:41 +00:00
Daniel O'Connell
d7e403fb83 optional chattiness 2025-11-01 20:39:15 +01:00
57145ac7b4 fix bugs 2025-11-01 19:35:20 +00:00
Daniel O'Connell
814090dccb use db bots 2025-11-01 18:52:37 +01:00
Daniel O'Connell
9639fa3dd7 use usage tracker 2025-11-01 18:49:06 +01:00
Daniel O'Connell
8af07f0dac add slash commands for discord 2025-11-01 18:04:38 +01:00
Daniel O'Connell
c296f3b533 extract usage 2025-11-01 17:56:20 +01:00
Daniel O'Connell
07852f9ee7 Base usage tracker 2025-11-01 16:22:40 +01:00
Daniel O'Connell
bcb470db9b use redis for celery backend 2025-11-01 15:55:59 +01:00
EC2 Default User
4fedd8fe04 fix admin 2025-10-20 22:09:06 +00:00