BUG-030: Add rate limiting via slowapi middleware
- Added slowapi to requirements
- Configurable limits: 100/min default, 30/min search, 10/min auth
- Rate limit settings in settings.py
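A minimal sketch of how the three limits could be made configurable. The environment variable names and the `load_rate_limits` helper are hypothetical; only the default values come from this commit. The `"N/minute"` strings use the limit-string format that slowapi accepts.

```python
import os
from collections.abc import Mapping


def load_rate_limits(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read rate limits from the environment, falling back to the commit's defaults.

    The variable names below are assumptions made for illustration.
    """
    return {
        "default": env.get("RATE_LIMIT_DEFAULT", "100/minute"),
        "search": env.get("RATE_LIMIT_SEARCH", "30/minute"),
        "auth": env.get("RATE_LIMIT_AUTH", "10/minute"),
    }


limits = load_rate_limits()
# With slowapi, limits["default"] would typically be passed as
# Limiter(key_func=..., default_limits=[limits["default"]]), and the
# stricter values applied per route with @limiter.limit(limits["search"]).
```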
BUG-028: Fix filter validation in embeddings.py
- Unknown filter keys are now logged and ignored instead of being passed through
- Prevents potential filter injection
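The log-and-ignore behavior can be sketched roughly as follows. The allow-list contents and the `sanitize_filters` name are hypothetical, not the actual code in embeddings.py.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical allow-list; the real set of supported keys is not shown in this commit.
ALLOWED_FILTER_KEYS = frozenset({"tags", "source_ids", "min_size"})


def sanitize_filters(filters: dict) -> dict:
    """Return only the allow-listed filter keys, logging any that are dropped."""
    unknown = set(filters) - ALLOWED_FILTER_KEYS
    if unknown:
        logger.warning("Ignoring unknown filter keys: %s", sorted(unknown))
    return {key: value for key, value in filters.items() if key in ALLOWED_FILTER_KEYS}


clean = sanitize_filters({"tags": ["books"], "$where": "1 == 1"})
```

Here `clean` keeps only the `tags` entry, so an unexpected key never reaches the query layer.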
BUG-034: Fix timezone handling in oauth_provider.py
- Now uses timezone-aware UTC comparison for refresh tokens
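A sketch of what a timezone-aware expiry check can look like. The `is_expired` helper is hypothetical, and treating naive timestamps as UTC is an assumption. Python raises `TypeError` when a naive datetime is compared with an aware one, which is why both sides are normalized first.

```python
from datetime import datetime, timedelta, timezone


def is_expired(expires_at: datetime) -> bool:
    """Compare an expiry timestamp against the current time in UTC."""
    if expires_at.tzinfo is None:
        # Assumption: naive timestamps were stored as UTC.
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    return expires_at <= datetime.now(timezone.utc)


still_valid = is_expired(datetime.now(timezone.utc) + timedelta(hours=1))
```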
BUG-050: Fix SQL injection in test database handling
- Added validate_db_identifier() function
- Validates database names contain only safe characters
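One possible shape for such a validator is shown below. The exact character rules are an assumption (letters, digits, and underscores, not starting with a digit); the commit only states that names must contain safe characters.

```python
import re

# Assumed rule: ASCII letters, digits, and underscores; must not start with a digit.
_SAFE_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_db_identifier(name: str) -> str:
    """Return the name unchanged if it is safe to interpolate, else raise ValueError."""
    if not _SAFE_IDENTIFIER.fullmatch(name):
        raise ValueError(f"Unsafe database identifier: {name!r}")
    return name


db_name = validate_db_identifier("test_db_1")
```

Identifiers cannot be bound as query parameters in most database drivers, so validating them before string interpolation is the usual safeguard.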
Also:
- Updated tests for bcrypt password format
- Updated test for filter validation behavior
- Updated INVESTIGATION.md with fix status
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Root cause: BookSection._chunk_contents() called extract_text() without
specifying modality, which defaults to "text". This caused 9,370 book
chunks to be stored in the 'text' collection instead of 'book'.
Fix: Added modality="book" to all DataChunk creation in BookSection:
- extract_text() call for single-page sections
- Direct DataChunk creation for multi-page sections
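The root cause can be illustrated with simplified stand-ins. The `DataChunk` and `extract_text` definitions below are hypothetical reductions of the real ones and only model the default `modality` argument.

```python
from dataclasses import dataclass


@dataclass
class DataChunk:
    """Hypothetical stand-in: only the fields needed for this illustration."""
    data: list[str]
    modality: str = "text"


def extract_text(content: str, modality: str = "text") -> list[DataChunk]:
    """Simplified stand-in that wraps content in a single chunk."""
    return [DataChunk(data=[content], modality=modality)]


before = extract_text("Chapter 1")                  # falls back to "text"
after = extract_text("Chapter 1", modality="book")  # explicit modality, as in the fix
```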
Note: The original investigation reported 1,338 mail items, but current
analysis shows those are actually email attachments, which correctly go
to the text/doc/photo collections based on their content type.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Logic error: `boolean_env(...) or True` always evaluates to True,
making the environment variable useless.
Fixed by removing `or True`. Note: this setting is currently unused
in the codebase, but the fix ensures it behaves correctly once it is used.
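The logic error is easy to reproduce in isolation. The `boolean_env` implementation and the `EXAMPLE_FLAG` variable below are hypothetical; only the `... or True` pattern is taken from this commit.

```python
import os


def boolean_env(key: str, default: bool = False) -> bool:
    """Hypothetical helper: parse an environment variable as a boolean."""
    return os.getenv(key, str(default)).strip().lower() in ("1", "true", "yes", "on")


os.environ["EXAMPLE_FLAG"] = "false"

buggy = boolean_env("EXAMPLE_FLAG", True) or True  # `or True` discards the parsed value
fixed = boolean_env("EXAMPLE_FLAG", True)          # respects the environment variable
```

Because `x or True` is `True` for every `x`, the buggy form cannot be switched off from the environment.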
Also updates DISCORD_MODEL default to claude-haiku-4-5 for faster
and cheaper responses.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The function was appending the entire DataChunk object instead of
the individual item when processing non-string data (e.g., images).
Bug: `result.append(chunk)` should have been `result.append(c)`
This caused:
- Type mismatches (returning DataChunk instead of MulitmodalChunk)
- Potential circular references
- Embedding failures for mixed content
Fixed by appending the individual item `c` instead of the parent `chunk`.
Updated existing test and added new test to verify behavior.
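A reduced sketch of the corrected loop follows. The `collect_items` name, the simplified `DataChunk`, and the string handling are assumptions for illustration; only the `chunk` versus `c` distinction comes from this commit.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class DataChunk:
    """Hypothetical stand-in holding a list of strings and non-string items."""
    data: list[Any]


def collect_items(chunks: list[DataChunk]) -> list[Any]:
    """Flatten chunks into their individual items."""
    result: list[Any] = []
    for chunk in chunks:
        for c in chunk.data:
            if isinstance(c, str):
                result.append(c.strip())
            else:
                # The buggy version called result.append(chunk) here,
                # storing the parent DataChunk instead of the item.
                result.append(c)
    return result


image = b"\x89PNG"  # stand-in for a non-string item such as an image
items = collect_items([DataChunk(data=[" caption ", image])])
```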
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
BUG-004: Score aggregation was broken. Documents with more chunks
ranked higher regardless of relevance because per-chunk scores were
summed instead of averaged.
Changes:
- Changed score calculation from sum() to mean()
- Added comprehensive tests for SearchResult.from_source_item()
- Added tests for elide_content helper
This ensures search results are ranked by actual relevance rather
than by the number of chunks in the document.
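A small worked example of the ranking difference, using made-up scores: a long document with five weak chunk matches against a short document with one strong match.

```python
from statistics import mean

# Illustrative per-chunk relevance scores (invented for this example).
long_doc_scores = [0.2, 0.2, 0.2, 0.2, 0.2]
short_doc_scores = [0.9]

# Summing rewards chunk count: the weakly matching long document wins.
long_wins_with_sum = sum(long_doc_scores) > sum(short_doc_scores)

# Averaging compares typical relevance: the strongly matching short document wins.
short_wins_with_mean = mean(short_doc_scores) > mean(long_doc_scores)
```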
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>