69 Commits

Author SHA1 Message Date
09215adf9a Add comprehensive tests for search improvements
- Add tests for extract_query_terms (stopword filtering, short words)
- Add tests for apply_query_term_boost (boost calculations, edge cases)
- Add tests for deduplicate_by_source (keeps highest per source)
- Add tests for apply_title_boost (title matching with mocked DB)
- Add tests for fuse_scores_rrf (RRF score fusion, ranking behavior)
- Add tests for rerank module (VoyageAI reranker mocking)

Uses pytest.mark.parametrize for concise, data-driven tests.
77 tests total covering all new search functionality.
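
For readers unfamiliar with the fusion step these tests cover, here is a minimal sketch of reciprocal rank fusion (RRF). It is not the repository's `fuse_scores_rrf`: the function name `rrf_fuse_sketch`, its signature, and the constant `k=60` are assumptions made for illustration only.

```python
from collections import defaultdict


def rrf_fuse_sketch(rankings, k=60):
    """Fuse several best-first ranked lists of ids into one score per id.

    Hypothetical helper: each id receives 1 / (k + rank) from every list it
    appears in, so ids ranked well by several retrievers rise to the top.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return dict(fused)


# Two made-up result lists, e.g. one from embeddings and one from keywords.
embedding_hits = ["a", "b", "c"]
keyword_hits = ["b", "d", "a"]

scores = rrf_fuse_sketch([embedding_hits, keyword_hits])
ranked = sorted(scores, key=scores.get, reverse=True)
```

Because only ranks enter the formula, the two retrievers do not need comparable score scales.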

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 22:26:16 +00:00
d644281b26 Fix 5 security and quality bugs
BUG-030: Add rate limiting via slowapi middleware
- Added slowapi to requirements
- Configurable limits: 100/min default, 30/min search, 10/min auth
- Rate limit settings in settings.py

BUG-028: Fix filter validation in embeddings.py
- Unknown filter keys now logged and ignored instead of passed through
- Prevents potential filter injection
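
The behavior described for BUG-028 can be sketched as an allow-list check. This is an illustration only: the key names in `ALLOWED_FILTER_KEYS` and the helper `sanitize_filters` are hypothetical and not taken from embeddings.py.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical allow-list; the real set of supported filter keys may differ.
ALLOWED_FILTER_KEYS = frozenset({"tags", "source_ids", "min_size", "max_size"})


def sanitize_filters(filters):
    """Return only the known filter keys, logging every key that is dropped."""
    clean = {}
    for key, value in filters.items():
        if key in ALLOWED_FILTER_KEYS:
            clean[key] = value
        else:
            # Unknown keys are ignored rather than forwarded to the backend.
            logger.warning("Ignoring unknown filter key: %r", key)
    return clean


safe = sanitize_filters({"tags": ["work"], "must_not": {"drop": "everything"}})
```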

BUG-034: Fix timezone handling in oauth_provider.py
- Now uses timezone-aware UTC comparison for refresh tokens
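
A minimal sketch of a timezone-aware expiry check, assuming the stored expiry may be a naive datetime that is meant as UTC. The helper name `is_expired` and that assumption are illustrative; the actual code in oauth_provider.py may differ. In Python, comparing a naive datetime with an aware one raises `TypeError`, so the sketch normalizes both sides first.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_expired(expires_at: datetime, now: Optional[datetime] = None) -> bool:
    """Hypothetical helper: compare an expiry timestamp against the current UTC time."""
    if expires_at.tzinfo is None:
        # Assumption: naive timestamps were stored as UTC.
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires_at


fixed_now = datetime(2025, 12, 19, 12, 0, tzinfo=timezone.utc)
naive_past = datetime(2025, 12, 19, 11, 0)        # naive, treated as UTC
aware_future = fixed_now + timedelta(hours=1)     # already timezone-aware
```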

BUG-050: Fix SQL injection in test database handling
- Added validate_db_identifier() function
- Validates database names contain only safe characters
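
One way such a check might look, as a sketch only: the function name matches the commit message, but the signature, the exact character set, and the choice to raise `ValueError` are assumptions. Identifiers usually cannot be passed as bound query parameters, which is why validating them before string interpolation matters.

```python
import re

# Assumed rule: a letter or underscore, followed by letters, digits, or underscores.
_SAFE_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_db_identifier(name: str) -> str:
    """Hypothetical re-implementation: return name unchanged if safe, else raise."""
    if not _SAFE_IDENTIFIER.fullmatch(name):
        raise ValueError(f"Unsafe database identifier: {name!r}")
    return name


db_name = validate_db_identifier("test_db_1")
```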

Also:
- Updated tests for bcrypt password format
- Updated test for filter validation behavior
- Updated INVESTIGATION.md with fix status

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 21:41:16 +00:00
Daniel O'Connell
28bc10df92 Fix break_chunk() appending wrong object (BUG-007)
The function was appending the entire DataChunk object instead of
the individual item when processing non-string data (e.g., images).

Bug: `result.append(chunk)` should have been `result.append(c)`

This caused:
- Type mismatches (returning DataChunk instead of MulitmodalChunk)
- Potential circular references
- Embedding failures for mixed content

Fixed by appending the individual item `c` instead of the parent `chunk`.
Updated existing test and added new test to verify behavior.
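
To make the one-line fix concrete, here is a simplified sketch. The `DataChunk` dataclass, the `chunk_size` parameter, and the string-splitting rule are hypothetical stand-ins; only the `result.append(c)` line mirrors the fix described above.

```python
from dataclasses import dataclass, field


@dataclass
class DataChunk:
    """Hypothetical stand-in: a container of strings and non-string items."""
    data: list = field(default_factory=list)


def break_chunk(chunk: DataChunk, chunk_size: int = 5) -> list:
    """Split long strings into pieces; pass non-string items through unchanged."""
    result = []
    for c in chunk.data:
        if isinstance(c, str):
            result.extend(c[i:i + chunk_size] for i in range(0, len(c), chunk_size))
        else:
            # The fix: append the item itself. The buggy version appended
            # `chunk`, i.e. the whole parent container.
            result.append(c)
    return result


image = object()  # placeholder for a non-string item such as an image
pieces = break_chunk(DataChunk(data=["hello world", image]))
```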

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 18:27:13 +01:00
Daniel O'Connell
21dedbeb61 Fix search score aggregation to use mean instead of sum
BUG-004: Score aggregation was broken - documents with more chunks
would always rank higher regardless of relevance because scores were
summed instead of averaged.

Changes:
- Changed score calculation from sum() to mean()
- Added comprehensive tests for SearchResult.from_source_item()
- Added tests for elide_content helper

This ensures search results are ranked by actual relevance rather
than by the number of chunks in the document.
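
A small illustration of why the aggregation matters, using made-up scores. The document names and the `rank` helper are hypothetical and not part of `SearchResult.from_source_item()`.

```python
from statistics import mean

# Made-up per-chunk scores: one long, weakly matching document
# and one short, strongly matching document.
chunk_scores = {
    "long_doc": [0.3, 0.3, 0.3, 0.3, 0.3],
    "short_doc": [0.9],
}


def rank(scores, aggregate):
    """Order document ids by the aggregated chunk score, best first."""
    return sorted(scores, key=lambda doc: aggregate(scores[doc]), reverse=True)


by_sum = rank(chunk_scores, sum)    # chunk count dominates
by_mean = rank(chunk_scores, mean)  # relevance per chunk dominates
```

With `sum`, five mediocre chunks outrank a single strong one; with `mean`, the strongly matching document comes first.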

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 18:25:15 +01:00
Daniel O'Connell
93b77a16d6 Add pytest markers for fast/slow test separation
- Add --run-slow flag to optionally include slow tests
- Auto-detect tests that use db_session, test_db, db_engine, or qdrant fixtures
- Skip slow tests by default for faster development iteration
- Usage: pytest (fast only) or pytest --run-slow (all tests)
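
A sketch of how such a flag could be wired up in a `conftest.py`, using standard pytest hooks. The fixture names come from the list above; treating `qdrant` as an exact fixture name, the `slow` marker name, and the helper `uses_slow_fixture` are assumptions, so the repository's actual conftest may look different.

```python
# conftest.py (sketch)
SLOW_FIXTURES = frozenset({"db_session", "test_db", "db_engine", "qdrant"})


def uses_slow_fixture(fixturenames) -> bool:
    """Hypothetical helper: True if a test requests any fixture deemed slow."""
    return not SLOW_FIXTURES.isdisjoint(fixturenames)


def pytest_addoption(parser):
    parser.addoption(
        "--run-slow", action="store_true", default=False,
        help="also run tests that need a database or Qdrant",
    )


def pytest_configure(config):
    # Register the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line("markers", "slow: test uses a slow fixture")


def pytest_collection_modifyitems(config, items):
    # Imported here so the helper above can be reused without pytest installed.
    import pytest

    run_slow = config.getoption("--run-slow")
    skip_slow = pytest.mark.skip(reason="slow test: pass --run-slow to include it")
    for item in items:
        if uses_slow_fixture(getattr(item, "fixturenames", ())):
            item.add_marker(pytest.mark.slow)
            if not run_slow:
                item.add_marker(skip_slow)
```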

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 18:21:41 +01:00
56ed7b7d8f fix scheduler 2025-11-04 12:46:38 +00:00
Daniel O'Connell
ad6510bd17 add a bunch of tests 2025-11-03 23:23:41 +01:00
a5bc53326d backups 2025-11-02 00:01:35 +00:00
Daniel O'Connell
814090dccb use db bots 2025-11-01 18:52:37 +01:00
Daniel O'Connell
9639fa3dd7 use usage tracker 2025-11-01 18:49:06 +01:00
Daniel O'Connell
8af07f0dac add slash commands for discord 2025-11-01 18:04:38 +01:00
Daniel O'Connell
07852f9ee7 Base usage tracker 2025-11-01 16:22:40 +01:00
Daniel O'Connell
1a3cf9c931 add tests 2025-10-20 21:10:39 +02:00
Daniel O'Connell
1606348d8b discord integration 2025-10-20 03:47:13 +02:00
Daniel O'Connell
99d3843f47 move to general LLM providers 2025-10-13 03:23:20 +02:00
Daniel O'Connell
f454aa9afa change schedule call signature 2025-10-12 10:17:22 +02:00
Daniel O'Connell
a3544222e7 add scheduled calls 2025-08-12 23:37:54 +00:00
Daniel O'Connell
b68e15d3ab add blogs 2025-08-09 02:07:49 +02:00
Daniel O'Connell
beb94375da fix tests 2025-07-24 23:34:10 +02:00
Daniel O'Connell
50601ad930 proper notes path 2025-07-06 13:53:29 +02:00
Daniel O'Connell
288c2995e5 synch notes 2025-07-05 23:58:47 +02:00
Daniel O'Connell
8eb6374cac second pass in search 2025-06-28 20:59:15 +02:00
Daniel O'Connell
01ccea2733 add missing tests 2025-06-28 02:30:54 +02:00
Daniel O'Connell
a3daea883b fix tests 2025-06-26 14:12:42 +02:00
Daniel O'Connell
0e574542d5 fix tests 2025-06-10 15:32:34 +02:00
Daniel O'Connell
3e4e5872d1 search filters 2025-06-10 12:16:54 +02:00
Daniel O'Connell
780e27ba04 better email embedding + format search results 2025-06-09 13:51:58 +02:00
Daniel O'Connell
4d057d1ec6 discord notification on error 2025-06-05 02:21:52 +02:00
Daniel O'Connell
e5da3714de multiple dimensions for confidence values 2025-06-03 12:18:20 +02:00
Daniel O'Connell
a40e0b50fa editable notes 2025-06-02 22:24:19 +02:00
Daniel O'Connell
ac3b48a04c notes and observations triggered as jobs 2025-06-02 14:34:39 +02:00
Daniel O'Connell
29b8ce6860 Fix search + proper integration tests 2025-06-02 02:53:32 +02:00
Daniel O'Connell
1dd93929c1 Add embedding for observations 2025-05-31 16:51:55 +02:00
Daniel O'Connell
004bd39987 Add observations model 2025-05-31 16:15:30 +02:00
Daniel O'Connell
e505f9b53c summarize before chunking 2025-05-29 01:26:10 +02:00
Daniel O'Connell
ed8033bdd3 Add LessWrong tasks + reindexer 2025-05-28 03:14:27 +02:00
Daniel O'Connell
ab87bced81 fix linting 2025-05-27 23:19:28 +02:00
Daniel O'Connell
1291ca9d08 better handling of errors 2025-05-27 22:39:24 +02:00
Daniel O'Connell
f5c3e458d7 move parsers 2025-05-27 21:53:31 +02:00
Daniel O'Connell
0f15e4e410 Check all feeds work 2025-05-27 01:42:22 +02:00
Daniel O'Connell
876fa87725 Add archives fetcher 2025-05-27 01:24:57 +02:00
Daniel O'Connell
27fbfcc548 add rss fetcher 2025-05-26 17:28:01 +02:00
Daniel O'Connell
482aefabe3 tests for models 2025-05-26 12:53:56 +02:00
Daniel O'Connell
a5618f3543 simplify embedding 2025-05-26 02:02:50 +02:00
Daniel O'Connell
9f1632555b tests for content processing 2025-05-25 20:38:02 +02:00
Daniel O'Connell
4aaa45e09c unify tasks 2025-05-25 20:02:47 +02:00
Daniel O'Connell
e8070a3557 proper chunk sizes for books 2025-05-25 11:23:19 +02:00
Daniel O'Connell
eb69221999 Add blog parser 2025-05-25 00:33:27 +02:00
Daniel O'Connell
02d606deab add ebook job 2025-05-24 20:21:41 +02:00
Daniel O'Connell
b292baf59d add ebook parser 2025-05-21 00:49:27 +02:00