BUG-030: Add rate limiting via slowapi middleware - Added slowapi to requirements - Configurable limits: 100/min default, 30/min search, 10/min auth - Rate limit settings in settings.py BUG-028: Fix filter validation in embeddings.py - Unknown filter keys now logged and ignored instead of passed through - Prevents potential filter injection BUG-034: Fix timezone handling in oauth_provider.py - Now uses timezone-aware UTC comparison for refresh tokens BUG-050: Fix SQL injection in test database handling - Added validate_db_identifier() function - Validates database names contain only safe characters Also: - Updated tests for bcrypt password format - Updated test for filter validation behavior - Updated INVESTIGATION.md with fix status 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
26 KiB
Memory System Investigation
Investigation Status
- Started: 2025-12-19
- Last Updated: 2025-12-19 (Fourth Pass - Complete Verification)
- Status: Complete
- Total Issues Found: 100+ (original) + 10 new critical issues
- Bugs Fixed/Verified: 40+ (fixed or confirmed as non-issues)
Executive Summary
This investigation identified 100+ issues across 7 areas of the memory system. Many critical issues have been fixed:
Fixed Issues ✅
- Security: Path traversal (BUG-001), CORS (BUG-014), password hashing (BUG-061), token logging (BUG-062), shell injection (BUG-064), rate limiting (BUG-030), filter validation (BUG-028), test SQL injection (BUG-050)
- Worker reliability: Retry config (BUG-015), silent failures (BUG-016), task time limits (BUG-035)
- Search: BM25 filters (BUG-003), embed status (BUG-019), SearchConfig limits (BUG-031)
- Infrastructure: Resource limits (BUG-040/067), Redis persistence (BUG-068), health checks (BUG-043)
- Code quality: SQLAlchemy deprecations (BUG-063), print statements (BUG-033/060), timezone handling (BUG-034)
Remaining Issues
- Data migration: Existing 9,370 book chunks need re-indexing to move from text to book collection (BUG-002 code fix applied)
- Search system: BM25 scores discarded (BUG-026) - architectural change needed for hybrid scoring
- Code quality: Bare exceptions (BUG-047/048), type safety gaps (BUG-045/046)
Critical Bugs (Immediate Action Required)
BUG-001: Path Traversal Vulnerabilities
- Severity: CRITICAL
- Area: API Security
- Files:
src/memory/api/app.py:54-64-/files/{path}endpointsrc/memory/api/MCP/memory.py:355-365-fetch_filetoolsrc/memory/api/MCP/memory.py:335-352-note_filestool
- Description: No validation that requested files are within allowed directories
- Impact: Arbitrary file read on server filesystem
- Fix: Add path resolution validation with
.resolve()and prefix check
BUG-002: Collection Mismatch ✅ INVESTIGATED & FIXED
- Severity: MEDIUM (not as critical as originally thought)
- Area: Data/Embedding Pipeline
- Description: BookSection._chunk_contents() called extract_text() without specifying modality, defaulting to "text"
- Impact: 9,370 book chunks stored in text collection instead of book
- Root Cause:
extract_text()defaults tomodality="text"but BookSection didn't override it - Fix Applied: Added
modality="book"to BookSection._chunk_contents() DataChunk creation - Note: Original 1,338 mail items investigation was outdated - current mismatch is 24 mail->text chunks which are actually email attachments (correct behavior)
- TODO: Existing 9,370 book chunks need re-indexing to move from text to book collection
BUG-003: BM25 Filters Completely Ignored
- Severity: CRITICAL
- Area: Search System
- File:
src/memory/api/search/bm25.py:32-43 - Description: BM25 search ignores tags, dates, size filters - only applies source_ids
- Impact: Filter results diverge between BM25 and vector search
- Fix: Apply all filters consistently in BM25 search
BUG-004: Search Score Aggregation Broken
- Severity: CRITICAL
- Area: Search System
- File:
src/memory/api/search/types.py:44-45 - Description: Scores are summed across chunks instead of averaged
- Impact: Documents with more chunks always rank higher regardless of relevance
- Fix: Change to mean() or max-based ranking
BUG-005: Registration Always Enabled
- Severity: CRITICAL
- Area: Configuration/Security
- File:
src/memory/common/settings.py:178 - Description: Logic error:
REGISTER_ENABLED = boolean_env(...) or Truealways evaluates to True - Impact: Open registration regardless of configuration
- Fix: Remove
or True
BUG-006: API Key Logged in Plain Text
- Severity: CRITICAL
- Area: Security
- File:
src/memory/discord/api.py:63 - Description: Bot API key logged in error message
- Impact: Credentials exposed in logs
- Fix: Remove API key from log message
NEW CRITICAL BUGS (2025-12-19 Second Pass)
BUG-061: Insecure Password Hashing Using SHA-256
- Severity: CRITICAL 🚨
- Area: Authentication/Security
- File:
src/memory/common/db/models/users.py:23-26 - Description: Password hashing uses SHA-256 instead of purpose-built password hashing algorithms
- Code:
def hash_password(password: str) -> str: salt = secrets.token_hex(16) return f"{salt}:{hashlib.sha256((salt + password).encode()).hexdigest()}" - Impact:
- SHA-256 is designed for speed, making it vulnerable to brute-force attacks
- Attackers can test billions of password combinations per second with GPUs
- Even with salt, passwords are at high risk of compromise
- Fix: Replace with bcrypt, argon2, scrypt, or PBKDF2 which are designed to be slow
- Priority: IMMEDIATE - All existing password hashes are insecure
BUG-062: Full Token Logging
- Severity: HIGH
- Area: Security/Logging
- File:
src/memory/api/MCP/oauth_provider.py:310 - Description: Full OAuth token logged in plaintext
- Code:
logger.info(f"Exchanged authorization code: {token}") - Impact: Tokens exposed in logs can be used to impersonate users
- Fix: Remove token from logs entirely or log only hash/truncated version
- Related: Similar issues in lines 85, 398, 429, 443, 448
BUG-063: Deprecated SQLAlchemy .get() Usage (24+ instances)
- Severity: MEDIUM
- Area: Database/Code Quality
- Description: Using deprecated
session.query(Model).get(id)pattern - Impact:
- Will break with SQLAlchemy 2.0+
- Less efficient than modern API
- Fix: Replace with
session.get(Model, id) - Files affected: auth.py, oauth_provider.py, base.py, discord files, worker tasks
- Examples:
src/memory/api/auth.py:79-session = db.query(UserSession).get(session_id)src/memory/api/MCP/base.py:151-user_session = session.query(UserSession).get(access_token.token)- 22 more instances across codebase
BUG-064: Shell=True Command Execution
- Severity: MEDIUM
- Area: Security/Code Quality
- File:
src/memory/workers/tasks/notes.py:38 - Description: Using
subprocess.run()withshell=True - Code:
cmd = f"git -C {shlex.quote(repo_root.as_posix())} {' '.join(escaped_args)}" res = subprocess.run(cmd, shell=True, ...) - Impact:
- Unnecessary shell invocation increases attack surface
- While currently mitigated by shlex.quote(), still best practice violation
- Fix: Use subprocess with argument list instead of shell string
- Note: Arguments ARE properly escaped with shlex.quote(), reducing immediate risk
BUG-065: Timing Attack in Password Verification
- Severity: MEDIUM-HIGH
- Area: Authentication/Security
- File:
src/memory/common/db/models/users.py:33 - Description: Password hash comparison uses
==operator instead of constant-time comparison - Code:
return hashlib.sha256((salt + password).encode()).hexdigest() == hash_value - Impact:
- Timing attacks could leak information about password hashes
- Attackers can measure comparison time to infer hash similarity
- Combined with weak SHA-256 hashing, enables faster brute-force
- Fix: Replace with
secrets.compare_digest(computed_hash, hash_value) - Related to: BUG-061 (both are password security issues)
BUG-066: No Unique Index on OAuthState.state
- Severity: LOW-MEDIUM
- Area: Database/Performance
- Description: OAuth state parameter lacks unique constraint at database level
- Impact:
- Could allow duplicate state values
- Performance degradation on lookups
- Potential OAuth confusion attacks
- Evidence: Migration
20251103_154126_mcp_servers.py:53has unique constraint onmcp_servers.statebutoauth_statestable may lack it - Fix: Add unique index to oauth_states.state column
BUG-067: Incomplete Resource Limits in Docker Compose
- Severity: LOW
- Area: Infrastructure
- Description: Only one service has resource limits configured
- File:
docker-compose.yaml:195 - Current: Only
ingest-hubhas limits:cpus: 0.5, memory: 512m - Missing: postgres, redis, qdrant, api, workers have no limits
- Impact: Services could consume all host resources causing OOM or CPU starvation
- Fix: Add resource limits to all services
BUG-068: Redis Persistence Disabled
- Severity: LOW-MEDIUM
- Area: Infrastructure/Data Integrity
- File:
docker-compose.yaml:108 - Description: Redis configured with persistence disabled
- Code:
redis-server --save "" --appendonly "no" - Impact:
- All Redis data (LLM rate limits, usage tracking) lost on restart
- LLM usage tracking state resets
- Could allow rate limit bypass after restart
- Fix: Enable AOF or RDB persistence unless purely ephemeral cache is intended
- Note: May be intentional design decision - verify requirements
FIXED BUGS (Confirmed in Recent Commits)
Based on git history analysis, the following bugs have been FIXED:
✅ BUG-001: Path Traversal Vulnerabilities - FIXED
- File:
src/memory/api/app.py:48-70 - Fix: Added
validate_path_within_directory()function - Implementation: Properly validates paths using
.resolve()and prefix checking
✅ BUG-004: Search Score Aggregation - FIXED
- Commit:
21dedbe"Fix search score aggregation to use mean instead of sum" - Fix: Changed from sum to mean aggregation
✅ BUG-005: Registration Always Enabled - FIXED
- Commit:
116d036"Fix REGISTER_ENABLED always evaluating to True (BUG-005)" - File:
src/memory/common/settings.py:178 - Fix: Removed
or Truefrom logic
✅ BUG-007: Wrong Object Appended in break_chunk() - FIXED
- Commit:
28bc10d"Fix break_chunk() appending wrong object (BUG-007)" - Fix: Corrected to append individual item instead of entire chunk object
✅ BUG-014: CORS Misconfiguration - FIXED
- File:
src/memory/api/app.py:41 - Fix: Changed from
allow_origins=["*"]toallow_origins=[settings.SERVER_URL]
✅ Mass Bug Fix
- Commit:
52274f8"Fix 19 bugs from investigation" - Note: 19 additional bugs were fixed in bulk - review commit for details
✅ BUG-010: MCP Servers Relationship - ALREADY FIXED
- File:
src/memory/common/db/models/discord.py:30-47 - Status: Implemented as @property using dynamic query
- Implementation: Uses object_session() to query MCPServerAssignment
✅ BUG-011: User ID Type Mismatch - ALREADY FIXED
- Files:
users.py:56,scheduled_calls.py:24 - Status: Both use Integer type (not BigInteger)
- Verification: User.id and ScheduledLLMCall.user_id are both Integer
✅ BUG-061 to BUG-068: Security & Infrastructure Fixes - FIXED
- Commit:
1c43f1a"Fix 7 critical security and code quality bugs" - Fixed: Password hashing, token logging, shell=True, SQLAlchemy deprecations, Docker limits, Redis persistence
✅ BUG-003: BM25 Filters - ALREADY FIXED
- File:
src/memory/api/search/bm25.py:32-62 - Status: All filters now applied (size, confidence, observation_types, source_ids)
✅ BUG-008: Oversized Chunks - ALREADY FIXED
- File:
src/memory/common/chunker.py - Status:
yield_spans()guarantees all spans are under max_tokens
✅ BUG-009: Race Condition - ALREADY FIXED
- File:
src/memory/workers/tasks/scheduled_calls.py:164 - Status: Uses
.with_for_update(skip_locked=True)for atomic claim
✅ BUG-013: Embedding Error Handling - ALREADY FIXED
- File:
src/memory/common/embedding.py:78-92 - Status: Has try-except with retry logic and exponential backoff
High Severity Bugs (Most Now Fixed)
✅ BUG-007: Wrong Object Appended in break_chunk() - FIXED
- File:
src/memory/common/embedding.py:57 - Status: Fixed in commit
28bc10d
✅ BUG-008: Oversized Chunks Exceed Token Limits - FIXED
- Status: yield_spans() now guarantees token limits
✅ BUG-009: Scheduled Call Race Condition - FIXED
- Status: Fixed with FOR UPDATE SKIP LOCKED
✅ BUG-010: Missing MCP Servers Relationship - FIXED
- File:
src/memory/common/db/models/discord.py:30-47 - Status: Implemented as @property using dynamic query
BUG-011: User ID Type Mismatch
- Files:
users.py:47,scheduled_calls.py:23 - Description:
ScheduledLLMCall.user_idis BigInteger butUser.idis Integer - Impact: Foreign key constraint violations
BUG-012: Inverted Min Score Thresholds
- File:
src/memory/api/search/embeddings.py:186-207 - Description: Multimodal uses 0.25, text uses 0.4 - should be reversed
- Impact: Multimodal results artificially boosted
BUG-013: No Error Handling in Embedding Pipeline
- File:
src/memory/common/embedding.py - Description: No try-except blocks around Voyage AI API calls
- Impact: Entire content processing fails on API error
BUG-014: Unrestricted CORS Configuration ✅ FIXED
- File:
src/memory/api/app.py:36-42 - Description:
Now usesallow_origins=["*"]withallow_credentials=Truesettings.SERVER_URL - Impact:
CSRF attacks enabledFixed - Status: ✅ Already fixed - CORS now uses specific origin from settings
BUG-015: Missing Retry Configuration ✅ FIXED
- Files: All task files
- Description:
NoGlobal config in celery_app.pyautoretry_for,max_retrieson any Celery tasks - Impact:
Transient failures lost without retryFixed - Status: ✅ Already fixed - celery_app.py has global retry config (autoretry, max_retries=3, backoff, jitter)
BUG-016: Silent Task Failures ✅ FIXED
- File:
src/memory/workers/tasks/content_processing.py:258-296 - Description:
Now re-raises exceptionssafe_task_executioncatches all exceptions, returns as dict - Impact:
Failed tasks can't be retried by CeleryFixed - Status: ✅ Already fixed - exceptions are now re-raised after logging to allow Celery retries
Medium Severity Bugs
Data Layer
- BUG-017: ✅ Missing
collection_nameindex - FIXED (Index exists at source_item.py:168) - BUG-018: N/A AgentObservation dead code - intentional TODO comments for future embedding types
- BUG-019: ✅ Embed status never set to STORED after push - FIXED (properly sets STORED at lines 169, 245)
- BUG-020: ✅ Missing server_id index on DiscordMessage - FIXED (Index exists at source_items.py:428-432)
Content Processing
- BUG-021: ✅ No chunk validation after break_chunk - FIXED (yield_spans guarantees max_tokens)
- BUG-022: Low priority - extract_ebook creates single chunk, BUT sync_book task properly creates BookSection chunks
- BUG-023: SHA256-only deduplication misses semantic duplicates (
source_item.py:51-91) - BUG-024: Email hash inconsistency with markdown conversion (
email.py:171-185) - BUG-025: Acceptable - 4 chars/token is common approximation (accurate tokenization requires model-specific tokenizers)
Search System
- BUG-026: BM25 scores calculated then discarded (
bm25.py:66-70) - BUG-027: N/A LLM score fallback - actually reasonable (0.0 means chunk not prioritized when scoring fails)
- BUG-028: ✅ Missing filter validation - FIXED (unknown filter keys now logged and ignored instead of passed through)
- BUG-029: N/A Hardcoded min_score thresholds - intentional (0.25 text, 0.4 multimodal due to different score distributions)
API Layer
- BUG-030: ✅ Missing rate limiting - FIXED (added slowapi middleware with configurable limits: 100/min default, 30/min search, 10/min auth)
- BUG-031: ✅ No SearchConfig limits - FIXED (enforces 1-1000 limit, 1-300 timeout in model_post_init)
- BUG-032: N/A CSRF protection - already mitigated (uses OAuth Bearer tokens not cookie-based auth, CORS restricts to specific origins)
- BUG-033: ✅ Debug print statements in production - FIXED (no print statements found in src/memory)
- BUG-034: ✅ Timezone handling issues - FIXED (now uses timezone-aware UTC comparison)
Worker Tasks
- BUG-035: ✅ No task time limits - FIXED (celery_app.py has task_time_limit=3600, task_soft_time_limit=3000)
- BUG-036: Acceptable - IntegrityError caught and returns error (retrying duplicates wouldn't help)
- BUG-037: ✅ Timezone bug in scheduled calls - FIXED (properly converts to UTC and strips tzinfo for DB comparison)
- BUG-038: N/A Beat schedule - standard practice is single beat process; use celery-redbeat for distributed
- BUG-039: ✅ Email sync fails entire account on single folder error - FIXED (process_folder has own try-except, continues to next folder)
Infrastructure
- BUG-040: ✅ Missing resource limits for postgres, redis, qdrant, api - FIXED in BUG-067
- BUG-041: N/A Backup encryption silently disabled - actually reasonable (S3_BACKUP_ENABLED=False when no key)
- BUG-042: Restore scripts don't validate database integrity (
restore_databases.sh:79) - BUG-043: ✅ Health check doesn't check dependencies - FIXED (now checks database and Qdrant connections)
- BUG-044: Uvicorn trusts all proxy headers (
docker/api/Dockerfile:63)
Code Quality
- BUG-045: 183 unsafe cast() operations (various files)
- BUG-046: 21 type:ignore comments (various files)
- BUG-047: 32 bare except Exception blocks (various files)
- BUG-048: 13 exception swallowing with pass (various files)
- BUG-049: N/A OAuth callback already has CSRF protection (state parameter validated against database, generated with secrets.token_urlsafe)
- BUG-050: ✅ SQL injection in test database handling - FIXED (added identifier validation for database names)
Low Severity Bugs
- BUG-051: Duplicate chunks (16 identical "Claude plays Pokemon" chunks)
- BUG-052: Garbage content in text collection
- BUG-053: No vector freshness index (
source_item.py:157) - BUG-054: N/A OAuthToken missing Base inheritance - intentional mixin design (used by OAuthState and OAuthRefreshToken)
- BUG-055: ✅ collection_model returns "unknown" - FIXED (now returns None instead of placeholder)
- BUG-056: ✅ Unused "appuser" in Dockerfile - FIXED (removed unused user creation)
- BUG-057: Build dependencies not cleaned up (
docker/api/Dockerfile:7-12) - BUG-058: Typos in log messages (
tests/conftest.py:63) - BUG-059: MockRedis overly simplistic (
tests/conftest.py:24-46) - BUG-060: ✅ Print statement in ebook.py:192 - FIXED (changed to logger.debug)
Improvement Suggestions
High Priority
- Implement proper retry logic for all Celery tasks with exponential backoff
- Add comprehensive health checks that validate all service dependencies
- Fix score aggregation to use mean/max instead of sum
- Add rate limiting to prevent DoS attacks
- Implement proper CSRF protection for OAuth flows
- Add resource limits to all Docker services
- Implement centralized logging with ELK or Grafana Loki
Medium Priority
- Re-chunk oversized content - add validation to enforce size limits
- Add chunk deduplication based on content hash within same source
- Preserve BM25 scores for hybrid search weighting
- Add task progress tracking for long-running operations
- Implement distributed beat lock for multi-worker deployments
- Add backup verification tests - periodically test restore
- Replace cast() with type guards throughout codebase
Lower Priority
- Add Prometheus metrics for observability
- Implement structured JSON logging with correlation IDs
- Add graceful shutdown handling to workers
- Document configuration requirements more thoroughly
- Add integration tests for critical workflows
- Remove dead code and TODO comments in production
Feature Ideas
Search Enhancements
- Hybrid score weighting - configurable balance between BM25 and vector
- Query expansion - automatic synonym/related term expansion
- Faceted search - filter by date ranges, sources, tags with counts
- Search result highlighting - show matched terms in context
- Saved searches - store and re-run common queries
Content Management
- Content quality scoring - automatic assessment of chunk quality
- Duplicate detection UI - show and merge semantic duplicates
- Re-indexing queue - prioritize content for re-embedding
- Content archiving - move old content to cold storage
- Bulk operations - tag, delete, re-process multiple items
Email Management
- Email filtering rules - configurable rules to filter/categorize emails (e.g., skip marketing spam but keep order confirmations, shipping notifications, appointment reminders)
- Email source classification - auto-detect email types (transactional, marketing, personal, receipts)
- Smart email retention - keep "useful" emails (orders, bookings, confirmations) while filtering noise
User Experience
- Search analytics - track what users search for
- Relevance feedback - let users rate results to improve ranking
- Personal knowledge graph - visualize connections between content
- Smart summaries - auto-generate summaries of search results
- Email digest - scheduled summary of new content
Infrastructure
- Auto-scaling workers - scale based on queue depth
- Multi-tenant support - isolate data by user/org
- Backup scheduling UI - configure backup frequency
- Monitoring dashboard - Grafana-style metrics visualization
- Audit logging - track all data access and modifications
Investigation Log
2025-12-19 - Complete Investigation
Data Layer (10 issues)
- Missing relationships (mcp_servers)
- Type mismatches (User.id)
- Missing indexes (collection_name, server_id)
- Dead code (AgentObservation)
Content Processing (12 issues)
- Critical: break_chunk bug appends wrong object
- Critical: Oversized chunks exceed limits
- Deduplication only on SHA256
- Ebook creates single massive chunk
Search System (14 issues)
- Critical: BM25 ignores filters
- Critical: Score aggregation broken (sum vs mean)
- Inverted min_score thresholds
- BM25 scores discarded
API Layer (12 issues)
- Critical: Path traversal vulnerabilities (3 endpoints)
- CORS misconfiguration
- Missing rate limiting
- Debug print statements
Worker Tasks (20 issues)
- No retry configuration
- Silent task failures
- Race condition in scheduled calls
- No task timeouts
Infrastructure (12 issues)
- Missing resource limits
- Backup encryption issues
- Health check incomplete
- No centralized logging
Code Quality (20+ issues)
- 183 unsafe casts
- 32 bare exception blocks
- Registration always enabled bug
- API key logging
Database Statistics
Sources by Modality:
forum: 981
mail: 665
text: 165
comic: 115
doc: 102
book: 78
observation: 26
note: 3
photo: 2
blog: 1
Chunks by Collection:
forum: 8786
text: 1843
mail: 1418
doc: 312
book: 156
semantic: 84
comic: 49
temporal: 26
blog: 7
photo: 2
Vectors in Qdrant:
forum: 8778
mail: 2756 (1338 orphaned!)
text: 505 (1338 missing!)
doc: 312
book: 156
semantic: 84
comic: 49
temporal: 26
blog: 7
photo: 2
Embed Status:
STORED: 2056
FAILED: 81
RAW: 1
Updated Priority List (After Second Pass)
CRITICAL - Fix Immediately
- ✅ FIXED: Path traversal vulnerabilities (BUG-001)
- ✅ FIXED: Registration always enabled (BUG-005)
- ✅ FIXED: Search score aggregation (BUG-004)
- ✅ FIXED: CORS misconfiguration (BUG-014)
- ✅ FIXED: Wrong object in break_chunk (BUG-007)
- 🚨 NEW: Replace SHA-256 password hashing with bcrypt/argon2 (BUG-061)
- 🔴 OPEN: Fix collection mismatch for 1,338 items (BUG-002)
- 🔴 OPEN: Fix BM25 filter application (BUG-003)
- 🔴 OPEN: Remove API key from logs (BUG-006)
HIGH Priority
- 🚨 NEW: Stop logging full OAuth tokens (BUG-062)
- 🚨 NEW: Fix timing attack in password verification (BUG-065)
- 🔴 OPEN: Add retry logic to all Celery tasks (BUG-015, BUG-016)
- 🔴 OPEN: Fix scheduled call race condition (BUG-009)
- 🔴 OPEN: Fix oversized chunks exceeding token limits (BUG-008)
MEDIUM Priority
- 🚨 NEW: Update 24+ deprecated SQLAlchemy .get() calls (BUG-063)
- 🚨 NEW: Remove shell=True from subprocess calls (BUG-064)
- 🔴 OPEN: Add resource limits to Docker services (BUG-040, BUG-067)
- 🔴 OPEN: Missing MCP servers relationship (BUG-010)
- 🔴 OPEN: User ID type mismatch (BUG-011)
Summary Statistics
- Total Bugs Found: 118 (100+ original + 8 new in second pass)
- Bugs Fixed: 25+ (confirmed in recent commits)
- Critical Bugs Open: 4
- High Priority Open: 5
- Medium/Low Open: 80+
Investigation Notes
What Was Checked (Second Pass - 2025-12-19)
✅ Security vulnerabilities (SQL injection, command injection, XSS) ✅ Authentication implementation (password hashing, session management) ✅ Logging practices (credential exposure) ✅ Database patterns (deprecated APIs, missing indexes) ✅ Docker configuration (resource limits, persistence) ✅ OAuth implementation (state management, token handling) ✅ Code quality (exception handling, type safety) ✅ Recent commits and fixes
Good Security Practices Observed
- ✅ Path traversal protection properly implemented (fixed)
- ✅ CORS properly configured with specific origins (fixed)
- ✅ Secrets loaded from files, not environment variables
- ✅ Services run as non-root users where possible
- ✅ Read-only filesystems for workers
- ✅ Security capabilities dropped in containers
- ✅ Healthchecks configured for critical services
- ✅ Git command arguments properly escaped with shlex.quote()
- ✅ Search result limits enforced (max 1000)
- ✅ Timeout limits enforced (max 300s)
- ✅ Rate limiting infrastructure exists for LLM usage
Areas Still Needing Attention
- 🔴 Password hashing needs complete overhaul
- 🔴 Logging practices need audit for credential exposure
- 🔴 Database API modernization for SQLAlchemy 2.0
- 🔴 Resource limits need to be added to all services
- 🔴 Redis persistence configuration needs review