This document traces the complete embedding pipeline from document upload to vector database storage, identifying ALL potential failure points in both Redis/RQ and non-Redis scenarios.
Location: src/lib/apis/files/index.ts → uploadFile()
Endpoint: POST /files/
Flow:
- User selects file in frontend
- Frontend calls
uploadFile()with file and token - File sent as FormData to
/files/endpoint
Potential Failure Points:
- ❌ F1.1: Network failure during upload
- ❌ F1.2: File size exceeds server limits (not validated in frontend)
- ❌ F1.3: Invalid/expired authentication token
- ❌ F1.4: CORS errors preventing upload
- ❌ F1.5: Browser timeout for large files
- ❌ F1.6: File is empty (0 bytes) - frontend checks but race condition possible
Location: backend/open_webui/routers/files.py → upload_file()
Flow:
- FastAPI receives file via
UploadFile - Generate UUID for file ID
- Sanitize filename
- Save file to storage via
Storage.upload_file() - Insert file record into database via
Files.insert_new_file() - Trigger background processing via
process_file()
Potential Failure Points:
- ❌ F2.1: File upload fails - disk full, permissions error, storage unavailable
- ❌ F2.2: UUID generation fails (extremely unlikely but possible)
- ❌ F2.3: Filename sanitization fails or creates invalid path
- ❌ F2.4:
Storage.upload_file()fails - storage backend unavailable, quota exceeded - ❌ F2.5: Database insert fails - connection error, constraint violation, transaction rollback
- ❌ F2.6: File record created but file not saved to disk (partial failure)
- ❌ F2.7: File saved but database insert fails (orphaned file)
- ❌ F2.8:
process_file()call fails before background task is added - ❌ F2.9: Background task addition fails silently
- ❌ F2.10: Exception during upload handler execution - file may be partially saved
Location: backend/open_webui/routers/retrieval.py → process_file()
Flow:
- Validate file_id format (UUID)
- Fetch file from database
- Check Redis lock (if Redis available) or database status
- Update file status to "pending"
- Enqueue job OR add BackgroundTask
- Release lock
Potential Failure Points:
- ❌ F3.1: Invalid file_id format (not UUID) - validation should catch but edge cases exist
- ❌ F3.2: File not found in database (deleted between upload and processing)
- ❌ F3.3: Redis lock acquisition fails - Redis unavailable, connection timeout
- ❌ F3.4: Database status check fails - connection error, query timeout
- ❌ F3.5: Race condition: Multiple requests process same file simultaneously
- ❌ F3.6: Status update fails - database write error, constraint violation
- ❌ F3.7: Lock acquired but status update fails - lock held indefinitely
- ❌ F3.8: Job enqueue fails - Redis unavailable, queue full, serialization error
- ❌ F3.9: BackgroundTask addition fails - task queue full, memory error
- ❌ F3.10: Lock not released after failure - file stuck in "pending" state
- ❌ F3.11: API key retrieval fails - config not loaded, user not found
- ❌ F3.12: No API key configured for user - processing will fail later but not caught here
Location: backend/open_webui/routers/retrieval.py → _process_file_sync()
Flow:
- Get user object from user_id
- Update status to "processing"
- Fetch file from database
- Extract content (if not provided)
- Save content to file.data
- Calculate hash
- Generate embeddings and save to vector DB
- Update status to "completed"
Potential Failure Points:
- ❌ F4.1: User retrieval fails - user deleted, database error
- ❌ F4.2: Status update to "processing" fails - database error
- ❌ F4.3: File not found - deleted during processing
- ❌ F4.4: File path is None or invalid
- ❌ F4.5: Storage.get_file() fails - file not found on disk, path resolution error
- ❌ F4.6: Content extraction fails - Loader initialization error, file corrupted, unsupported format
- ❌ F4.7: PDF parsing fails - PyPDFLoader error, encrypted PDF, corrupted file
- ❌ F4.8: Tika server unavailable (if using Tika engine)
- ❌ F4.9: Document Intelligence API fails (if using Azure)
- ❌ F4.10: Content extraction returns empty documents
- ❌ F4.11: Content extraction returns None (loader error)
- ❌ F4.12: File.data update fails - database error
- ❌ F4.13: Hash calculation fails - encoding error, empty content
- ❌ F4.14: Hash update fails - database error
- ❌ F4.15: Embedding API key not available in background task context
- ❌ F4.16: Embedding function initialization fails - config error, model not found
- ❌ F4.17: Embedding generation fails - API error, rate limit, timeout, invalid key
- ❌ F4.18: Vector DB connection fails - database unavailable, connection pool exhausted
- ❌ F4.19: Vector DB insert fails - constraint violation, disk full, transaction error
- ❌ F4.20: Status update to "completed" fails - database error
- ❌ F4.21: Exception during processing - status may not be updated to "error"
- ❌ F4.22: Background task killed by system (OOM, timeout)
Location: backend/open_webui/retrieval/loaders/main.py → Loader.load()
Flow:
- Check file exists
- Determine loader type based on engine and file type
- Initialize appropriate loader (PyPDFLoader, TikaLoader, etc.)
- Call loader.load()
- Process documents with ftfy
Potential Failure Points:
- ❌ F5.1: File does not exist at path - file deleted, path incorrect
- ❌ F5.2: File exists but not readable - permissions error
- ❌ F5.3: File size is 0 bytes
- ❌ F5.4: Loader selection fails - unknown file type, engine misconfiguration
- ❌ F5.5: PyPDFLoader fails - corrupted PDF, encrypted PDF, unsupported PDF version
- ❌ F5.6: TikaLoader fails - Tika server unavailable, network error, timeout
- ❌ F5.7: Document Intelligence fails - API error, invalid credentials, quota exceeded
- ❌ F5.8: TextLoader fails - encoding error, file locked
- ❌ F5.9: Loader returns empty list - no content extracted
- ❌ F5.10: Loader returns None - exception in loader
- ❌ F5.11: ftfy processing fails - encoding error
- ❌ F5.12: Document metadata missing or invalid
Location: backend/open_webui/routers/retrieval.py → save_docs_to_vector_db() or save_docs_to_multiple_collections()
Flow:
- Get user chunk settings (chunk_size, chunk_overlap)
- Initialize text splitter (RecursiveCharacterTextSplitter or TokenTextSplitter)
- Split documents
- Validate split documents not empty
Potential Failure Points:
- ❌ F6.1: User chunk settings retrieval fails - user not found, config error
- ❌ F6.2: Chunk size is 0 or negative - invalid configuration
- ❌ F6.3: Text splitter initialization fails - invalid splitter type, encoding error
- ❌ F6.4: TokenTextSplitter fails - tiktoken encoding not found, encoding error
- ❌ F6.5: Text splitting returns empty list - all content filtered out
- ❌ F6.6: Text splitting fails with exception - memory error, encoding error
- ❌ F6.7: Split documents validation fails - empty content after splitting
Location: backend/open_webui/routers/retrieval.py → save_docs_to_vector_db() or save_docs_to_multiple_collections()
Flow:
- Get user API key from config
- Get base URL from config
- Initialize embedding function
- Generate embeddings for all text chunks
- Handle fallback if single batch fails
Potential Failure Points:
- ❌ F7.1: User API key not found - admin not configured key, user not in group
- ❌ F7.2: API key is None or empty string
- ❌ F7.3: Base URL not configured or invalid
- ❌ F7.4: Embedding function initialization fails - model not found, config error
- ❌ F7.5: Single batch embedding fails - API error, rate limit, timeout
- ❌ F7.6: Fallback embedding fails - same issues as F7.5
- ❌ F7.7: Embedding API returns invalid response - wrong format, empty embeddings
- ❌ F7.8: Embedding API returns error - authentication failed, quota exceeded
- ❌ F7.9: Network timeout during embedding API call
- ❌ F7.10: Embedding dimension mismatch - model changed, config mismatch
- ❌ F7.11: Embedding returns None or empty list
- ❌ F7.12: Embedding count doesn't match text count - partial failure
- ❌ F7.13: Embedding contains NaN or invalid values
Location: backend/open_webui/routers/retrieval.py → save_docs_to_vector_db() or save_docs_to_multiple_collections()
Flow:
- Check if collection exists
- Delete collection if overwrite=True
- Prepare items (id, text, vector, metadata)
- Insert items into vector DB
- Return success
Potential Failure Points:
- ❌ F8.1: Vector DB connection fails - database unavailable, connection pool exhausted
- ❌ F8.2: Collection check fails - query timeout, connection error
- ❌ F8.3: Collection deletion fails - transaction error, constraint violation
- ❌ F8.4: Item preparation fails - UUID generation error, metadata serialization error
- ❌ F8.5: Metadata contains invalid types (datetime, dict, list) - conversion fails
- ❌ F8.6: Vector dimension mismatch - collection expects different dimension
- ❌ F8.7: Vector DB insert fails - constraint violation, disk full, transaction error
- ❌ F8.8: Partial insert - some items inserted, others failed
- ❌ F8.9: Insert succeeds but transaction not committed
- ❌ F8.10: Collection created but items not inserted
- ❌ F8.11: Duplicate hash check fails - query error, false positive
- ❌ F8.12: Multiple collections insert - one succeeds, others fail
Location: backend/open_webui/utils/job_queue.py → enqueue_file_processing_job()
Flow:
- Validate inputs (file_id, etc.)
- Get job queue
- Check if job already exists
- Enqueue job with RQ
- Return job_id
Potential Failure Points:
- ❌ F9.1: Input validation fails - file_id empty, non-serializable arguments
- ❌ F9.2: Job queue unavailable - Redis down, ENABLE_JOB_QUEUE=False
- ❌ F9.3: Redis connection fails - network error, authentication error
- ❌ F9.4: Job ID collision - job with same ID exists
- ❌ F9.5: Job enqueue fails - Redis error, serialization error, queue full
- ❌ F9.6: Job enqueued but job_id not returned
- ❌ F9.7: Race condition - job created between check and enqueue
- ❌ F9.8: Embedding API key not passed to job - will fail in worker
- ❌ F9.9: Job arguments too large - Redis memory limit
Location: backend/open_webui/workers/file_processor.py → process_file_job()
Flow:
- Initialize MockRequest with embedding_api_key
- Initialize AppConfig
- Initialize embedding functions
- Get user object
- Update file status to "processing"
- Extract content (same as Step 4 in non-Redis)
- Generate embeddings and save to vector DB
- Update status to "completed"
Potential Failure Points:
- ❌ F10.1: MockRequest initialization fails - config error, database unavailable
- ❌ F10.2: AppConfig initialization fails - database error, config corruption
- ❌ F10.3: Embedding function initialization fails - model not found, API key invalid
- ❌ F10.4: Embedding API key is None - not passed from enqueue, config error
- ❌ F10.5: User retrieval fails - user deleted, database error
- ❌ F10.6: Status update fails - database error, connection lost
- ❌ F10.7: All failure points from Step 4 (F4.1-F4.22) apply here
- ❌ F10.8: Worker process killed - OOM, timeout, system restart
- ❌ F10.9: Worker loses Redis connection during processing
- ❌ F10.10: Job timeout exceeded - large file processing takes too long
- ❌ F10.11: Worker crashes - unhandled exception, segmentation fault
- ❌ F10.12: Job retry fails - all retries exhausted
- ❌ F10.13: Worker pod restarts - job lost if not persisted
- ❌ F11.1: RAG_EMBEDDING_ENGINE not configured or invalid
- ❌ F11.2: RAG_EMBEDDING_MODEL not configured or invalid
- ❌ F11.3: RAG_OPENAI_API_KEY not configured for admin
- ❌ F11.4: RAG_OPENAI_API_BASE_URL not configured or invalid
- ❌ F11.5: UserScopedConfig not initialized properly
- ❌ F11.6: Config changes during processing - race condition
- ❌ F11.7: Config stored in database but not loaded
- ❌ F11.8: Config cache stale - old values used
- ❌ F12.1: Database connection lost during processing
- ❌ F12.2: Database transaction timeout
- ❌ F12.3: Database deadlock
- ❌ F12.4: Database constraint violation
- ❌ F12.5: Database disk full
- ❌ F12.6: Database connection pool exhausted
- ❌ F12.7: Database migration not applied
- ❌ F12.8: Database schema mismatch
- ❌ F13.1: Vector DB connection lost
- ❌ F13.2: Vector DB collection not created
- ❌ F13.3: Vector DB dimension mismatch
- ❌ F13.4: Vector DB disk full
- ❌ F13.5: Vector DB index corruption
- ❌ F13.6: Vector DB transaction error
- ❌ F13.7: Vector DB connection pool exhausted
- ❌ F14.1: Redis connection lost during job processing
- ❌ F14.2: Redis memory full - job queue full
- ❌ F14.3: Redis timeout - job processing exceeds timeout
- ❌ F14.4: Redis failover - job lost if not persisted
- ❌ F14.5: Redis authentication error
- ❌ F14.6: Redis cluster partition - job stuck
- ❌ F15.1: Admin API key not set in Settings > Documents
- ❌ F15.2: User not in group - no API key inheritance
- ❌ F15.3: Admin API key expired or revoked
- ❌ F15.4: User deleted during processing
- ❌ F15.5: User role changed during processing
- ❌ F15.6: Group membership changed - API key access revoked
- ❌ F16.1: File deleted from disk during processing
- ❌ F16.2: File moved or renamed
- ❌ F16.3: Disk full - cannot save file
- ❌ F16.4: File permissions changed
- ❌ F16.5: Storage backend unavailable (S3, etc.)
- ❌ F16.6: File corrupted after upload
- ❌ F17.1: Embedding API unavailable
- ❌ F17.2: Embedding API rate limit exceeded
- ❌ F17.3: Embedding API authentication failed
- ❌ F17.4: Embedding API timeout
- ❌ F17.5: Network partition during API call
- ❌ F17.6: DNS resolution failure
- ❌ F17.7: SSL/TLS certificate error
- ❌ F18.1: Multiple workers process same file
- ❌ F18.2: File deleted while being processed
- ❌ F18.3: Status update race condition
- ❌ F18.4: Lock not released - file stuck
- ❌ F18.5: Job enqueued multiple times
- ❌ F18.6: Background task added multiple times
- ❌ F19.1: Exception caught but status not updated
- ❌ F19.2: Error message not logged
- ❌ F19.3: Partial failure not detected
- ❌ F19.4: Error status not persisted
- ❌ F19.5: Exception swallowed silently
- F3.12 / F7.1: No API key configured - processing will always fail
- F4.15 / F10.4: API key not available in background/worker context
- F4.6-F4.10: Content extraction failures - no content to embed
- F7.5-F7.8: Embedding API failures - cannot generate embeddings
- F8.7-F8.10: Vector DB insert failures - embeddings not saved
- F10.8-F10.12: Worker failures - job lost or not completed
- F15.1-F15.3: Admin API key configuration issues
- Configuration not loaded - API keys, URLs, models
- Database connection lost - during critical operations
- Partial failures - some steps succeed, others fail
- Race conditions - concurrent processing of same file
- Error handling gaps - exceptions not caught or not logged
- Context loss - user/API key not available in background tasks
- Check logs for API key errors - Look for "No embedding API key configured"
- Verify API key is passed to workers - Check job enqueue includes embedding_api_key
- Check content extraction logs - Look for "[Content Extraction WARNING]"
- Verify embedding API calls - Check for rate limits, timeouts, auth errors
- Check vector DB connection - Verify connection pool, transactions
- Monitor worker logs - Check for worker crashes, timeouts
- Verify Redis connection - Check for connection errors, timeouts
- Check file status updates - Verify status transitions correctly
- Monitor database connections - Check for connection pool exhaustion
- Verify error handling - Check that exceptions are caught and logged
- Review each failure point against actual logs
- Add monitoring/alerting for critical failure points
- Improve error handling and logging
- Add retry logic for transient failures
- Add validation at each step
- Improve status tracking and reporting
- Add health checks for dependencies
- Implement circuit breakers for external APIs