Comprehensive Logging Guide for Embedding Pipeline

Overview

This guide explains where to find logs and how to use the enhanced logging system to debug embedding failures in a Redis/RQ environment.

Log Locations

1. Main Application Logs (FastAPI/OpenWebUI)

Default Locations:

Local: stdout/stderr (printed to the console)
Systemd: journalctl -u open-webui -f
Docker: docker logs -f <container-name>
Kubernetes: kubectl logs -f <pod-name>
Custom file: Set LOG_FILE environment variable

How to Access:

# Docker
docker logs -f open-webui

# Kubernetes
kubectl logs -f <pod-name> -c open-webui

# Systemd
journalctl -u open-webui -f

# Direct stdout (if running manually)
# Logs appear in terminal where uvicorn is running

2. Worker Logs (RQ Workers)

Default Locations:

Local: /tmp/rq-worker.log (if redirected) or stdout/stderr
Docker: stdout/stderr (check container logs)
Kubernetes: kubectl logs -f <worker-pod-name>
Systemd: journalctl -u rq-worker -f

How to Access:

# Local file
tail -f /tmp/rq-worker.log

# Docker
docker logs -f <worker-container-name>

# Kubernetes
kubectl logs -f <worker-pod-name>

# Systemd
journalctl -u rq-worker -f

# Direct stdout (if running manually)
python -m open_webui.workers.start_worker
# Logs appear in terminal

Note: Worker logs use both print() and log.info() to ensure visibility. All critical steps are logged with [STEP X] prefixes.
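
As a minimal sketch of this dual-output pattern, a step marker can be built once and sent to both print() and the logger. The helper name `log_step` and the logger name are assumptions made for illustration, not the project's actual code:

```python
import logging

# Hypothetical logger name; the real worker may use a different one.
log = logging.getLogger("embedding_pipeline")


def log_step(step: str, message: str, indent: int = 1) -> str:
    """Emit one "[STEP X]" line through print() and log.info(); return the line."""
    line = f"{'  ' * indent}[STEP {step}] {message}"
    # flush=True keeps the line from sitting in a buffer when stdout is a pipe,
    # which is the usual case under Docker, Kubernetes, or systemd.
    print(line, flush=True)
    # The logger receives the same text without the leading indentation.
    log.info(line.strip())
    return line
```

Sending the line through both channels means it still reaches the console if the logger's level or handlers are misconfigured in the worker process.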

3. Redis Logs

Default Locations:

Local: /var/log/redis/redis-server.log
Docker: docker logs -f <redis-container-name>
Kubernetes: kubectl logs -f <redis-pod-name>
Systemd: journalctl -u redis -f

How to Access:

# Local
tail -f /var/log/redis/redis-server.log

# Docker
docker logs -f redis

# Kubernetes
kubectl logs -f <redis-pod-name>

Unified Log Viewer Script

A script is provided to view all logs together: view_logs.sh

Usage:

# Interactive menu
./view_logs.sh

# View main app logs only
./view_logs.sh main

# View worker logs only
./view_logs.sh worker

# View Redis logs only
./view_logs.sh redis

# View all logs together (interleaved)
./view_logs.sh all

# View main app + worker combined
./view_logs.sh combined

Custom Log Locations:

export MAIN_APP_LOG=/path/to/main.log
export WORKER_LOG=/path/to/worker.log
export REDIS_LOG=/path/to/redis.log
./view_logs.sh

Log Format and Markers

Print Statements (Easy to Read)

All critical steps use print() statements with clear markers:

[PROCESS FILE] Starting file processing request
  [STEP 1] Checking embedding engine: portkey
  [STEP 1.1] Engine is OpenAI/Portkey, retrieving API key...
  [STEP 1.2] API key retrieval result:
    embedding_api_key is None: False
    embedding_api_key length: 51
  [STEP 1.3] ✅ API key retrieved and validated

[EMBEDDING] Starting embedding generation
  [STEP 1] User context:
    user_email: user@example.com
  [STEP 2] Retrieving API key from config...
  [STEP 2.1] API key retrieval result:
    user_api_key is None: False
    user_api_key length: 51
  [STEP 2.2] ✅ API key validated
  [STEP 3] Getting base URL...
  [STEP 4] Validating API key before embedding function creation...
  [STEP 5] Creating embedding function...
  [STEP 6] Generating embeddings for 10 chunks...
  [STEP 6.1] ✅ Embeddings generated successfully
  [STEP 7] Inserting into vector DB...
  [STEP 7.1] ✅ Successfully inserted

Log Levels

ERROR/❌: Critical failures that stop processing
WARNING/⚠️: Issues that may cause problems but don't stop processing
INFO/✅: Successful operations and progress updates
DEBUG: Detailed debugging information
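
When post-processing a captured log, the markers above can be mapped back to standard logging levels. The function below is a heuristic sketch, and `level_for_line` is a hypothetical name. Lines with no marker return None rather than being guessed at:

```python
import logging
from typing import Optional


def level_for_line(line: str) -> Optional[int]:
    """Map a log line to a logging level using the markers listed above."""
    if "❌" in line or "ERROR" in line:
        return logging.ERROR
    # U+26A0 is the warning sign; the emoji form appends a variation selector.
    if "\u26a0" in line or "WARNING" in line:
        return logging.WARNING
    if "✅" in line or "INFO" in line:
        return logging.INFO
    if "DEBUG" in line:
        return logging.DEBUG
    # No recognizable marker on this line.
    return None
```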

Key Log Markers to Watch For

1. API Key Validation

Success:

[STEP 1.2] API key retrieval result:
  embedding_api_key is None: False
  embedding_api_key length: 51
[STEP 1.3] ✅ API key retrieved and validated

Failure:

[STEP 1.2] API key retrieval result:
  embedding_api_key is None: True
  embedding_api_key length: 0
[STEP 1.3] ❌ CRITICAL BUG: No embedding API key configured...

2. Embedding Function Initialization

Success:

[EMBEDDING INIT] Starting embedding function initialization
  [STEP 1] API key resolution: Using provided key
  [STEP 4] ✅ API key validated
  [STEP 6] ✅ Embedding function created successfully
[EMBEDDING INIT] ✅ Initialization completed successfully

Failure:

[EMBEDDING INIT] Starting embedding function initialization
  [STEP 4] ❌ CRITICAL: No embedding API key provided in job!
[EMBEDDING INIT] ❌ Failed to initialize EMBEDDING_FUNCTION

3. Embedding Generation

Success:

[STEP 6] Generating embeddings for 10 chunks in a single batch
[STEP 6.1] Embedding generation result:
  embeddings is None: False
  embeddings length: 10
  first embedding length: 1536
[STEP 6.1] ✅ Embeddings generated successfully

Failure:

[STEP 6] Generating embeddings for 10 chunks...
[STEP 6] ❌ Failed to generate embeddings: 401 Unauthorized

4. Vector DB Insert

Success:

[STEP 7] Inserting embeddings into 2 collection(s)
  [STEP 7.1] ✅ Successfully inserted into collection: file-123
  [STEP 7.2] ✅ Successfully inserted into collection: knowledge-456
[EMBEDDING] ✅ All embeddings saved successfully

Failure:

[STEP 7.1] Processing collection: file-123
[STEP 7.1] ❌ Failed to insert into collection file-123: Connection error

Debugging Workflow

Step 1: Check Job Enqueueing (Main App)

Look for:

[PROCESS FILE] Starting file processing request
  [STEP 1] Checking embedding engine: portkey
  [STEP 1.2] API key retrieval result:
    embedding_api_key is None: False  ← Should be False
    embedding_api_key length: 51      ← Should be > 0
  [STEP 1.3] ✅ API key retrieved and validated

If API key is None:

Check that the admin has configured an API key in Settings > Documents
Check that the user is in a group created by that admin
Check that UserScopedConfig.get() returns the expected value

Step 2: Check Worker Job Processing

Look for:

[JOB START] Processing file job: file_id=...
  INPUT PARAMETERS:
    embedding_api_key=PROVIDED (51 chars, ends with ...xyz)  ← Should be PROVIDED
  [STEP 1.2] Validating embedding API key...
    embedding_api_key is None: False  ← Should be False
  [STEP 1.2] ✅ API key validation passed
  [STEP 1.3] Initializing EMBEDDING_FUNCTION...
  [STEP 1.3] ✅ EMBEDDING_FUNCTION initialized successfully

If API key is None in job:

Check enqueue_file_processing_job() is passing the key
Check job serialization (should include key)

Step 3: Check Embedding Generation

Look for:

[EMBEDDING] Starting embedding generation
  [STEP 2.1] API key retrieval result:
    user_api_key is None: False  ← Should be False
  [STEP 4] Validating API key before embedding function creation...
    api_key_to_use is None: False  ← Should be False
  [STEP 5] Creating embedding function...
  [STEP 6] Generating embeddings for 10 chunks...
  [STEP 6.1] ✅ Embeddings generated successfully

If embedding generation fails:

Check API key is valid (not expired/revoked)
Check base URL is correct
Check embedding API is accessible
Check rate limits

Step 4: Check Vector DB Save

Look for:

[STEP 7] Inserting embeddings into 2 collection(s)
  [STEP 7.1] ✅ Successfully inserted into collection: file-123
  [STEP 7.2] ✅ Successfully inserted into collection: knowledge-456
[EMBEDDING] ✅ All embeddings saved successfully

If vector DB insert fails:

Check vector DB connection
Check collection exists or can be created
Check vector dimensions match
Check disk space
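
The four steps above amount to reading the log top to bottom and stopping at the first ❌. The sketch below does the same and reports which section the failure belongs to. The section names come from the markers shown in this guide. `first_failure` is a hypothetical helper:

```python
import re
from typing import Iterable, Optional, Tuple

# Section headers that appear at the start of a line in the sample output.
SECTION = re.compile(r"^\[(PROCESS FILE|JOB START|EMBEDDING INIT|EMBEDDING)\]")


def first_failure(lines: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (section, line) for the first line containing the failure marker."""
    section = "UNKNOWN"
    for raw in lines:
        line = raw.strip()
        match = SECTION.match(line)
        if match:
            section = match.group(1)
        if "❌" in line:
            return section, line
    return None
```

To run it against a real log, pass an open file handle, for example `first_failure(open(path, encoding="utf-8"))`.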

Common Error Patterns

Pattern 1: API Key is None

Symptoms:

embedding_api_key is None: True
embedding_api_key length: 0
❌ CRITICAL BUG: No embedding API key provided...

Causes:

Admin hasn't configured API key
User not in group
UserScopedConfig.get() returning None
API key not passed to job

Fix:

Check that the admin has an API key configured in Settings > Documents
Check that the user is in a group created by that admin
Check that RAG_OPENAI_API_KEY.get(user.email) returns a value

Pattern 2: Embedding Function is None

Symptoms:

EMBEDDING_FUNCTION is None: True
❌ CRITICAL BUG: EMBEDDING_FUNCTION is None after initialization

Causes:

API key is None (see Pattern 1)
Base URL is invalid
Embedding model not configured
get_embedding_function() returned None

Fix:

Check API key is not None
Check base URL is valid (not "None" string)
Check embedding model is configured
Check get_embedding_function() implementation

Pattern 3: Embedding Generation Fails

Symptoms:

[STEP 6] Generating embeddings for 10 chunks...
[STEP 6] ❌ Failed to generate embeddings: 401 Unauthorized

Causes:

Invalid API key
API key expired/revoked
Wrong base URL
Rate limit exceeded
Network error

Fix:

Verify API key is valid
Check base URL is correct
Check API rate limits
Check network connectivity

Pattern 4: Vector DB Insert Fails

Symptoms:

[STEP 7.1] ❌ Failed to insert into collection: Connection error

Causes:

Vector DB connection lost
Collection doesn't exist
Vector dimension mismatch
Disk full
Transaction error

Fix:

Check vector DB connection
Check collection exists
Check vector dimensions
Check disk space
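
The symptom strings for the four patterns are distinct enough to match mechanically. The sketch below maps a log line to its pattern using only the strings shown in this section. `classify_symptom` is a hypothetical name:

```python
from typing import Optional


def classify_symptom(line: str) -> Optional[str]:
    """Map a log line to one of the four error patterns, or None if no match."""
    text = line.lower()
    if (
        "embedding_function is none: true" in text
        or "embedding_function is none after" in text
    ):
        return "Pattern 2: Embedding Function is None"
    # Covers embedding_api_key and user_api_key; "is None: False" does not match.
    if "api_key is none: true" in text or "no embedding api key" in text:
        return "Pattern 1: API Key is None"
    if "failed to generate embeddings" in text:
        return "Pattern 3: Embedding Generation Fails"
    if "failed to insert into collection" in text:
        return "Pattern 4: Vector DB Insert Fails"
    return None
```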

Log Search Tips

Find All API Key Issues:

grep -i "api key\|embedding_api_key\|user_api_key" <log-file> | grep -i "none: true\|empty\|❌"

Find All Embedding Failures:

grep -i "embedding" <log-file> | grep -i "❌\|failed"

Find All Worker Errors:

grep -i "\[JOB\|\[STEP\|❌" <worker-log-file>

Find Specific File Processing:

grep "file_id=<file-id>" <log-file>

Find All Critical Bugs:

grep "CRITICAL BUG" <log-file>

Real-Time Monitoring

Watch Main App + Worker Logs Together:

# Using the script
./view_logs.sh combined

# Or manually (run both pipelines as background jobs of this shell so that wait blocks on them)
tail -f /path/to/main.log | sed 's/^/[MAIN] /' &
tail -f /path/to/worker.log | sed 's/^/[WORKER] /' &
wait

Watch for Specific Patterns:

# Watch for API key issues
tail -f <log-file> | grep --color=always -i "api key\|embedding_api_key"

# Watch for errors
tail -f <log-file> | grep --color=always -i "❌\|error\|failed"

# Watch for embedding steps
tail -f <log-file> | grep --color=always -i "\[STEP\|\[EMBEDDING"
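
Where tail and sed are not available, the same prefix-and-follow idea can be approximated in Python. This is a rough sketch, not a replacement for view_logs.sh, and `read_new_lines` is a hypothetical helper:

```python
import io
from typing import List, TextIO


def read_new_lines(handle: TextIO, prefix: str) -> List[str]:
    """Return the lines added since the previous call, each tagged with prefix."""
    tagged = []
    while True:
        line = handle.readline()
        if not line:  # reached the current end of the file
            break
        tagged.append(f"{prefix} {line.rstrip()}")
    return tagged


# In-memory stand-in for a log file so the sketch runs anywhere.
demo = io.StringIO("[STEP 1] Checking embedding engine: portkey\n")
print(read_new_lines(demo, "[MAIN]"))
```

Against real files, open each log, call `handle.seek(0, 2)` to start at the end, then call `read_new_lines` for each handle in a loop with a short `time.sleep` between polls. A line that is still being written when the poll happens may come back in two pieces.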

Kubernetes-Specific Log Access

View All Pods:

kubectl get pods

View Main App Logs:

kubectl logs -f <main-app-pod-name>

View Worker Logs:

kubectl logs -f <worker-pod-name>

View Redis Logs:

kubectl logs -f <redis-pod-name>

View All Logs Together:

# All pods with label app=open-webui
kubectl logs -f -l app=open-webui

# Specific container in pod
kubectl logs -f <pod-name> -c <container-name>

View Logs with Timestamps:

kubectl logs -f <pod-name> --timestamps

View Last N Lines:

kubectl logs --tail=100 <pod-name>

Search Logs:

kubectl logs <pod-name> | grep "CRITICAL BUG"

Docker-Specific Log Access

View Container Logs:

# All logs
docker logs -f <container-name>

# Last 100 lines
docker logs --tail=100 <container-name>

# With timestamps
docker logs -f --timestamps <container-name>

# Since specific time
docker logs --since 10m <container-name>

View Multiple Containers:

# Main app
docker logs -f open-webui &

# Worker
docker logs -f open-webui-worker &

# Redis
docker logs -f redis &

wait

Log File Rotation

If logs get too large, consider setting up log rotation:

Using logrotate:

# /etc/logrotate.d/open-webui
/path/to/logs/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
}

Using Docker:

docker run --log-opt max-size=10m --log-opt max-file=3 ...
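
If the application writes its own log file from Python, the standard library's RotatingFileHandler offers comparable limits without an external tool. The sketch below mirrors the Docker example (10 MB per file, 3 backups). The logger name and function name are assumptions, and this guide does not state whether the LOG_FILE setting is wired this way:

```python
import logging
from logging.handlers import RotatingFileHandler


def build_rotating_logger(
    path: str, max_bytes: int = 10 * 1024 * 1024, backups: int = 3
) -> logging.Logger:
    """Create a logger that caps its file at max_bytes and keeps `backups` old files."""
    logger = logging.getLogger("embedding_pipeline.file")  # hypothetical name
    handler = RotatingFileHandler(
        path,
        maxBytes=max_bytes,
        backupCount=backups,
        encoding="utf-8",  # the markers include non-ASCII characters
        delay=True,  # open the file on first write, not at construction
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Call the function once at startup. Calling it repeatedly would attach duplicate handlers to the same logger.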

Summary

Main App Logs: Check API key retrieval and job enqueueing
Worker Logs: Check job processing and embedding generation
Redis Logs: Check job queue and Redis connectivity
Use view_logs.sh: Unified viewer for all logs
Look for markers: [STEP X], ✅, ❌, [EMBEDDING]
Check API key: Should never be None
Check embedding function: Should never be None
Check errors: Look for ❌ CRITICAL BUG messages

Quick Reference

# View all logs together
./view_logs.sh all

# View worker logs only
./view_logs.sh worker

# View main app logs only  
./view_logs.sh main

# Search for API key issues
grep -i "api key.*none\|❌.*api key" <log-file>

# Search for embedding failures
grep -i "❌\|failed.*embedding" <log-file>

# Follow logs in real-time
tail -f <log-file>
