This guide helps you resolve common issues when working with the LearnDL sentiment classification system.
Symptoms (API not responding):
- Connection refused errors
- Timeout errors
- 500 Internal Server Error
Solutions:
- Check whether the API is running: `curl http://localhost:8000/model_api/health_check`
- Start the API with Docker (`docker-compose up --build`) or locally (`python api/main.py`)
- Check the logs: with Docker, run `docker-compose logs api`; when running locally, check the terminal output
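To script the health check (for example in CI, or before running the demo notebook), here is a minimal standard-library sketch. It assumes the endpoint used in the `curl` command above returns HTTP 200 when the service is healthy. The function name `api_is_healthy` and its retry parameters are illustrative and not part of LearnDL.

```python
import time
import urllib.request

HEALTH_URL = "http://localhost:8000/model_api/health_check"


def api_is_healthy(url: str = HEALTH_URL, timeout: float = 5.0,
                   attempts: int = 1, delay: float = 1.0) -> bool:
    """Return True if `url` answers with HTTP 200 within `attempts` tries."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError (connection refused), HTTPError (e.g. a 500 response)
            # and socket timeouts are all subclasses of OSError.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    print("API healthy" if api_is_healthy(attempts=3) else "API not reachable")
```

Retrying with a short delay is useful right after `docker-compose up`, when the container is running but the API process has not started listening yet.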
Symptoms (training fails):
- Training starts but fails immediately
- CUDA out of memory errors
- Data loading errors
Common Causes:
RuntimeError: CUDA out of memory
Solutions:
- Reduce the batch size: `"batch_size": 8`
- Use a smaller model: `"embed_model": "distilbert_model"`
- Freeze more layers: `"fine_tune_mode": "freeze_all"`
- Reduce the hidden neurons: `"hidden_neurons": 128`
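If you do not know in advance which batch size fits in GPU memory, you can back off automatically. The sketch below is hypothetical: `run_training` stands in for whatever function starts a LearnDL training run with a given `batch_size`. It relies only on PyTorch reporting CUDA OOM as a `RuntimeError` whose message contains "out of memory". Recent versions raise `torch.cuda.OutOfMemoryError`, which is a subclass of `RuntimeError`.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def train_with_batch_backoff(run_training: Callable[[int], T],
                             batch_size: int = 32,
                             min_batch_size: int = 1) -> tuple[int, T]:
    """Call run_training(batch_size), halving batch_size after each OOM error.

    Returns (batch_size_that_worked, result). Other errors are re-raised.
    """
    while batch_size >= min_batch_size:
        try:
            return batch_size, run_training(batch_size)
        except RuntimeError as err:
            if "out of memory" not in str(err):
                raise  # not an OOM: surface the real problem
            # In a real PyTorch run, also call torch.cuda.empty_cache() here.
            batch_size //= 2
    raise RuntimeError("Ran out of memory even at the minimum batch size")
```

Halving is a coarse search. It finds a batch size that fits, but not necessarily the largest one.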
FileNotFoundError: data/data.csv
Solutions:
- Ensure `data/data.csv` exists
- Check the file path in the configuration
- Verify file permissions
ValueError: Expected 2 columns, got 3
Solutions:
- Check the CSV format: exactly two columns, `input,output`
- Ensure text fields contain no unquoted commas
- Use proper CSV quoting
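Both of the errors above can be caught before training starts. Below is a minimal pre-flight check using Python's `csv` module. It assumes the file is UTF-8 and starts with an `input,output` header row, as the expected format suggests. `validate_training_csv` and `write_training_csv` are hypothetical helpers, not LearnDL functions.

```python
import csv
from pathlib import Path


def validate_training_csv(path: str = "data/data.csv") -> list[str]:
    """Return a list of problems found in the training CSV (empty if none)."""
    csv_path = Path(path)
    if not csv_path.is_file():
        return [f"{path} does not exist (or is not a file)"]
    problems = []
    with csv_path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header != ["input", "output"]:
            problems.append(f"expected header ['input', 'output'], got {header}")
        for row in reader:
            if len(row) != 2:
                # reader.line_num counts physical lines, so it stays accurate
                # even when a quoted field spans several lines.
                problems.append(f"line {reader.line_num}: expected 2 columns, got {len(row)}")
    return problems


def write_training_csv(path: str, rows: list[tuple[str, str]]) -> None:
    """Write (text, label) pairs; csv.writer quotes any field containing a comma."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["input", "output"])
        writer.writerows(rows)
```

Writing the file through `csv.writer` rather than joining strings with commas is the simplest way to get "proper CSV quoting". A text such as `Great, really great` is written as `"Great, really great"` automatically.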
Symptoms (prediction fails or is inaccurate):
- Model not found errors
- Configuration mismatch errors
- Low accuracy predictions
HTTP 404: Model not found for this user/session
Solutions:
- Verify `user_id` and `training_session_id`
- Check that training completed successfully
- Ensure model was saved to Redis
ValueError: Configuration mismatch
Solutions:
- Use same configuration for training and prediction
- Ensure `total_config` matches the training setup
- Check embed model compatibility
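To find which setting differs, diff the configuration used at training time against the one sent with the prediction request. This sketch assumes both are available as plain dictionaries. `config_diff` is a hypothetical helper, and `bert_model` in the example is an illustrative value, not a confirmed LearnDL option.

```python
from typing import Any


def config_diff(train_cfg: dict[str, Any],
                predict_cfg: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Map each key whose value differs to (training value, prediction value).

    A key missing on one side is reported with None for that side.
    """
    keys = set(train_cfg) | set(predict_cfg)
    return {
        key: (train_cfg.get(key), predict_cfg.get(key))
        for key in sorted(keys)
        if train_cfg.get(key) != predict_cfg.get(key)
    }


train = {"embed_model": "distilbert_model", "hidden_neurons": 128, "dropout": 0.5}
predict = {"embed_model": "bert_model", "hidden_neurons": 128, "dropout": 0.5}
print(config_diff(train, predict))
# {'embed_model': ('distilbert_model', 'bert_model')}
```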
ModuleNotFoundError: No module named 'transformers'
- Check the Python version (3.11+ is expected): `python --version`
- Reinstall dependencies: `pip install -r requirements.txt`

AssertionError: Torch not compiled with CUDA enabled
Check GPU availability:
```python
import torch

print(torch.cuda.is_available())  # True if this PyTorch build has CUDA and a GPU is visible
print(torch.cuda.device_count())  # number of visible GPUs
```
Solutions:
- Install CUDA-compatible PyTorch
- Use the CPU-only build: `pip install torch --index-url https://download.pytorch.org/whl/cpu`
- Set `CUDA_VISIBLE_DEVICES=""` to force CPU-only execution
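A quick environment report can cover the dependency and GPU checks above in one run. It reports the Python version, whether `transformers` and `torch` are importable, and whether PyTorch can see a GPU. This is a sketch: the 3.11 minimum comes from the version note above, and the report falls back to `"cpu"` when PyTorch is missing or CUDA is unavailable. For example, it shows `cpu` when `CUDA_VISIBLE_DEVICES=""` is set before the process starts.

```python
import importlib.util
import sys


def environment_report() -> dict:
    """Collect the facts most of the dependency and GPU errors depend on."""
    report = {
        "python": sys.version.split()[0],
        "python_ok": sys.version_info >= (3, 11),
        "transformers_installed": importlib.util.find_spec("transformers") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "device": "cpu",
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the report also works without PyTorch

        report["cuda_available"] = torch.cuda.is_available()
        report["cuda_devices"] = torch.cuda.device_count()
        if report["cuda_available"]:
            report["device"] = "cuda"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```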
UnicodeDecodeError: 'utf-8' codec can't decode
Solutions:
- Save CSV with UTF-8 encoding
- Handle special characters in text
- Pass `encoding='utf-8'` when reading the file
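To locate the offending rows rather than guess, read the file as bytes and try to decode each line. The conversion helper assumes the file was saved as Windows-1252 (`cp1252`), a common cause of this error for CSVs exported from spreadsheet tools on Windows. Check the actual source encoding before converting. Both function names are hypothetical.

```python
def find_non_utf8_lines(path: str) -> list[tuple[int, str]]:
    """Return (line number, error message) for each line that is not valid UTF-8."""
    bad = []
    with open(path, "rb") as f:
        for line_no, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as err:
                bad.append((line_no, str(err)))
    return bad


def convert_to_utf8(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Re-encode a text file to UTF-8, preserving its line endings."""
    with open(src, encoding=source_encoding, newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(fin.read())
```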
Warning: Class distribution is heavily imbalanced
Solutions:
- Check class distribution in data
- Use stratified sampling: `"stratify": true`
- Collect more balanced data
- Use class weights in training
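Counting the labels quantifies the imbalance, and the counts also give you class weights. The weighting below uses the common "balanced" heuristic, `n_samples / (n_classes * count)`, which is the formula scikit-learn uses for `class_weight="balanced"`. This guide does not say how LearnDL accepts class weights. In plain PyTorch they are typically passed to `torch.nn.CrossEntropyLoss(weight=...)`.

```python
from collections import Counter


def class_weights(labels: list[str]) -> dict[str, float]:
    """Balanced class weights: rarer classes get proportionally larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


labels = ["positive"] * 90 + ["negative"] * 10
print(Counter(labels))        # Counter({'positive': 90, 'negative': 10})
print(class_weights(labels))  # {'positive': 0.5555555555555556, 'negative': 5.0}
```

For the stratified split itself, scikit-learn's `train_test_split(..., stratify=labels)` keeps the class proportions the same in the training and validation sets.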
ValueError: Found 0 samples in training data
Solutions:
- Verify CSV has data rows
- Check for empty strings in input column
- Ensure output labels are consistent
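Empty inputs and inconsistent labels are easy to miss by eye. An example of an inconsistent label is `Positive` vs `positive ` with a trailing space, which would be treated as two classes. Below is a hypothetical check, assuming the `input,output` header. A `rows` count of 0 means the file has a header but no data.

```python
import csv
from collections import defaultdict


def check_rows(path: str) -> dict:
    """Report empty inputs and labels that differ only by case or whitespace."""
    empty_inputs = []            # 1-based data-row numbers with a blank input
    variants = defaultdict(set)  # normalized label -> raw spellings seen
    n_rows = 0
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            n_rows += 1
            if not (row.get("input") or "").strip():
                empty_inputs.append(n_rows)
            raw = row.get("output") or ""
            variants[raw.strip().lower()].add(raw)
    inconsistent = {k: sorted(v) for k, v in variants.items() if len(v) > 1}
    return {"rows": n_rows, "empty_inputs": empty_inputs, "inconsistent_labels": inconsistent}
```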
Training loss not decreasing
Validation accuracy stuck at ~50%
Solutions:
- Increase learning rate gradually
- Try different embedding models
- Unfreeze more layers for fine-tuning
- Check data quality and preprocessing
Training accuracy: 95%, Validation accuracy: 60%
Solutions:
- Increase dropout: `"dropout": 0.5`
- Reduce model complexity
- Add regularization
- Use early stopping
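Early stopping halts training once the validation loss stops improving. This guide does not document LearnDL configuration options for it, so the following is a framework-agnostic sketch. `patience` and `min_delta` are illustrative parameter names.

```python
class EarlyStopping:
    """Signal a stop after `patience` epochs without validation-loss improvement."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta  # minimum decrease that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
for epoch, loss in enumerate([0.9, 0.7, 0.72, 0.71, 0.5]):
    if stopper.step(loss):
        print(f"Stopping at epoch {epoch}, best val loss {stopper.best}")
        break
# prints: Stopping at epoch 3, best val loss 0.7
```

In practice you would also save the model weights whenever `best` improves, so you can restore the best checkpoint rather than the last one.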
Both training and validation accuracy low
Solutions:
- Increase training epochs
- Unfreeze more layers
- Use larger embedding model
- Check learning rate (might be too low)
redis.ConnectionError: Connection refused
Solutions:
- Start the Redis server: `redis-server`
- Check the Redis configuration in `.env`
- Verify Redis is running on the correct port
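The quickest check is `redis-cli ping`, which replies `PONG` when the server is up. If `redis-cli` is not installed (for example inside the API container), the sketch below sends the same `PING` over a plain TCP socket using Redis's inline command syntax. Host and port default to Redis's standard `localhost:6379`; adjust them to match your `.env`. The sketch assumes no password is configured. A server that requires `AUTH` answers with an error, so the function returns False.

```python
import socket


def redis_ping(host: str = "localhost", port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if a Redis server at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")  # inline command form of PING
            reply = sock.recv(64)
    except OSError:  # connection refused, host unreachable, or timed out
        return False
    return reply.startswith(b"+PONG")
```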
KeyError: Model not found in Redis
Solutions:
- Check Redis connection
- Verify model was saved after training
- Check Redis memory usage
- Restart Redis if needed
ERROR: Couldn't connect to Docker daemon
Solutions:
- Start Docker Desktop
- Check that Docker is running: `docker info`
- Restart the Docker service
ERROR: Port 8000 is already in use
Solutions:
- Kill the process using the port: find its PID with `lsof -i :8000`, then run `kill -9 <PID>`
- Change port in docker-compose.yml
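Before starting the stack, you can check whether something is already listening on the port. Here is a small standard-library sketch. It only detects listeners reachable on the given host; use `lsof -i :8000` to identify the process.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0  # 0 means the connect succeeded


if __name__ == "__main__":
    print("Port 8000 busy" if port_in_use(8000) else "Port 8000 free")
```

If you change the host port in `docker-compose.yml` (for example, mapping `"8001:8000"` publishes the container's port 8000 on host port 8001), use the new host port in the health-check URL as well.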
ERROR: Invalid volume specification
Solutions:
- Use absolute paths for volume mounts
- Check file permissions
- Ensure directories exist
Epoch taking >30 minutes
Solutions:
- Use GPU if available
- Reduce batch size
- Use DistilBERT instead of BERT
- Freeze embedding layers
MemoryError: Unable to allocate array
Solutions:
- Reduce batch size
- Use smaller models
- Process data in chunks
- Add swap memory
Prediction taking >5 seconds
Solutions:
- Use smaller models
- Cache models in memory
- Optimize preprocessing
- Use batch inference for multiple texts
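Enable debug logging by setting an environment variable:
`export LOG_LEVEL=DEBUG`

Monitor system resources:
```python
import psutil
import torch

# CPU usage, sampled over 1 second (with no interval the first call returns a meaningless 0.0)
print(f"CPU: {psutil.cpu_percent(interval=1)}%")

# Memory usage
memory = psutil.virtual_memory()
print(f"Memory: {memory.percent}% used")

# GPU memory (if available): share of device 0's memory held by PyTorch tensors
if torch.cuda.is_available():
    total = torch.cuda.get_device_properties(0).total_memory
    print(f"GPU Memory: {torch.cuda.memory_allocated(0) / total:.2%}")
```

Profile slow training code (`train_model` stands for your own training entry point):
```python
import cProfile
import pstats

# Profile training and print the 10 most expensive calls by cumulative time
cProfile.run('train_model()', 'profile_stats')
pstats.Stats('profile_stats').sort_stats('cumulative').print_stats(10)
```

Check these locations for logs: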
- Docker: `docker-compose logs`
- Local: terminal output
- Application: `logs/app.log` (if configured)
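If your deployment does not yet write `logs/app.log`, a standard-library `logging` setup along these lines would honor the `LOG_LEVEL` variable and write to both the terminal and the file. This guide does not state whether LearnDL's own startup code configures logging this way, so treat this as an assumption. The logger name `learndl` is illustrative.

```python
import logging
import os
from pathlib import Path


def setup_logging(log_file: str = "logs/app.log") -> logging.Logger:
    """Configure root logging from LOG_LEVEL (default INFO) to stderr and a file."""
    level = logging.getLevelName(os.environ.get("LOG_LEVEL", "INFO").upper())
    if not isinstance(level, int):  # unknown names come back as "Level X" strings
        level = logging.INFO
    Path(log_file).parent.mkdir(parents=True, exist_ok=True)
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        handlers=[logging.StreamHandler(), logging.FileHandler(log_file, encoding="utf-8")],
        force=True,  # replace any handlers configured earlier (Python 3.8+)
    )
    return logging.getLogger("learndl")
```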
| Error Code | Meaning | Action |
|---|---|---|
| 400 | Bad Request | Check request parameters |
| 404 | Not Found | Verify user_id/training_session_id |
| 422 | Validation Error | Check configuration values |
| 503 | Service Unavailable | Check model/data availability |
For additional help:
- Check the API Reference
- Review Configuration Guide
- Run the demo notebook
- Check GitHub issues for similar problems
When reporting issues, include:
- Python version: `python --version`
- OS and version
- Docker version (if used)
- GPU/CPU information
- Full error traceback
- Configuration used
- Sample data (anonymized)