Laptop Price Prediction - MLOps Pipeline

This project demonstrates a complete MLOps pipeline for laptop price prediction using MLflow, FastAPI, and CI/CD. It covers the full machine learning lifecycle from data processing to model deployment.

📊 Dataset

Source: Kaggle - Laptop Price Prediction Dataset

Features: 13 features including Company, Type, Screen Size, RAM, Storage, CPU, GPU, and more
Size: 1,274 laptop records
Target: Price (regression task)

The dataset is already included in mlops_version/data/laptop_data_combined.csv and has been preprocessed for optimal model performance.

📁 Project Structure

laptop_price/
├── README.md                    # This file - complete project documentation
├── .gitignore                   # Git ignore rules
├── requirements.txt             # Python dependencies
├── mlops_version/               # MLOps pipeline (production-ready)
│   ├── data/                    # Data storage
│   ├── models/                  # Trained model artifacts
│   ├── mlruns/                  # MLflow experiment tracking
│   ├── validation_reports/      # Model validation reports
│   ├── validation_plots/        # Validation visualizations
│   ├── tests/                   # Unit tests
│   ├── .github/workflows/       # CI/CD pipeline
│   ├── config.py               # Configuration settings
│   ├── data_processing.py      # Data processing pipeline
│   ├── model_training.py       # Model training with MLflow
│   ├── model_validation.py     # Model validation and evaluation
│   ├── fastapi_app.py          # FastAPI web service
│   ├── train_pipeline.py       # Complete training pipeline
│   ├── Dockerfile              # Docker configuration
│   ├── docker-compose.yml      # Docker orchestration
│   └── requirements.txt        # MLOps dependencies
└── test_scripts/               # Quick testing and learning
    ├── laptop_price_prediction.py  # Standalone training script
    ├── inference.py            # Inference script
    ├── model.pkl              # Trained model
    ├── ct.pkl                 # Preprocessor
    ├── df.pkl                 # Reference data
    └── data_splits.pkl        # Train/val/test splits

🚀 Features

Step 2: Model Development

  • Data Processing: Automated data cleaning and feature engineering
  • Model Training: Multiple ML algorithms with hyperparameter tuning
  • Experiment Tracking: MLflow integration for logging experiments
  • Data Validation: Pandera schema validation
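
A minimal sketch of what the Pandera check could look like; the column names and value ranges here are illustrative assumptions, not the project's actual schema (see data_processing.py for the real rules):

import pandas as pd
import pandera as pa

# Hypothetical schema: columns and bounds are illustrative only.
laptop_schema = pa.DataFrameSchema({
    "Company": pa.Column(str),
    "Ram": pa.Column(int, pa.Check.isin([4, 8, 16, 32, 64])),
    "Weight": pa.Column(float, pa.Check.in_range(0.5, 5.0)),
    "Price": pa.Column(float, pa.Check.gt(0)),
})

def validate_laptops(df: pd.DataFrame) -> pd.DataFrame:
    # Raises pandera.errors.SchemaError if any column violates its checks.
    return laptop_schema.validate(df)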

Step 3: Validation

  • Model Evaluation: Comprehensive metrics (R², RMSE, MAE, MAPE); a computation sketch follows this list
  • Cross-Validation: K-fold cross-validation
  • Validation Reports: Automated report generation
  • Visualization: Prediction vs actual plots
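
These metrics can be reproduced with scikit-learn; a small sketch (the MAPE formula assumes the target never contains zeros):

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_report(y_true, y_pred):
    # Core validation metrics computed for every candidate model.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return {
        "r2": r2_score(y_true, y_pred),
        "rmse": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "mae": mean_absolute_error(y_true, y_pred),
        "mape": float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100),
    }

# K-fold cross-validation of a candidate estimator (illustrative usage):
# from sklearn.model_selection import cross_val_score
# scores = cross_val_score(estimator, X, y, cv=5, scoring="r2")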

Step 4: Packaging

  • Model Artifacts: Serialized models and preprocessors
  • MLflow Registry: Model versioning and metadata
  • Artifact Storage: Organized model storage

Step 5: Deployment

  • FastAPI Service: RESTful API for predictions (outlined in the sketch after this list)
  • Docker Support: Containerized deployment
  • Health Checks: API monitoring endpoints
  • Batch Processing: Multiple prediction support
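
In rough outline, the prediction endpoint in fastapi_app.py can be structured like this; the request fields mirror the curl example further down, but the exact Pydantic model, artifact paths, and loading logic shown here are assumptions:

import pickle

import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Laptop Price Prediction API")

class LaptopFeatures(BaseModel):
    # Field names mirror the curl example in this README; the real schema lives in fastapi_app.py.
    Company: str
    TypeName: str
    Inches: float
    Ram: int
    OpSys: str
    Weight: float
    Touchscreen: int
    Ips: int
    ppi: float
    cpu_Brand: str
    HDD: int
    SSD: int
    Gpu_Brand: str

# Assumed artifact path; adjust to the models/ layout produced by train_pipeline.py.
with open("models/model.pkl", "rb") as f:
    model = pickle.load(f)

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/predict")
def predict(features: LaptopFeatures):
    df = pd.DataFrame([features.dict()])
    return {"predicted_price": float(model.predict(df)[0])}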

🛠️ Installation

  1. Clone the repository
git clone https://github.com/drakegeo/laptop-price-prediction-mlops.git
cd laptop-price-prediction-mlops
  2. Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies
pip install -r ./mlops_version/requirements.txt
  4. Use the included dataset
No download step is needed; the data is already at mlops_version/data/laptop_data_combined.csv

🏃‍♂️ Quick Start

Option 1: MLOps Pipeline (Full Production)

cd mlops_version
python train_pipeline.py

This will:

  • Process and validate the data
  • Train multiple ML models
  • Perform hyperparameter tuning
  • Validate model performance
  • Save model artifacts
  • Log everything to MLflow

Option 2: Test Scripts (Quick Testing & Learning)

cd test_scripts
python laptop_price_prediction.py

This will:

  • Train a model quickly
  • Save model files locally
  • Test inference functionality

Start the API Service

cd mlops_version
python fastapi_app.py

The API will be available at http://localhost:8000

📚 Test Scripts Folder

The test_scripts/ folder contains simplified scripts for quick testing and learning:

What's Included:

  • laptop_price_prediction.py: Standalone training script with all features
  • inference.py: Script for making predictions with trained models
  • Generated files: model.pkl, ct.pkl, df.pkl, data_splits.pkl

When to Use:

  • Learning: Understanding the ML pipeline step by step
  • Quick Testing: Fast model training without MLOps overhead
  • Development: Testing new features before MLOps integration
  • Portfolio: Demonstrating ML skills without complex infrastructure

Workflow:

  1. Run python laptop_price_prediction.py to train and save models
  2. Run python inference.py to test predictions
  3. Use the generated .pkl files for inference in other projects
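
A rough sketch of step 3, reusing the artifacts from another project; the exact pickle contents and whether the preprocessor must be applied separately are assumptions, so treat inference.py as the authoritative reference:

import pickle

import pandas as pd

# Load the artifacts written by laptop_price_prediction.py (assumed plain-pickle format).
with open("test_scripts/ct.pkl", "rb") as f:
    preprocessor = pickle.load(f)
with open("test_scripts/model.pkl", "rb") as f:
    model = pickle.load(f)

# One illustrative laptop, using the same feature names as the API example below.
sample = pd.DataFrame([{
    "Company": "Apple", "TypeName": "Ultrabook", "Inches": 13.3, "Ram": 8,
    "OpSys": "Mac", "Weight": 1.37, "Touchscreen": 0, "Ips": 1, "ppi": 226.0,
    "cpu_Brand": "Intel Core i5", "HDD": 0, "SSD": 128, "Gpu_Brand": "Intel",
}])

prediction = model.predict(preprocessor.transform(sample))
print(f"Predicted price: {prediction[0]:.2f}")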

Test the API

# Health check
curl http://localhost:8000/health

# Get model info
curl http://localhost:8000/model_info

# Make a prediction
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "Company": "Apple",
    "TypeName": "Ultrabook",
    "Inches": 13.3,
    "Ram": 8,
    "OpSys": "Mac",
    "Weight": 1.37,
    "Touchscreen": 0,
    "Ips": 1,
    "ppi": 226.0,
    "cpu_Brand": "Intel Core i5",
    "HDD": 0,
    "SSD": 128,
    "Gpu_Brand": "Intel"
  }'

📊 MLflow Integration

View Experiments

# Start MLflow UI
mlflow ui

# Open browser to http://localhost:5000

Key Features

  • Experiment Tracking: All training runs logged
  • Model Registry: Versioned model storage
  • Artifact Storage: Model files and reports
  • Metrics Logging: Training and validation metrics
  • Parameter Tracking: Hyperparameters and configuration
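
Inside model_training.py, a tracked run typically boils down to something like the following; the experiment name and hyperparameters here are illustrative, and the stand-in data only keeps the sketch self-contained:

import mlflow
import mlflow.sklearn
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Stand-in data so the sketch runs on its own; the real pipeline uses the processed laptop features.
X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

mlflow.set_experiment("laptop-price-prediction")  # assumed experiment name

with mlflow.start_run(run_name="random_forest"):
    params = {"n_estimators": 200, "max_depth": 12}          # illustrative hyperparameters
    model = RandomForestRegressor(**params, random_state=42).fit(X_train, y_train)

    mlflow.log_params(params)                                 # parameter tracking
    mlflow.log_metric("val_r2", model.score(X_val, y_val))    # metrics logging
    mlflow.sklearn.log_model(model, "model")                  # model artifact, ready for the registry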

🧪 Testing

Run Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=. --cov-report=html

# Run specific test file
pytest tests/test_data_processing.py

Test Coverage

  • Data processing pipeline
  • Model training functionality
  • API endpoints
  • Data validation

🔄 CI/CD Pipeline

The project includes a GitHub Actions workflow (.github/workflows/ci_cd.yml) that:

  1. Testing: Runs unit tests, linting, and type checking
  2. Training: Trains models on main branch pushes
  3. Validation: Validates model performance
  4. Packaging: Creates model artifacts
  5. Deployment: Deploys to staging and production

Workflow Triggers

  • Push to main branch
  • Pull requests to main branch
  • Manual workflow dispatch

📈 Model Performance

The pipeline trains and evaluates multiple models (a hyperparameter-tuning sketch follows the list):

  • Linear Regression: Baseline model
  • Random Forest: Ensemble method
  • Gradient Boosting: Sequential learning
  • XGBoost: Optimized gradient boosting
  • SVR: Support Vector Regression
  • K-Nearest Neighbors: Instance-based learning
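
A compressed sketch of how such a comparison with cross-validated hyperparameter search can be wired up; the candidate list and grids below are illustrative, not the search space defined in model_training.py:

from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV

# Illustrative candidates and grids; extend with XGBoost, SVR, KNN, etc. as in the real pipeline.
CANDIDATES = {
    "linear_regression": (LinearRegression(), {}),
    "random_forest": (
        RandomForestRegressor(random_state=42),
        {"n_estimators": [100, 300], "max_depth": [8, 12, None]},
    ),
}

def select_best_model(X_train, y_train):
    # Returns the name, cross-validated R², and fitted estimator of the best candidate.
    best = (None, float("-inf"), None)
    for name, (estimator, grid) in CANDIDATES.items():
        search = GridSearchCV(estimator, grid, cv=5, scoring="r2")
        search.fit(X_train, y_train)
        if search.best_score_ > best[1]:
            best = (name, search.best_score_, search.best_estimator_)
    return best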

Expected Performance

  • R² Score: 0.85+ (the model explains 85%+ of the variance in price)
  • RMSE: < 0.3 (log scale)
  • MAPE: < 25% (mean absolute percentage error)

🔧 Configuration

Edit config.py to customize:

  • Data paths
  • Model parameters
  • API settings
  • MLflow configuration
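
What config.py typically centralizes, sketched with assumed names and defaults (the real file may differ):

from pathlib import Path

# Assumed names and defaults; check mlops_version/config.py for the actual values.
DATA_DIR = Path("data")
RAW_DATA_PATH = DATA_DIR / "laptop_data_combined.csv"    # data paths
MODEL_DIR = Path("models")

RANDOM_STATE = 42                                        # model parameters
TEST_SIZE = 0.2
CV_FOLDS = 5

API_HOST = "0.0.0.0"                                     # API settings
API_PORT = 8000

MLFLOW_TRACKING_URI = "mlruns"                           # MLflow configuration
MLFLOW_EXPERIMENT_NAME = "laptop-price-prediction"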

📝 API Documentation

Once the API is running, visit:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

API Endpoints

  • GET / - API information
  • GET /health - Health check
  • GET /model_info - Model information
  • POST /predict - Single prediction
  • POST /batch_predict - Batch predictions
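
Besides curl, the endpoints can be exercised from Python; a minimal client sketch (the batch payload shape is an assumption, so confirm it against the Swagger UI):

import requests

laptop = {
    "Company": "Apple", "TypeName": "Ultrabook", "Inches": 13.3, "Ram": 8,
    "OpSys": "Mac", "Weight": 1.37, "Touchscreen": 0, "Ips": 1, "ppi": 226.0,
    "cpu_Brand": "Intel Core i5", "HDD": 0, "SSD": 128, "Gpu_Brand": "Intel",
}

# Single prediction
print(requests.post("http://localhost:8000/predict", json=laptop).json())

# Batch prediction (assumed payload shape; check /docs for the exact schema)
print(requests.post("http://localhost:8000/batch_predict", json={"laptops": [laptop, laptop]}).json())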

🚀 Deployment

Docker Deployment

# Build image
docker build -t laptop-price-api .

# Run container
docker run -p 8000:8000 laptop-price-api

Production Considerations

  • Use environment variables for configuration (see the sketch after this list)
  • Implement proper logging and monitoring
  • Set up load balancing for high availability
  • Configure auto-scaling based on demand
  • Implement model monitoring and drift detection
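
For the first point, a sketch of reading settings from the environment with safe defaults; the variable names are assumptions and should be aligned with config.py:

import os

API_PORT = int(os.environ.get("API_PORT", "8000"))
MODEL_PATH = os.environ.get("MODEL_PATH", "models/model.pkl")
MLFLOW_TRACKING_URI = os.environ.get("MLFLOW_TRACKING_URI", "mlruns")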

🔍 Monitoring and Observability

  • MLflow: Experiment tracking and model registry
  • FastAPI: Built-in API documentation and health checks
  • Logging: Structured logging throughout the pipeline
  • Metrics: Performance metrics and validation reports

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Run the test suite
  6. Submit a pull request

📄 License

This project is licensed under the MIT License.

The same structure can be extended to cloud platforms such as:

  • Azure ML: For model training and deployment
  • Google Cloud AI Platform: For MLOps orchestration
  • AWS SageMaker: For end-to-end ML pipeline
  • Kubernetes: For container orchestration and scaling
