This project demonstrates a complete MLOps pipeline for laptop price prediction using MLflow, FastAPI, and CI/CD. It covers the full machine learning lifecycle from data processing to model deployment.
- Source: Kaggle - Laptop Price Prediction Dataset
- Features: 13 features including Company, Type, Screen Size, RAM, Storage, CPU, GPU, and more
- Size: 1,274 laptop records
- Target: Price prediction (regression task)
The dataset is already included in mlops_version/data/laptop_data_combined.csv and has been preprocessed for optimal model performance.
```
laptop_price/
├── README.md                      # This file - complete project documentation
├── .gitignore                     # Git ignore rules
├── requirements.txt               # Python dependencies
├── mlops_version/                 # MLOps pipeline (production-ready)
│   ├── data/                      # Data storage
│   ├── models/                    # Trained model artifacts
│   ├── mlruns/                    # MLflow experiment tracking
│   ├── validation_reports/        # Model validation reports
│   ├── validation_plots/          # Validation visualizations
│   ├── tests/                     # Unit tests
│   ├── .github/workflows/         # CI/CD pipeline
│   ├── config.py                  # Configuration settings
│   ├── data_processing.py         # Data processing pipeline
│   ├── model_training.py          # Model training with MLflow
│   ├── model_validation.py        # Model validation and evaluation
│   ├── fastapi_app.py             # FastAPI web service
│   ├── train_pipeline.py          # Complete training pipeline
│   ├── Dockerfile                 # Docker configuration
│   ├── docker-compose.yml         # Docker orchestration
│   └── requirements.txt           # MLOps dependencies
└── test_scripts/                  # Quick testing and learning
    ├── laptop_price_prediction.py # Standalone training script
    ├── inference.py               # Inference script
    ├── model.pkl                  # Trained model
    ├── ct.pkl                     # Preprocessor
    ├── df.pkl                     # Reference data
    └── data_splits.pkl            # Train/val/test splits
```
- Data Processing: Automated data cleaning and feature engineering
- Model Training: Multiple ML algorithms with hyperparameter tuning
- Experiment Tracking: MLflow integration for logging experiments
- Data Validation: Pandera schema validation
- Model Evaluation: Comprehensive metrics (R2, RMSE, MAE, MAPE)
- Cross-Validation: K-fold cross-validation
- Validation Reports: Automated report generation
- Visualization: Prediction vs actual plots
- Model Artifacts: Serialized models and preprocessors
- MLflow Registry: Model versioning and metadata
- Artifact Storage: Organized model storage
- FastAPI Service: RESTful API for predictions
- Docker Support: Containerized deployment
- Health Checks: API monitoring endpoints
- Batch Processing: Multiple prediction support
- Clone the repository

```bash
git clone https://github.com/drakegeo/laptop-price-prediction-mlops.git
cd laptop-price-prediction-mlops
```

- Create a virtual environment

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies

```bash
pip install -r ./mlops_version/requirements.txt
```

- Data is already included
The dataset is already included in `mlops_version/data/laptop_data_combined.csv`, so no download step is needed.
```bash
cd mlops_version
python train_pipeline.py
```

This will:
- Process and validate the data
- Train multiple ML models
- Perform hyperparameter tuning
- Validate model performance
- Save model artifacts
- Log everything to MLflow
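The stages above can be pictured as plain functions chained in sequence. The names and toy bodies below are illustrative stand-ins, not the actual code in `data_processing.py`, `model_training.py`, or `model_validation.py`:

```python
# Illustrative sketch of the stages train_pipeline.py chains together.

def process_data(rows):
    """Clean the raw rows (stand-in: drop rows with missing values)."""
    return [r for r in rows if None not in r]

def train_model(rows):
    """Fit a model (stand-in: always predict the mean price)."""
    mean_price = sum(r[-1] for r in rows) / len(rows)
    return lambda features: mean_price

def validate_model(model, rows):
    """Score the model (stand-in: mean absolute error)."""
    return sum(abs(model(r[:-1]) - r[-1]) for r in rows) / len(rows)

# Toy rows: (RAM, SSD, price); the middle value of the last row is missing.
raw = [(8, 128, 900.0), (16, 512, 1500.0), (8, None, 700.0)]
data = process_data(raw)          # the incomplete row is dropped
model = train_model(data)
mae = validate_model(model, data)
```

The real pipeline additionally tunes hyperparameters and logs each stage to MLflow.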
```bash
cd test_scripts
python laptop_price_prediction.py
```

This will:
- Train a model quickly
- Save model files locally
- Test inference functionality
```bash
cd mlops_version
python fastapi_app.py
```

The API will be available at http://localhost:8000
The test_scripts/ folder contains simplified scripts for quick testing and learning:
- `laptop_price_prediction.py`: Standalone training script with all features
- `inference.py`: Script for making predictions with trained models
- Generated files: `model.pkl`, `ct.pkl`, `df.pkl`, `data_splits.pkl`
- Learning: Understanding the ML pipeline step by step
- Quick Testing: Fast model training without MLOps overhead
- Development: Testing new features before MLOps integration
- Portfolio: Demonstrating ML skills without complex infrastructure
- Run `python laptop_price_prediction.py` to train and save models
- Run `python inference.py` to test predictions
- Use the generated `.pkl` files for inference in other projects
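Reusing the generated files follows the standard pickle save/load pattern. The snippet below sketches that pattern with a dummy model (assumption: the `.pkl` files are plain pickle dumps, as the script names suggest):

```python
import os
import pickle
import tempfile

# Stand-in for a trained model; the real model.pkl holds a fitted regressor.
class DummyModel:
    def predict(self, X):
        return [42.0 for _ in X]

# Save the model the way laptop_price_prediction.py would write model.pkl...
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(DummyModel(), f)

# ...and load it back the way inference.py would.
with open(path, "rb") as f:
    model = pickle.load(f)

preds = model.predict([[13.3, 8, 128]])  # one feature row -> one prediction
```

In the real scripts, `ct.pkl` (the preprocessor) is loaded the same way and applied to the features before `model.predict`.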
```bash
# Health check
curl http://localhost:8000/health

# Get model info
curl http://localhost:8000/model_info

# Make a prediction
curl -X POST "http://localhost:8000/predict" \
     -H "Content-Type: application/json" \
     -d '{
       "Company": "Apple",
       "TypeName": "Ultrabook",
       "Inches": 13.3,
       "Ram": 8,
       "OpSys": "Mac",
       "Weight": 1.37,
       "Touchscreen": 0,
       "Ips": 1,
       "ppi": 226.0,
       "cpu_Brand": "Intel Core i5",
       "HDD": 0,
       "SSD": 128,
       "Gpu_Brand": "Intel"
     }'
```

```bash
# Start MLflow UI
mlflow ui

# Open browser to http://localhost:5000
```

- Experiment Tracking: All training runs logged
- Model Registry: Versioned model storage
- Artifact Storage: Model files and reports
- Metrics Logging: Training and validation metrics
- Parameter Tracking: Hyperparameters and configuration
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=. --cov-report=html

# Run specific test file
pytest tests/test_data_processing.py
```

- Data processing pipeline
- Model training functionality
- API endpoints
- Data validation
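A test in `tests/` typically looks like the sketch below. `parse_ram` is a hypothetical helper used only for illustration; the real test files exercise the actual pipeline functions:

```python
# Hypothetical helper: normalize a raw RAM string such as "8GB" to an int.
def parse_ram(value: str) -> int:
    return int(value.upper().replace("GB", "").strip())

# A pytest-style unit test: pytest collects any function named test_*.
def test_parse_ram():
    assert parse_ram("8GB") == 8
    assert parse_ram("16gb") == 16
```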
The project includes a GitHub Actions workflow (`.github/workflows/ci_cd.yml`) with the following stages:
- Testing: Runs unit tests, linting, and type checking
- Training: Trains models on main branch pushes
- Validation: Validates model performance
- Packaging: Creates model artifacts
- Deployment: Deploys to staging and production
- Push to `main` branch
- Pull requests to `main` branch
- Manual workflow dispatch
The pipeline trains and evaluates multiple models:
- Linear Regression: Baseline model
- Random Forest: Ensemble method
- Gradient Boosting: Sequential learning
- XGBoost: Optimized gradient boosting
- SVR: Support Vector Regression
- K-Nearest Neighbors: Instance-based learning
- R² Score: 0.85+ (the model explains 85%+ of the variance in price)
- RMSE: < 0.3 (log scale)
- MAPE: < 25% (mean absolute percentage error)
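These four metrics can be checked by hand. Below is a minimal pure-Python version of the formulas (the pipeline itself computes them via scikit-learn):

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute R², RMSE, MAE, and MAPE for two equal-length sequences."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "r2": 1 - ss_res / ss_tot,                                   # fit quality
        "rmse": math.sqrt(ss_res / n),                               # log-scale error
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,  # mean abs error
        "mape": 100 * sum(abs((t - p) / t)
                          for t, p in zip(y_true, y_pred)) / n,      # percent error
    }

# Toy log-scale prices and predictions:
m = regression_metrics([10.0, 10.5, 11.0, 11.5], [10.1, 10.4, 11.2, 11.4])
# m["r2"] ≈ 0.944, m["mae"] = 0.125 — well within the targets above
```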
Edit config.py to customize:
- Data paths
- Model parameters
- API settings
- MLflow configuration
Once the API is running, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- `GET /` - API information
- `GET /health` - Health check
- `GET /model_info` - Model information
- `POST /predict` - Single prediction
- `POST /batch_predict` - Batch predictions
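The prediction endpoint can also be called without curl, using only Python's standard library. The payload mirrors the schema from the curl example; the exact response shape depends on `fastapi_app.py`:

```python
import json
import urllib.request

API_URL = "http://localhost:8000/predict"  # default port used in this README

# Payload matching the schema from the curl example.
payload = {
    "Company": "Apple", "TypeName": "Ultrabook", "Inches": 13.3,
    "Ram": 8, "OpSys": "Mac", "Weight": 1.37, "Touchscreen": 0,
    "Ips": 1, "ppi": 226.0, "cpu_Brand": "Intel Core i5",
    "HDD": 0, "SSD": 128, "Gpu_Brand": "Intel",
}

def predict(data: dict) -> dict:
    """POST the payload to the running API and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(data).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the API running locally, `predict(payload)` returns the prediction as a dict.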
```bash
# Build image
docker build -t laptop-price-api .

# Run container
docker run -p 8000:8000 laptop-price-api
```

- Use environment variables for configuration
- Implement proper logging and monitoring
- Set up load balancing for high availability
- Configure auto-scaling based on demand
- Implement model monitoring and drift detection
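"Use environment variables for configuration" can be as simple as reading `os.environ` with development defaults. The variable names below (`APP_HOST`, `APP_PORT`, `MODEL_PATH`) are illustrative, not defined by this project:

```python
import os

# Read settings from the environment, falling back to development defaults.
APP_HOST = os.environ.get("APP_HOST", "0.0.0.0")
APP_PORT = int(os.environ.get("APP_PORT", "8000"))
MODEL_PATH = os.environ.get("MODEL_PATH", "models/model.pkl")
```

With Docker, a setting is then overridden at run time without rebuilding the image, e.g. `docker run -e APP_PORT=9000 -p 9000:9000 laptop-price-api`.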
- MLflow: Experiment tracking and model registry
- FastAPI: Built-in API documentation and health checks
- Logging: Structured logging throughout the pipeline
- Metrics: Performance metrics and validation reports
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Run the test suite
- Submit a pull request
This project is licensed under the MIT License.
The same structure can be extended to cloud platforms such as:
- Azure ML: For model training and deployment
- Google Cloud AI Platform: For MLOps orchestration
- AWS SageMaker: For end-to-end ML pipeline
- Kubernetes: For container orchestration and scaling