This project demonstrates a complete MLOps pipeline for laptop price prediction using MLflow, FastAPI, and CI/CD. It covers the full machine learning lifecycle from data processing to model deployment.
- Source: Kaggle - Laptop Price Prediction Dataset
- Features: 13 features including Company, Type, Screen Size, RAM, Storage, CPU, GPU, and more
- Size: 1,274 laptop records
- Target: Price prediction (regression task)
The dataset is already included in mlops_version/data/laptop_data_combined.csv and has been preprocessed for optimal model performance.
```
laptop_price/
├── README.md                      # This file - complete project documentation
├── .gitignore                     # Git ignore rules
├── requirements.txt               # Python dependencies
├── mlops_version/                 # MLOps pipeline (production-ready)
│   ├── data/                      # Data storage
│   ├── models/                    # Trained model artifacts
│   ├── mlruns/                    # MLflow experiment tracking
│   ├── validation_reports/        # Model validation reports
│   ├── validation_plots/          # Validation visualizations
│   ├── tests/                     # Unit tests
│   ├── .github/workflows/         # CI/CD pipeline
│   ├── config.py                  # Configuration settings
│   ├── data_processing.py         # Data processing pipeline
│   ├── model_training.py          # Model training with MLflow
│   ├── model_validation.py        # Model validation and evaluation
│   ├── fastapi_app.py             # FastAPI web service
│   ├── train_pipeline.py          # Complete training pipeline
│   ├── Dockerfile                 # Docker configuration
│   ├── docker-compose.yml         # Docker orchestration
│   └── requirements.txt           # MLOps dependencies
└── test_scripts/                  # Quick testing and learning
    ├── laptop_price_prediction.py # Standalone training script
    ├── inference.py               # Inference script
    ├── model.pkl                  # Trained model
    ├── ct.pkl                     # Preprocessor
    ├── df.pkl                     # Reference data
    └── data_splits.pkl            # Train/val/test splits
```
- Data Processing: Automated data cleaning and feature engineering
- Model Training: Multiple ML algorithms with hyperparameter tuning
- Experiment Tracking: MLflow integration for logging experiments
- Data Validation: Pandera schema validation
- Model Evaluation: Comprehensive metrics (R2, RMSE, MAE, MAPE)
- Cross-Validation: K-fold cross-validation
- Validation Reports: Automated report generation
- Visualization: Prediction vs actual plots
- Model Artifacts: Serialized models and preprocessors
- MLflow Registry: Model versioning and metadata
- Artifact Storage: Organized model storage
- FastAPI Service: RESTful API for predictions
- Docker Support: Containerized deployment
- Health Checks: API monitoring endpoints
- Batch Processing: Multiple prediction support
- Clone the repository

```bash
git clone https://github.com/drakegeo/laptop-price-prediction-mlops.git
cd laptop-price-prediction-mlops
```

- Create a virtual environment

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies

```bash
pip install -r ./mlops_version/requirements.txt
```

- Data is already included
The dataset is already included in `mlops_version/data/laptop_data_combined.csv`, so no download step is needed.
```bash
cd mlops_version
python train_pipeline.py
```

This will:
- Process and validate the data
- Train multiple ML models
- Perform hyperparameter tuning
- Validate model performance
- Save model artifacts
- Log everything to MLflow
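The stages above can be pictured as plain functions chained in sequence. The names and toy bodies below are illustrative stand-ins, not the actual code in `data_processing.py`, `model_training.py`, or `model_validation.py`:

```python
# Illustrative sketch of the stages train_pipeline.py chains together.

def process_data(rows):
    """Clean the raw rows (stand-in: drop rows with missing values)."""
    return [r for r in rows if None not in r]

def train_model(rows):
    """Fit a model (stand-in: always predict the mean price)."""
    mean_price = sum(r[-1] for r in rows) / len(rows)
    return lambda features: mean_price

def validate_model(model, rows):
    """Score the model (stand-in: mean absolute error)."""
    return sum(abs(model(r[:-1]) - r[-1]) for r in rows) / len(rows)

# Toy rows: (RAM, SSD, price); the middle value of the last row is missing.
raw = [(8, 128, 900.0), (16, 512, 1500.0), (8, None, 700.0)]
data = process_data(raw)          # the incomplete row is dropped
model = train_model(data)
mae = validate_model(model, data)
```

The real pipeline additionally tunes hyperparameters and logs each stage to MLflow.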
```bash
cd test_scripts
python laptop_price_prediction.py
```

This will:
- Train a model quickly
- Save model files locally
- Test inference functionality
```bash
cd mlops_version
python fastapi_app.py
```

The API will be available at http://localhost:8000
The test_scripts/ folder contains simplified scripts for quick testing and learning:
- `laptop_price_prediction.py`: Standalone training script with all features
- `inference.py`: Script for making predictions with trained models
- Generated files: `model.pkl`, `ct.pkl`, `df.pkl`, `data_splits.pkl`
- Learning: Understanding the ML pipeline step by step
- Quick Testing: Fast model training without MLOps overhead
- Development: Testing new features before MLOps integration
- Portfolio: Demonstrating ML skills without complex infrastructure
- Run `python laptop_price_prediction.py` to train and save models
- Run `python inference.py` to test predictions
- Use the generated `.pkl` files for inference in other projects
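Reusing the generated files follows the standard pickle save/load pattern. The snippet below sketches that pattern with a dummy model (assumption: the `.pkl` files are plain pickle dumps, as the script names suggest):

```python
import os
import pickle
import tempfile

# Stand-in for a trained model; the real model.pkl holds a fitted regressor.
class DummyModel:
    def predict(self, X):
        return [42.0 for _ in X]

# Save the model the way laptop_price_prediction.py would write model.pkl...
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(DummyModel(), f)

# ...and load it back the way inference.py would.
with open(path, "rb") as f:
    model = pickle.load(f)

preds = model.predict([[13.3, 8, 128]])  # one feature row -> one prediction
```

In the real scripts, `ct.pkl` (the preprocessor) is loaded the same way and applied to the features before `model.predict`.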
```bash
# Health check
curl http://localhost:8000/health

# Get model info
curl http://localhost:8000/model_info

# Make a prediction
curl -X POST "http://localhost:8000/predict" \
     -H "Content-Type: application/json" \
     -d '{
       "Company": "Apple",
       "TypeName": "Ultrabook",
       "Inches": 13.3,
       "Ram": 8,
       "OpSys": "Mac",
       "Weight": 1.37,
       "Touchscreen": 0,
       "Ips": 1,
       "ppi": 226.0,
       "cpu_Brand": "Intel Core i5",
       "HDD": 0,
       "SSD": 128,
       "Gpu_Brand": "Intel"
     }'
```

```bash
# Start MLflow UI
mlflow ui

# Open browser to http://localhost:5000
```

- Experiment Tracking: All training runs logged
- Model Registry: Versioned model storage
- Artifact Storage: Model files and reports
- Metrics Logging: Training and validation metrics
- Parameter Tracking: Hyperparameters and configuration
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=. --cov-report=html

# Run specific test file
pytest tests/test_data_processing.py
```

- Data processing pipeline
- Model training functionality
- API endpoints
- Data validation
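A test in `tests/` typically looks like the sketch below. `parse_ram` is a hypothetical helper used only for illustration; the real test files exercise the actual pipeline functions:

```python
# Hypothetical helper: normalize a raw RAM string such as "8GB" to an int.
def parse_ram(value: str) -> int:
    return int(value.upper().replace("GB", "").strip())

# A pytest-style unit test: pytest collects any function named test_*.
def test_parse_ram():
    assert parse_ram("8GB") == 8
    assert parse_ram("16gb") == 16
```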
The project includes a GitHub Actions workflow (`.github/workflows/ci_cd.yml`) with the following stages:
- Testing: Runs unit tests, linting, and type checking
- Training: Trains models on main branch pushes
- Validation: Validates model performance
- Packaging: Creates model artifacts
- Deployment: Deploys to staging and production
- Push to `main` branch
- Pull requests to `main` branch
- Manual workflow dispatch
The pipeline trains and evaluates multiple models:
- Linear Regression: Baseline model
- Random Forest: Ensemble method
- Gradient Boosting: Sequential learning
- XGBoost: Optimized gradient boosting
- SVR: Support Vector Regression
- K-Nearest Neighbors: Instance-based learning
- R² Score: 0.85+ (the model explains 85%+ of the variance in price)
- RMSE: < 0.3 (log scale)
- MAPE: < 25% (mean absolute percentage error)
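These four metrics can be checked by hand. Below is a minimal pure-Python version of the formulas (the pipeline itself computes them via scikit-learn):

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute R², RMSE, MAE, and MAPE for two equal-length sequences."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "r2": 1 - ss_res / ss_tot,                                   # fit quality
        "rmse": math.sqrt(ss_res / n),                               # log-scale error
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,  # mean abs error
        "mape": 100 * sum(abs((t - p) / t)
                          for t, p in zip(y_true, y_pred)) / n,      # percent error
    }

# Toy log-scale prices and predictions:
m = regression_metrics([10.0, 10.5, 11.0, 11.5], [10.1, 10.4, 11.2, 11.4])
# m["r2"] ≈ 0.944, m["mae"] = 0.125 — well within the targets above
```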
Edit config.py to customize:
- Data paths
- Model parameters
- API settings
- MLflow configuration
Once the API is running, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- `GET /` - API information
- `GET /health` - Health check
- `GET /model_info` - Model information
- `POST /predict` - Single prediction
- `POST /batch_predict` - Batch predictions
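The prediction endpoint can also be called without curl, using only Python's standard library. The payload mirrors the schema from the curl example; the exact response shape depends on `fastapi_app.py`:

```python
import json
import urllib.request

API_URL = "http://localhost:8000/predict"  # default port used in this README

# Payload matching the schema from the curl example.
payload = {
    "Company": "Apple", "TypeName": "Ultrabook", "Inches": 13.3,
    "Ram": 8, "OpSys": "Mac", "Weight": 1.37, "Touchscreen": 0,
    "Ips": 1, "ppi": 226.0, "cpu_Brand": "Intel Core i5",
    "HDD": 0, "SSD": 128, "Gpu_Brand": "Intel",
}

def predict(data: dict) -> dict:
    """POST the payload to the running API and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(data).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the API running locally, `predict(payload)` returns the prediction as a dict.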
```bash
# Build image
docker build -t laptop-price-api .

# Run container
docker run -p 8000:8000 laptop-price-api
```

- Use environment variables for configuration
- Implement proper logging and monitoring
- Set up load balancing for high availability
- Configure auto-scaling based on demand
- Implement model monitoring and drift detection
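"Use environment variables for configuration" can be as simple as reading `os.environ` with development defaults. The variable names below (`APP_HOST`, `APP_PORT`, `MODEL_PATH`) are illustrative, not defined by this project:

```python
import os

# Read settings from the environment, falling back to development defaults.
APP_HOST = os.environ.get("APP_HOST", "0.0.0.0")
APP_PORT = int(os.environ.get("APP_PORT", "8000"))
MODEL_PATH = os.environ.get("MODEL_PATH", "models/model.pkl")
```

With Docker, a setting is then overridden at run time without rebuilding the image, e.g. `docker run -e APP_PORT=9000 -p 9000:9000 laptop-price-api`.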
- MLflow: Experiment tracking and model registry
- FastAPI: Built-in API documentation and health checks
- Logging: Structured logging throughout the pipeline
- Metrics: Performance metrics and validation reports
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Run the test suite
- Submit a pull request
This project is licensed under the MIT License.
The same structure can be extended to cloud platforms such as:
- Azure ML: For model training and deployment
- Google Cloud AI Platform: For MLOps orchestration
- AWS SageMaker: For end-to-end ML pipeline
- Kubernetes: For container orchestration and scaling