# Vroom Value

Vroom Value is an end-to-end MLOps solution for predicting used car prices in the Indian market. This production-grade implementation combines robust machine learning pipelines with a Flask web interface, powered by ZenML and MLflow for experiment tracking and model management.
## Table of Contents

- Overview
- Features
- Architecture and Workflow
- Installation & Setup
- Usage
- Application Screenshots
- Folder Structure
- Technologies Stack
- Future Improvements
## Overview

The project predicts the resale price of used cars in India using regression models trained on real-world data sourced from Cardekho. The solution includes automated ZenML pipelines for data preprocessing, model training, and hyperparameter tuning, with all experiments logged in MLflow. A Flask web application provides an interactive interface where users input car details and receive price estimates instantly.
## Features

- **Accurate Price Predictions**: Predicts used car resale values using the best regression model, trained on features such as mileage, engine power, kilometers driven, fuel type, and more.
- **Robust ML Pipelines**: Managed with ZenML, covering data engineering, model training, top-model selection, and hyperparameter tuning.
- **Experiment Tracking**: MLflow tracks experiments, logs metrics, and manages model versions.
- **User-Friendly Web Interface**: A Flask app with pages for inputting car details and viewing predicted prices in Indian Rupees (₹).
- **Real-Time Predictions**: The best-performing model is deployed and integrated into the Flask app for seamless user interaction.
## Architecture and Workflow

This project follows a modular, production-grade machine learning lifecycle, built for scalability, reproducibility, and ease of deployment:

1. **Data Collection**
   Collected comprehensive data from Cardekho, covering key attributes relevant to the Indian used car resale market.

2. **Data Engineering Pipeline**
   Managed with ZenML to ensure robust preprocessing, the pipeline includes:
   - Automated data ingestion
   - Handling of missing and inconsistent values
   - Domain-specific feature engineering
   - Outlier detection and treatment
   - Stratified train-test split for balanced model training

   (A minimal ZenML sketch of this pipeline appears after this list.)

3. **Model Experimentation**
   Multiple supervised regression algorithms were trained and benchmarked:
   - Linear Regression
   - Ridge
   - Lasso
   - K-Nearest Neighbors
   - Decision Trees
   - Random Forest Regressor
   - AdaBoost
   - Gradient Boosting Regressor
   - Support Vector Regressor (SVR)

4. **Model Selection**
   - Shortlisted the top-k performing models based on R² score and Mean Squared Error (MSE)

5. **Hyperparameter Optimization**
   - Conducted exhaustive GridSearchCV on the top-k models
   - Tuned key hyperparameters to minimize overfitting and maximize accuracy

   (A sketch of the selection-and-tuning loop from stages 3–6 also follows this list.)

6. **Experiment Tracking with MLflow**
   - Tracked metrics, visualizations, and parameters for every experiment
   - Logged all artifacts, including models, pipelines, and transformers, for easy versioning

7. **Pipeline Orchestration with ZenML**
   - Enabled clean separation of stages (data engineering, model training, hyperparameter tuning, deployment)
   - Designed for reproducibility, scalability, and seamless CI/CD integration

8. **Model Deployment**
   - Final retrained model served via the MLflow model registry
   - Integrated into a responsive Flask web app to deliver real-time price predictions based on user input

   (A minimal serving sketch closes out the examples below.)
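To make stages 2 and 7 concrete, here is a minimal sketch of how ingestion, feature engineering, and training compose into a ZenML pipeline. The step names, the data path, and the `selling_price` target column are illustrative assumptions; the repository's actual steps live in `steps/` and `pipelines/`.

```python
# Minimal sketch of a ZenML training pipeline (illustrative names, not the repo's actual code).
import pandas as pd
from sklearn.base import RegressorMixin
from sklearn.linear_model import LinearRegression
from zenml import pipeline, step


@step
def ingest_data() -> pd.DataFrame:
    # Load the raw Cardekho export (hypothetical path)
    return pd.read_csv("data/cardekho.csv")


@step
def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    # Stand-in for the real missing-value handling and outlier treatment:
    # drop incomplete rows, then clip the extreme price tail
    df = df.dropna()
    upper = df["selling_price"].quantile(0.99)  # assumes this target column name
    return df[df["selling_price"] <= upper]


@step
def train_model(df: pd.DataFrame) -> RegressorMixin:
    # Fit a baseline regressor on the numeric features
    X = df.drop(columns=["selling_price"]).select_dtypes("number")
    y = df["selling_price"]
    return LinearRegression().fit(X, y)


@pipeline
def training_pipeline():
    # ZenML wires the step outputs together and versions each artifact
    df = ingest_data()
    clean = engineer_features(df)
    train_model(clean)


if __name__ == "__main__":
    training_pipeline()
```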
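And a sketch of the benchmark, top-k selection, and GridSearchCV loop from stages 3–6, including the stratified split and MLflow logging. The candidate set is trimmed for brevity, and the parameter grids, bin count, and run names are assumptions rather than the project's actual configuration.

```python
# Illustrative top-k selection and tuning loop (not the repository's actual code).
import mlflow
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Trimmed candidate set; the project benchmarks nine regressors in total
CANDIDATES = {
    "linear": (LinearRegression(), {}),
    "ridge": (Ridge(), {"alpha": [0.1, 1.0, 10.0]}),
    "lasso": (Lasso(), {"alpha": [0.001, 0.01, 0.1]}),
    "random_forest": (RandomForestRegressor(), {"n_estimators": [100, 300]}),
    "gbr": (GradientBoostingRegressor(), {"learning_rate": [0.05, 0.1]}),
}


def select_and_tune(X: pd.DataFrame, y: pd.Series, k: int = 3):
    # Stratify the split on binned price so train and test cover similar price ranges
    bins = pd.qcut(y, q=10, labels=False, duplicates="drop")
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=bins, random_state=42
    )

    # Rank every candidate by held-out R² and keep the top k
    ranked = sorted(
        CANDIDATES.items(),
        key=lambda kv: r2_score(y_te, kv[1][0].fit(X_tr, y_tr).predict(X_te)),
        reverse=True,
    )[:k]

    best_name, best_model, best_score = None, None, float("-inf")
    for name, (model, grid) in ranked:
        search = GridSearchCV(model, grid, scoring="r2", cv=5)
        search.fit(X_tr, y_tr)
        with mlflow.start_run(run_name=f"tune_{name}"):
            # Log parameters and metrics so every run is comparable in the MLflow UI
            mlflow.log_params(search.best_params_)
            mlflow.log_metric("test_r2", r2_score(y_te, search.predict(X_te)))
            mlflow.log_metric("test_mse", mean_squared_error(y_te, search.predict(X_te)))
        if search.best_score_ > best_score:
            best_name, best_model, best_score = name, search.best_estimator_, search.best_score_
    return best_name, best_model
```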
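Finally, a minimal sketch of stage 8: loading the registered model from MLflow inside the Flask app. The registry name `vroom_value_model`, the template name, and the form-to-feature mapping are placeholders, not `app.py`'s actual values.

```python
# Minimal Flask serving sketch (placeholder names, not app.py's actual code).
import mlflow.pyfunc
import pandas as pd
from flask import Flask, render_template, request

app = Flask(__name__)

# Load the latest registered version from the MLflow model registry
model = mlflow.pyfunc.load_model("models:/vroom_value_model/latest")


@app.route("/predict", methods=["POST"])
def predict():
    # Build a one-row frame from the submitted form fields
    features = pd.DataFrame([request.form.to_dict()])
    price = float(model.predict(features)[0])
    # Render the estimate in Indian Rupees on the result page
    return render_template("result.html", price=round(price, 2))


if __name__ == "__main__":
    app.run(port=5002)
```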
## Installation & Setup

### Prerequisites

- Python 3.10+
- UV (recommended) or pip
- A virtual environment manager (the steps below use UV's built-in `venv`)
- Install UV (if not already installed):

```bash
# Using pipx
pipx install uv

# Using curl
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or with pip
python -m pip install uv
```
- Create & activate a virtual environment:
uv venv <virtual-env-name>
- Activate Envoirnment
# Linux/macOS
source <virtual-env-name>/bin/activate
# Windows
.\<virtual-env-name>\Scripts\Activate
- Install dependencies:

```bash
uv pip install -r requirements.txt
```

- Set up ZenML and the MLflow stack:

```bash
# Initialize ZenML
zenml init

# Install MLflow integration
zenml integration install mlflow -y

# Register components (experiment tracker and model deployer)
zenml experiment-tracker register mlflow_tracker --flavor=mlflow
zenml model-deployer register mlflow --flavor=mlflow

# Register and activate the stack
zenml stack register local-mlflow-stack -a default -o default -d mlflow -e mlflow_tracker --set
```
## Usage

1. Run the entire pipeline:

```bash
python run_pipelines.py
```
2. Start the MLflow dashboard:

```bash
# In a separate terminal
mlflow ui --backend-store-uri <mlflow_tracking_uri>
```
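You can get `<mlflow_tracking_uri>` with `get_tracking_uri()`. A minimal snippet, assuming the `mlflow_tracker` experiment tracker registered during setup is part of the active stack:

```python
from zenml.client import Client

# Print the tracking URI of the active stack's MLflow experiment tracker
print(Client().active_stack.experiment_tracker.get_tracking_uri())
```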
Access the dashboard at: http://localhost:5000
3. Launch the web app:

```bash
python app.py
```

Access the web app at: http://localhost:5002
4. Optional: access the ZenML dashboard:

```bash
zenml login --local --blocking
```

Access the dashboard at: http://localhost:8237
## Application Screenshots

- **Home Page**: A page with an introduction to the app
- **Predict Page**: Form interface for users to input car details
- **Result Page**: Estimated resale price shown in Indian Rupees (₹)
## Folder Structure

```
vroom-value/
├── analysis/           # Notebooks and scripts for exploratory data analysis (EDA)
├── configs/            # YAML/JSON config files for pipelines and parameters
├── data/               # Raw input data
├── extracted_data/     # Cleaned and structured data extracted to CSVs
├── pipelines/          # ZenML pipeline definitions and orchestration logic
├── src/                # Core machine learning logic and helper modules
├── static/             # Static assets such as CSS, images, and JS files
├── steps/              # Custom ZenML steps used in pipelines (e.g., preprocessing, training)
├── templates/          # HTML templates for the Flask frontend
├── utils/              # Shared utility functions across the project
├── app.py              # Entry point for the Flask web app
├── pyproject.toml      # Project metadata and dependency management (via uv or poetry)
├── requirements.txt    # Explicit list of Python dependencies
├── run_pipelines.py    # Script to trigger the ZenML pipelines and launch the model
└── README.md           # Project overview and documentation
```
## Technologies Stack

- **Programming Language**: Python
- **Machine Learning and MLOps**: scikit-learn, ZenML, MLflow
- **Data Manipulation**
- **Visualization**
- **Web Framework**: Flask
- **Frontend**: HTML, CSS, JavaScript
- **Package Manager**: UV
## Future Improvements

- **Advanced Feature Engineering**: Introduce more sophisticated feature engineering techniques to improve model accuracy.
- **Exploration of Additional Models**: Expand the range of regression models to capture more complex patterns in the data.
- **Database Integration for Data Ingestion**: Enhance data ingestion to support dynamic and scalable data sources.
- **Enhanced Pipeline Automation**: Further streamline the ZenML pipelines for greater efficiency and flexibility.
- **Cloud Deployment**: Transition the application to cloud-based infrastructure for scalability.