
End-to-end ML pipeline for predicting used car prices. Built with ZenML to streamline experiment tracking, model deployment, and reproducibility. Features a Flask app for real-time predictions and full MLOps integration with MLflow.


Vroom Value

Python 3.10+ MLflow ZenML uv CI/CD License: MIT

Vroom Value is an end-to-end MLOps solution for predicting used car prices in the Indian market. This production-grade implementation combines robust machine learning pipelines with a Flask web interface, powered by ZenML and MLflow for experiment tracking and model management.


Overview

The project predicts the resale price of used cars in India using regression models trained on real-world data sourced from Cardekho. The solution includes automated ZenML pipelines for data preprocessing, model training, and hyperparameter tuning, with all experiments logged in MLflow. A Flask web application provides an interactive interface for users to enter car details and receive a price estimate instantly.

Features

  • Accurate Price Predictions: Predicts used car resale values using the best regression model trained on features such as mileage, engine power, kilometers driven, fuel type, and more.

  • Robust ML Pipelines: Managed with ZenML, covering data engineering, model training, top model selection, and hyperparameter tuning.

  • Experiment Tracking: MLflow tracks experiments, logs metrics, and manages model versions.

  • User-Friendly Web Interface: A Flask app with pages for inputting car details and viewing predicted prices in Indian Rupees (₹).

  • Real-Time Predictions: The best-performing model is deployed and integrated into the Flask app for seamless user interaction.

Architecture and Workflow

(architecture diagram)

This project follows a modular and production-grade machine learning lifecycle, built for scalability, reproducibility, and ease of deployment:

  1. Data Collection

    Collected comprehensive data from Cardekho, covering key attributes relevant to the Indian used car resale market.

  2. Data Engineering Pipeline

    Managed using ZenML to ensure robust preprocessing, the pipeline includes:

    • Automated data ingestion

    • Handling of missing and inconsistent values

    • Domain-specific feature engineering

    • Outlier detection and treatment

    • Stratified train-test split for balanced model training

  3. Model Experimentation

    Multiple supervised regression algorithms were trained and benchmarked:

    • Linear Regression
    • Ridge
    • Lasso
    • K-Nearest Neighbors
    • Decision Trees
    • Random Forest Regressor
    • AdaBoost
    • Gradient Boosting Regressor
    • Support Vector Regressor (SVR)
  4. Model Selection

    • Shortlisted top-k performing models based on R² Score and Mean Squared Error (MSE)
  5. Hyperparameter Optimization

    • Conducted exhaustive GridSearchCV on the top-k models

    • Tuned key hyperparameters to minimize overfitting and maximize accuracy

  6. Experiment Tracking with MLflow

    • Tracked metrics, visualizations, and parameters for every experiment

    • Logged all artifacts including models, pipelines, and transformers for easy versioning

  7. Pipeline Orchestration with ZenML

    • Enabled clean separation of stages (data engineering, model training, hyperparameter tuning, deployment)

    • Designed for reproducibility, scalability, and seamless CI/CD integration

  8. Model Deployment

    • Final retrained model served via MLflow model registry

    • Integrated into a responsive Flask web app to deliver real-time price predictions based on user input
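Steps 3 through 5 above (benchmarking, top-k selection, and hyperparameter tuning) can be sketched with scikit-learn. This is a minimal illustration on synthetic data; the candidate list and parameter grids are stand-ins, not the project's actual configuration.

```python
# Minimal sketch of model benchmarking, top-k selection by R², and
# GridSearchCV tuning. Synthetic data stands in for the Cardekho dataset.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_regression(n_samples=500, n_features=6, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

candidates = {
    "linear": LinearRegression(),
    "ridge": Ridge(),
    "random_forest": RandomForestRegressor(random_state=42),
}

# Benchmark every candidate on held-out data, keeping R² and MSE per model.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = (r2_score(y_test, pred), mean_squared_error(y_test, pred))

# Shortlist the top-k models by R² (k=2 here, purely illustrative).
top_k = sorted(scores, key=lambda n: scores[n][0], reverse=True)[:2]
print("top models:", top_k)

# Exhaustive grid search over the best candidate's key hyperparameters.
grids = {
    "linear": {},
    "ridge": {"alpha": [0.1, 1.0, 10.0]},
    "random_forest": {"n_estimators": [50, 100], "max_depth": [None, 10]},
}
best_name = top_k[0]
search = GridSearchCV(candidates[best_name], grids[best_name], scoring="r2", cv=3)
search.fit(X_train, y_train)
print("tuned", best_name, "cv R²:", round(search.best_score_, 3))
```

In the real pipelines these stages run as ZenML steps, with the metrics and fitted models logged to MLflow rather than printed.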

Pipeline Workflow

(pipeline workflow diagram)

Installation & Setup

Prerequisites

  • Python 3.10+
  • UV (recommended) or pip
  • Virtual environment manager (included in instructions)

Initial Setup Using UV (Recommended)

  1. Install UV (if not already installed):
# Using pipx
pipx install uv

# Using curl
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or with pip
python -m pip install uv
  2. Create a virtual environment:
uv venv <virtual-env-name>
  3. Activate the environment:
# Linux/macOS
source <virtual-env-name>/bin/activate

# Windows
.\<virtual-env-name>\Scripts\Activate
  4. Install dependencies:
uv pip install -r requirements.txt

ZenML Setup

# Initialize ZenML
zenml init

# Install MLflow integration
zenml integration install mlflow -y

# Register components (MLflow and Model Deployer)
zenml experiment-tracker register mlflow_tracker --flavor=mlflow
zenml model-deployer register mlflow --flavor=mlflow
zenml stack register local-mlflow-stack -a default -o default -d mlflow -e mlflow_tracker --set

Usage

  1. Run the entire pipeline:
python run_pipelines.py
  2. Start the MLflow dashboard:
# in a separate terminal instance
mlflow ui --backend-store-uri <mlflow_tracking_uri>

You can obtain <mlflow_tracking_uri> by calling get_tracking_uri().

Access the dashboard at: http://localhost:5000

  3. Launch the web app:
python app.py

Access the web app at: http://localhost:5002

  4. Optional: Access the ZenML dashboard:
zenml login --local --blocking

Access the dashboard at: http://localhost:8237

Application Screenshots

Home Page: A page with an introduction to the app

(screenshot: CarPredict home)

Predict Page: Form interface for users to input car details

(screenshot: CarPredict predict)

Result Page: Estimated resale price shown in Indian Rupees

(screenshot: CarPredict result)

Folder Structure

vroom-value/
├── analysis/            # Notebooks or scripts for exploratory data analysis (EDA)
├── configs/             # YAML/JSON config files for pipelines and parameters
├── data/                # Directory for raw input data
├── extracted_data/      # Cleaned and structured data extracted to CSVs
├── pipelines/           # ZenML pipeline definitions and orchestration logic
├── src/                 # Core machine learning logic and helper modules
├── static/              # Static assets like CSS, images, and JS files
├── steps/               # Custom ZenML steps used in pipelines (e.g., preprocessing, training)
├── templates/           # HTML templates for the Flask frontend
├── utils/               # Shared utility functions across the project
├── app.py               # Entry point for running the Flask web app
├── pyproject.toml       # Project metadata and dependency management (via uv or poetry)
├── requirements.txt     # Explicit list of Python dependencies
├── run_pipeline.py      # Script to trigger ZenML pipeline and launch the model
└── README.md            # Project overview and documentation

Technology Stack

Programming Language

Python

Machine Learning and MLOps

Scikit-learn MLflow ZenML Category Encoders

Data Manipulation

Pandas NumPy

Visualization

Seaborn Matplotlib

Web Framework

Flask

Frontend

HTML5 CSS3

Package Manager

uv

Future Improvements

  • Advanced Feature Engineering: Introduce sophisticated feature engineering techniques to improve model accuracy.

  • Exploration of Additional Models: Expand the range of regression models to capture more complex patterns in the data.

  • Database Integration for Data Ingestion: Enhance data ingestion to support dynamic and scalable data sources.

  • Enhanced Pipeline Automation: Further streamline the ZenML pipelines for greater efficiency and flexibility.

  • Cloud Deployment: Transition the application to a cloud-based infrastructure for scalability.
