
End-to-end ML pipeline for predicting used car prices. Built with ZenML to streamline experiment tracking, model deployment, and reproducibility. Features a Flask app for real-time predictions and full MLOps integration with MLflow.


Vroom Value

Python 3.10+ MLflow ZenML uv CI/CD License: MIT

Vroom Value is an end-to-end MLOps solution for predicting used car prices in the Indian market. This production-grade implementation combines robust machine learning pipelines with a Flask web interface, powered by ZenML and MLflow for experiment tracking and model management.


Overview

The project predicts the resale price of used cars in India using regression models trained on real-world data sourced from Cardekho. The solution includes automated ZenML pipelines for data preprocessing, model training, and hyperparameter tuning, with all experiments logged in MLflow. A Flask web application provides an interactive interface for users to enter car details and receive a price estimate instantly.

Features

  • Accurate Price Predictions: Predicts used car resale values using the best regression model trained on features such as mileage, engine power, kilometers driven, fuel type, and more.

  • Robust ML Pipelines: Managed with ZenML, covering data engineering, model training, top model selection, and hyperparameter tuning.

  • Experiment Tracking: MLflow tracks experiments, logs metrics, and manages model versions.

  • User-Friendly Web Interface: A Flask app with pages for inputting car details and viewing predicted prices in Indian Rupees (₹).

  • Real-Time Predictions: The best-performing model is deployed and integrated into the Flask app for seamless user interaction.

Architecture and Workflow

(architecture diagram)

This project follows a modular and production-grade machine learning lifecycle, built for scalability, reproducibility, and ease of deployment:

  1. Data Collection

    Collected comprehensive data from Cardekho, covering key attributes relevant to the Indian used car resale market.

  2. Data Engineering Pipeline

    Managed using ZenML to ensure robust preprocessing, the pipeline includes:

    • Automated data ingestion

    • Handling of missing and inconsistent values

    • Domain-specific feature engineering

    • Outlier detection and treatment

    • Stratified train-test split for balanced model training

  3. Model Experimentation

    Multiple supervised regression algorithms were trained and benchmarked:

    • Linear Regression
    • Ridge
    • Lasso
    • K-Nearest Neighbors
    • Decision Trees
    • Random Forest Regressor
    • AdaBoost
    • Gradient Boosting Regressor
    • Support Vector Regressor (SVR)
  4. Model Selection

    • Shortlisted top-k performing models based on R² Score and Mean Squared Error (MSE)
  5. Hyperparameter Optimization

    • Conducted exhaustive GridSearchCV on the top-k models

    • Tuned key hyperparameters to minimize overfitting and maximize accuracy

  6. Experiment Tracking with MLflow

    • Tracked metrics, visualizations, and parameters for every experiment

    • Logged all artifacts including models, pipelines, and transformers for easy versioning

  7. Pipeline Orchestration with ZenML

    • Enabled clean separation of stages (data engineering, model training, hyperparameter tuning, deployment)

    • Designed for reproducibility, scalability, and seamless CI/CD integration

  8. Model Deployment

    • Final retrained model served via MLflow model registry

    • Integrated into a responsive Flask web app to deliver real-time price predictions based on user input
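Steps 3 through 5 above (benchmarking, top-k selection, and hyperparameter tuning) can be sketched with scikit-learn. This is a minimal illustration on synthetic data; the candidate list and parameter grids are stand-ins, not the project's actual configuration.

```python
# Minimal sketch of model benchmarking, top-k selection by R², and
# GridSearchCV tuning. Synthetic data stands in for the Cardekho dataset.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_regression(n_samples=500, n_features=6, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

candidates = {
    "linear": LinearRegression(),
    "ridge": Ridge(),
    "random_forest": RandomForestRegressor(random_state=42),
}

# Benchmark every candidate on held-out data, keeping R² and MSE per model.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = (r2_score(y_test, pred), mean_squared_error(y_test, pred))

# Shortlist the top-k models by R² (k=2 here, purely illustrative).
top_k = sorted(scores, key=lambda n: scores[n][0], reverse=True)[:2]
print("top models:", top_k)

# Exhaustive grid search over the best candidate's key hyperparameters.
grids = {
    "linear": {},
    "ridge": {"alpha": [0.1, 1.0, 10.0]},
    "random_forest": {"n_estimators": [50, 100], "max_depth": [None, 10]},
}
best_name = top_k[0]
search = GridSearchCV(candidates[best_name], grids[best_name], scoring="r2", cv=3)
search.fit(X_train, y_train)
print("tuned", best_name, "cv R²:", round(search.best_score_, 3))
```

In the real pipelines these stages run as ZenML steps, with the metrics and fitted models logged to MLflow rather than printed.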

Pipeline Workflow

(pipeline workflow diagram)

Installation & Setup

Prerequisites

  • Python 3.10+
  • UV (recommended) or pip
  • Virtual environment manager (included in instructions)

Initial Setup Using UV (Recommended)

  1. Install UV (if not already installed):
# Using pipx
pipx install uv

# Using curl
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or with pip
python -m pip install uv
  2. Create a virtual environment:
uv venv <virtual-env-name>
  3. Activate the environment:
# Linux/macOS
source <virtual-env-name>/bin/activate

# Windows
.\<virtual-env-name>\Scripts\Activate
  4. Install dependencies:
uv pip install -r requirements.txt

ZenML Setup

# Initialize ZenML
zenml init

# Install MLflow integration
zenml integration install mlflow -y

# Register components (MLflow and Model Deployer)
zenml experiment-tracker register mlflow_tracker --flavor=mlflow
zenml model-deployer register mlflow --flavor=mlflow
zenml stack register local-mlflow-stack -a default -o default -d mlflow -e mlflow_tracker --set

Usage

  1. Run the entire pipeline:
python run_pipelines.py
  2. Start the MLflow dashboard:
# in a separate terminal instance
mlflow ui --backend-store-uri <mlflow_tracking_uri>

You can obtain <mlflow_tracking_uri> by calling get_tracking_uri().

Access the dashboard at: http://localhost:5000

  3. Launch the web app:
python app.py

Access the web app at: http://localhost:5002

  4. Optional: Access the ZenML dashboard:
zenml login --local --blocking

Access the dashboard at: http://localhost:8237

Application Screenshots

Home Page: A page with an introduction to the app

(screenshot: CarPredict home)

Predict Page: Form interface for users to input car details

(screenshot: CarPredict predict)

Result Page: Estimated resale price shown in Indian Rupees

(screenshot: CarPredict result)

Folder Structure

vroom-value/
├── analysis/            # Notebooks or scripts for exploratory data analysis (EDA)
├── configs/             # YAML/JSON config files for pipelines and parameters
├── data/                # Directory for raw input data
├── extracted_data/      # Cleaned and structured data extracted to CSVs
├── pipelines/           # ZenML pipeline definitions and orchestration logic
├── src/                 # Core machine learning logic and helper modules
├── static/              # Static assets like CSS, images, and JS files
├── steps/               # Custom ZenML steps used in pipelines (e.g., preprocessing, training)
├── templates/           # HTML templates for the Flask frontend
├── utils/               # Shared utility functions across the project
├── app.py               # Entry point for running the Flask web app
├── pyproject.toml       # Project metadata and dependency management (via uv or poetry)
├── requirements.txt     # Explicit list of Python dependencies
├── run_pipeline.py      # Script to trigger ZenML pipeline and launch the model
└── README.md            # Project overview and documentation

Technology Stack

Programming Language

Python

Machine Learning and MLOps

Scikit-learn MLflow ZenML Category Encoders

Data Manipulation

Pandas NumPy

Visualization

Seaborn Matplotlib

Web Framework

Flask

Frontend

HTML5 CSS3

Package Manager

uv

Future Improvements

  • Advanced Feature Engineering: Introduce sophisticated feature engineering techniques to improve model accuracy.

  • Exploration of Additional Models: Expand the range of regression models to capture more complex patterns in the data.

  • Database Integration for Data Ingestion: Enhance data ingestion to support dynamic and scalable data sources.

  • Enhanced Pipeline Automation: Further streamline the ZenML pipelines for greater efficiency and flexibility.

  • Cloud Deployment: Transition the application to a cloud-based infrastructure for scalability.
