MeghanaCVarghese/Heart-Disease-Prediction-Decision-Tree

🌳 Decision Tree Classifier – Heart Disease Prediction


📌 Project Overview

This project implements a Decision Tree Classifier to predict the presence of heart disease from patient medical data. It covers the complete machine learning workflow: exploratory data analysis (EDA), preprocessing, model building, evaluation, and hyperparameter tuning.


🎯 Objective

  • Apply Decision Tree Classification on a real-world dataset
  • Perform data preprocessing and feature engineering
  • Evaluate model performance using multiple metrics
  • Optimize model using hyperparameter tuning
  • Interpret results using visualization and feature importance

📂 Dataset

  • File: heart_disease.xlsx
  • Contains features such as:
    • Age, Sex
    • Chest pain type (cp)
    • Resting ECG (restecg)
    • Cholesterol, Blood Pressure
    • Maximum heart rate
    • Target variable (num) → Heart disease presence

🔍 Exploratory Data Analysis (EDA)

  • Checked dataset structure using .info() and .describe()
  • Identified missing values and duplicates
  • Detected outliers using:
    • Boxplots
    • IQR (Interquartile Range) method
  • Visualizations performed:
    • Histograms for feature distribution
    • Pairplot for relationships
    • Correlation heatmap
    • Target class distribution
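The checks above can be sketched roughly as follows. This is a minimal illustration on a toy DataFrame, not the notebook's actual code: the column names, the values, and the helper `iqr_outlier_mask` are assumptions made for the example.

```python
import pandas as pd

# Toy stand-in for the real data; column names and values are assumptions.
df = pd.DataFrame({
    "age": [29, 45, 54, 61, 45, 58],
    "chol": [180, 210, 230, 250, 210, 600],  # 600 is a deliberate outlier
})

df.info()                      # column dtypes and non-null counts
print(df.describe())           # summary statistics
print(df.isna().sum())         # missing values per column
print(df.duplicated().sum())   # number of fully duplicated rows


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


outliers = df[iqr_outlier_mask(df["chol"])]
print(outliers)
```

The multiplier `k = 1.5` is the conventional choice for boxplot whiskers; the notebook may use a different value.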

⚙️ Data Preprocessing & Feature Engineering

Outlier Handling

  • Removed outliers using IQR method
  • Applied capping (winsorization) to limit extreme values
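One common way to cap extreme values is to clip them at the IQR fences. The sketch below assumes that approach; the notebook may use percentile limits or other bounds, and `cap_with_iqr` is a hypothetical helper name.

```python
import pandas as pd


def cap_with_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


# Toy cholesterol values (assumed); 600 lies far above the upper fence.
chol = pd.Series([180.0, 210.0, 230.0, 250.0, 210.0, 600.0], name="chol")
capped = cap_with_iqr(chol)
print(capped.tolist())
```

Unlike dropping rows, capping keeps every patient record and only pulls the extreme value back to the fence.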

Encoding Techniques

  • Label Encoding for categorical variables
  • One-Hot Encoding using pd.get_dummies()
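A small sketch of the two encodings side by side. The category labels for `cp` are made up for illustration; the real column may already be numeric.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Assumed category labels for the chest pain column.
df = pd.DataFrame({"cp": ["typical", "atypical", "non-anginal", "typical"]})

# Label encoding: one integer per category (classes are sorted alphabetically).
encoder = LabelEncoder()
df["cp_label"] = encoder.fit_transform(df["cp"])

# One-hot encoding: one indicator column per category.
one_hot = pd.get_dummies(df["cp"], prefix="cp")

print(df)
print(one_hot)
```

Note that scikit-learn documents `LabelEncoder` as intended for target labels; for input features, `OrdinalEncoder` is the equivalent tool.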

Feature Scaling

  • Applied Standard Scaling using StandardScaler
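Standard scaling rescales each feature to zero mean and unit variance. A minimal sketch on made-up numbers (the two columns stand in for age and cholesterol):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Assumed toy values: columns stand in for age and cholesterol.
X = np.array([[50.0, 200.0], [60.0, 240.0], [70.0, 280.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # (x - column mean) / column std

print(X_scaled.mean(axis=0))  # approximately 0 for each column
print(X_scaled.std(axis=0))   # approximately 1 for each column
```

Tree splits are threshold-based, so scaling usually does not change a decision tree's predictions; it matters more if the same features are later fed to distance-based or linear models.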

Target Transformation

  • Converted target variable into binary:
    • 0 → No Disease
    • 1 → Disease
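A possible way to do this conversion, assuming `num` holds several severity levels where 0 means no disease and any higher value means disease. That coding is an assumption; check the actual values in `heart_disease.xlsx` before reusing it.

```python
import pandas as pd

# Assumed raw target values: 0 = no disease, 1..4 = increasing severity.
num = pd.Series([0, 2, 0, 1, 4, 3], name="num")

# Collapse every non-zero level into a single "disease" class.
target = (num > 0).astype(int)
print(target.tolist())
```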

🤖 Model Building – Decision Tree

  • Split dataset into:
    • 80% Training
    • 20% Testing
  • Trained using DecisionTreeClassifier
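The split and training step can be sketched like this. Synthetic data from `make_classification` stands in for the heart disease features, and `stratify` and `random_state` are choices made for this example rather than settings confirmed by the project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed features and binary target.
X, y = make_classification(n_samples=200, n_features=8, random_state=42)

# 80% training, 20% testing; stratify keeps the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

print(len(X_train), len(X_test), model.get_depth())
```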

📊 Model Evaluation

Performance evaluated using:

  • Accuracy
  • Precision
  • Recall
  • F1-Score
  • Confusion Matrix
  • ROC-AUC Score

Visualizations:

  • ROC Curve
  • Decision Tree Structure
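The metrics listed above all come from `sklearn.metrics`. The sketch below uses small hand-written label and probability lists (not project results) so the numbers are easy to check by hand.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hand-written toy values, not results from the project.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
y_prob = [0.1, 0.6, 0.8, 0.9, 0.4, 0.2, 0.7, 0.3]  # predicted P(class 1)

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-Score :", f1_score(y_true, y_pred))
print("ROC-AUC  :", roc_auc_score(y_true, y_prob))

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)
```

ROC-AUC is computed from predicted probabilities (for a fitted tree, `model.predict_proba(X_test)[:, 1]`), not from the hard class predictions.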

🔧 Hyperparameter Tuning

Used GridSearchCV to optimize:

  • criterion (gini / entropy)
  • max_depth
  • min_samples_split
  • min_samples_leaf

✔ Improved model performance after tuning
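A grid search over the four parameters might look like the sketch below. The candidate values, the 5-fold setting, and the F1 scoring are assumptions for illustration; the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the training data.
X, y = make_classification(n_samples=200, n_features=8, random_state=42)

# Assumed candidate values; the notebook's grid may differ.
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [3, 5, None],
    "min_samples_split": [2, 10],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    cv=5,          # 5-fold cross-validation for every combination
    scoring="f1",
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` is the tree refit on all the data passed to `fit` with the best parameter combination.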


🌳 Model Interpretation

  • Visualized decision tree using plot_tree()
  • Extracted feature importance to identify key predictors
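Both steps can be sketched as follows. The feature names are placeholders and the data is synthetic, so the ranking printed here says nothing about the real predictors.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Placeholder feature names and synthetic data (assumptions).
feature_names = [f"feature_{i}" for i in range(5)]
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# A shallow tree keeps the plot readable.
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

fig, ax = plt.subplots(figsize=(12, 6))
plot_tree(
    model,
    feature_names=feature_names,
    class_names=["No Disease", "Disease"],
    filled=True,
    ax=ax,
)

# Impurity-based importances; they sum to 1 across features.
importances = pd.Series(model.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False))
```

In a notebook the figure renders inline; in a script, call `plt.show()` or `fig.savefig(...)` afterwards.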

💡 Key Learnings

  • Understanding how Decision Trees split data using Gini and Entropy
  • Importance of hyperparameter tuning to avoid overfitting
  • Handling outliers improves model stability
  • Difference between Label Encoding and One-Hot Encoding
  • Model interpretability using tree visualization and feature importance
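To make the first point concrete, here is a small standard-library sketch of the two impurity measures for the labels in a single node. The function names are chosen for this example; scikit-learn computes the same quantities internally.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the classes in the node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k)) over the classes."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


mixed = [0, 0, 1, 1]  # evenly mixed node: maximum impurity for two classes
pure = [1, 1, 1, 1]   # pure node: no impurity

print(gini(mixed), entropy(mixed))
```

A tree picks the split that reduces the chosen impurity the most, so both criteria push child nodes toward being pure.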

📊 Visualizations

🔹 Feature Distribution

Histogram

🔹 Outlier Detection

Boxplot

🔹 Correlation Heatmap

Heatmap

🔹 Decision Tree Structure

Decision Tree

🔹 ROC Curve

ROC Curve


🧠 Interview Questions

1. What are common hyperparameters of Decision Trees?

  • max_depth → Controls tree depth (overfitting vs underfitting)
  • min_samples_split → Minimum samples to split a node
  • min_samples_leaf → Minimum samples in a leaf node
  • criterion → Split quality (gini / entropy)

👉 Proper tuning balances bias and variance
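One way to see the depth trade-off is to compare training and test accuracy at several `max_depth` values. The sketch below uses synthetic, deliberately noisy data (`flip_y=0.1`), so the exact scores are not meaningful for the heart disease dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% label noise (an assumption for illustration).
X, y = make_classification(
    n_samples=300, n_features=10, flip_y=0.1, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

scores = {}
for depth in [1, 3, 5, None]:  # None lets the tree grow until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=1)
    tree.fit(X_train, y_train)
    scores[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    print(depth, scores[depth])
```

Typically the training score keeps rising with depth while the test score levels off or drops, which is the overfitting that `max_depth`, `min_samples_split`, and `min_samples_leaf` are meant to limit.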


2. Label Encoding vs One-Hot Encoding

| Feature  | Label Encoding         | One-Hot Encoding    |
|----------|------------------------|---------------------|
| Output   | Integer values         | Binary columns      |
| Use Case | Ordinal data           | Nominal data        |
| Risk     | Introduces false order | High dimensionality |

🛠️ Tech Stack

  • Python
  • Pandas, NumPy
  • Matplotlib, Seaborn
  • Scikit-learn

🚀 How to Run

Clone the repository:

    git clone https://github.com/MeghanaCVarghese/Decision-Tree-Classifier

From the repository folder, install the dependencies:

    pip install -r requirements.txt

Open Jupyter Notebook:

    jupyter notebook


👩‍💻 Author

Meghana C Varghese
