A machine learning model to predict whether a breast tumor is malignant or benign using clinical data.
This project implements a Breast Cancer Classification Model using the Wisconsin Breast Cancer Dataset from Kaggle. It uses Logistic Regression to classify tumors as either Benign (B) or Malignant (M) based on 30 features extracted from digitized images of fine needle aspirates (FNA) of breast masses.
- Automatic dataset download via Kaggle API
- Data preprocessing and cleaning
- Label encoding of diagnosis column
- Train-test split and feature scaling
- Logistic Regression classification
- Model accuracy evaluation
| File | Description |
|---|---|
| breast_cancer_prediction.ipynb | Main Jupyter Notebook with full code and analysis |
| README.md | This documentation file |
Clone the repository:

    git clone https://github.com/haiderabbas678/Breast_Cancer_Detection_Model.git
    cd Breast_Cancer_Detection_Model

Install the dependencies:

    pip install pandas numpy scikit-learn jupyter matplotlib seaborn kaggle

Set your Kaggle credentials, then download and unzip the dataset (run inside the notebook):

    import os
    os.environ['KAGGLE_USERNAME'] = "your_username"
    os.environ['KAGGLE_KEY'] = "your_api_key"
    !kaggle datasets download -d uciml/breast-cancer-wisconsin-data
    !unzip breast-cancer-wisconsin-data.zip

Name: Breast Cancer Wisconsin (Diagnostic) Dataset
Source: Kaggle
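Description: This dataset contains 569 samples of breast cancer cell nuclei. Each sample has an id, a diagnosis label, and 30 numeric features extracted from digitized images (32 columns in total, not counting the empty Unnamed: 32 column).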
Target Variable: diagnosis
M = Malignant (Cancerous) → 212 cases
B = Benign (Non-Cancerous) → 357 cases
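
Since the two classes are not evenly represented, it can help to know what accuracy a trivial classifier would reach before reading the model's score. The sketch below uses only the class counts listed above; the "always predict the majority class" baseline is an illustration and is not part of the notebook.

```python
# Class counts as reported for the diagnosis column.
counts = {"M": 212, "B": 357}

total = sum(counts.values())
majority_label = max(counts, key=counts.get)

# Accuracy of a hypothetical classifier that always predicts the majority class.
baseline_accuracy = counts[majority_label] / total

print(f"total samples: {total}")
print(f"majority class: {majority_label}")
print(f"baseline accuracy: {baseline_accuracy:.3f}")
```

Any trained model should comfortably beat this baseline to be considered useful.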
Radius, Texture, Perimeter, Area, Smoothness
Compactness, Concavity, Concave Points
Symmetry, Fractal Dimension
(All measured as mean, standard error, and worst values)
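
The ten measurements above, each reported as three statistics, account for the 30 features. A minimal sketch of that arithmetic follows; the `<measurement>_<statistic>` naming scheme (for example `radius_mean`) is an assumption about the CSV headers, so check it against the actual file.

```python
from itertools import product

# Ten base measurements computed for each cell nucleus.
measurements = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal_dimension",
]
# Three summary statistics per measurement: mean, standard error, worst.
statistics = ["mean", "se", "worst"]

# Assumed naming scheme: "<measurement>_<statistic>".
feature_names = [f"{m}_{s}" for s, m in product(statistics, measurements)]

print(len(feature_names))
print(feature_names[:3])
```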
Removed missing/unneeded columns: Unnamed: 32
Encoded diagnosis labels: M → 1, B → 0
Scaled features using StandardScaler
Split data: 75% training, 25% testing
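
The preprocessing steps above can be sketched as follows. This is not the notebook's code: the DataFrame is a small synthetic stand-in with made-up values, dropping the `id` column is an assumption, and so are `random_state=42` and `stratify=y`. The scaler is fitted on the training split only so that test-set statistics do not leak into training, which may differ from the order used in the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Tiny synthetic stand-in for the Kaggle CSV (all values are made up).
rng = np.random.default_rng(0)
n = 40
df = pd.DataFrame({
    "id": np.arange(n),
    "diagnosis": ["M"] * 15 + ["B"] * 25,
    "radius_mean": rng.normal(14.0, 3.5, size=n),
    "texture_mean": rng.normal(19.0, 4.0, size=n),
    "Unnamed: 32": np.nan,  # empty trailing column
})

# 1. Drop the empty column (and, as an assumption, the identifier).
df = df.drop(columns=["Unnamed: 32", "id"])

# 2. Encode the label: M -> 1, B -> 0.
df["diagnosis"] = df["diagnosis"].map({"M": 1, "B": 0})

X = df.drop(columns="diagnosis")
y = df["diagnosis"]

# 3. Hold out 25% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# 4. Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```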
Model Used: Logistic Regression
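
For readers who want to try the approach without Kaggle credentials, here is a minimal sketch of training and scoring a logistic regression on this dataset. It swaps in the copy bundled with scikit-learn (`load_breast_cancer`) for the Kaggle CSV, and the `random_state`, `stratify`, and `max_iter` values are assumptions, so the scores it prints will not necessarily match the figures reported in this README.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# scikit-learn's bundled copy encodes the target as 0 = malignant, 1 = benign.
data = load_breast_cancer()
X = data.data
y = 1 - data.target  # flip so that 1 = malignant, 0 = benign, as in this README

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# The pipeline fits the scaler on the training data and reuses it at predict time.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows = actual, columns = predicted

print(f"test samples: {len(y_test)}")
print(f"accuracy: {accuracy:.3f}")
print(cm)
```

Wrapping the scaler and the classifier in one pipeline keeps the two steps together, so the same scaling is applied whenever the model is used for prediction.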
Evaluation Metrics:
Accuracy Score: 97.9%
Confusion Matrix:
[[52  3]
 [ 2 90]]
Out of 147 test samples:
52 Benign correctly predicted
90 Malignant correctly predicted
Only 3 Benign misclassified as Malignant
Only 2 Malignant misclassified as Benign
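
Accuracy alone hides which kind of error the model makes, and in a screening setting a missed malignant tumor usually matters more than a false alarm. The sketch below derives a few standard metrics from the confusion matrix exactly as printed above, reading rows as actual classes and columns as predicted classes in the order Benign (0), Malignant (1). The variable names are illustrative.

```python
# Confusion matrix as printed above: rows = actual, columns = predicted,
# class order [Benign (0), Malignant (1)].
confusion = [[52, 3],
             [2, 90]]

tn, fp = confusion[0]  # actual benign: correct, flagged as malignant
fn, tp = confusion[1]  # actual malignant: missed, correct
total = tn + fp + fn + tp

accuracy = (tp + tn) / total
sensitivity = tp / (tp + fn)   # share of malignant cases that were caught
specificity = tn / (tn + fp)   # share of benign cases that were cleared
precision = tp / (tp + fp)     # share of malignant predictions that were right

print(f"total: {total}")
print(f"accuracy: {accuracy:.3f}")
print(f"sensitivity (recall, malignant): {sensitivity:.3f}")
print(f"specificity: {specificity:.3f}")
print(f"precision (malignant): {precision:.3f}")
```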
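On this held-out test set, the model distinguishes benign from malignant tumors well using the features computed from the FNA images. Note that the confusion matrix above sums to 142 correct predictions out of 147 (about 96.6%), slightly below the 97.9% accuracy score reported, so the two figures may come from different runs. Both rest on a single train-test split.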