haiderabbas678/Breast_Cancer_Detection_Model

🧠 Breast Cancer Detection Model

A machine learning model that predicts whether a breast tumor is malignant or benign from measurements of cell nuclei.

This project implements a breast cancer classification model using the Breast Cancer Wisconsin (Diagnostic) Dataset from Kaggle. It uses Logistic Regression to classify tumors as either Benign (B) or Malignant (M) based on 30 features computed from digitized images of fine needle aspirates (FNA) of breast masses.


🔍 Features

✅ Automatic dataset download via Kaggle API
✅ Data preprocessing and cleaning
✅ Label encoding of diagnosis column
✅ Train-test split and feature scaling
✅ Logistic Regression classification
✅ Model accuracy evaluation


📁 Files Included

File                            Description
breast_cancer_prediction.ipynb  Main Jupyter Notebook with full code and analysis
README.md                       This documentation file

🚀 Getting Started

1. Clone the Repository

git clone https://github.com/haiderabbas678/Breast_Cancer_Detection_Model.git 
cd Breast_Cancer_Detection_Model

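2. Install Dependencies

pip install pandas numpy scikit-learn jupyter matplotlib seaborn kaggle

3. Download the Dataset

Run the following in a notebook cell, replacing the placeholders with your own Kaggle credentials:

import os

# Kaggle API credentials (placeholders, replace with your own)
os.environ['KAGGLE_USERNAME'] = "your_username"
os.environ['KAGGLE_KEY'] = "your_api_key"

# The leading "!" runs a shell command from a Jupyter cell
!kaggle datasets download -d uciml/breast-cancer-wisconsin-data
!unzip breast-cancer-wisconsin-data.zip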

📊 Dataset

Name: Breast Cancer Wisconsin (Diagnostic) Dataset
Source: Kaggle
Description: This dataset contains 569 samples of breast masses, each described by 30 numeric features computed from digitized images of the cell nuclei. The CSV has 32 named columns: id, diagnosis, and the 30 features.
Target Variable: diagnosis
M = Malignant (Cancerous) – 212 cases
B = Benign (Non-Cancerous) – 357 cases
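
The sample and class counts above can be checked without Kaggle credentials. The sketch below is not taken from the notebook: it assumes scikit-learn's bundled copy of the dataset (`load_breast_cancer`) matches the Kaggle CSV, since both derive from the same UCI data. Note that the bundled copy encodes the target the other way round (0 = malignant, 1 = benign).

```python
from collections import Counter

from sklearn.datasets import load_breast_cancer

# scikit-learn's bundled copy of the Wisconsin Diagnostic dataset
# (assumed here to be equivalent to the Kaggle CSV).
data = load_breast_cancer()

# target_names is ['malignant', 'benign'], so 0 = malignant and 1 = benign.
counts = Counter(str(data.target_names[t]) for t in data.target)

print("samples, features:", data.data.shape)
print("class counts:", dict(counts))
```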

🧾 Key Features:

Radius, Texture, Perimeter, Area, Smoothness
Compactness, Concavity, Concave Points
Symmetry, Fractal Dimension
(All measured as mean, standard error, and worst values)
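
To make the "10 measurements × 3 statistics = 30 features" structure concrete, here is a small sketch that builds the column names. The exact spellings (for example `concave points_mean` with a space, and the `_se` suffix for standard error) are an assumption based on the Kaggle CSV header; check them against `df.columns` in the notebook.

```python
# Ten base measurements of the cell nuclei (spellings assumed from the Kaggle CSV).
BASE_MEASUREMENTS = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal_dimension",
]

# Each measurement is reported as a mean, a standard error, and a "worst" value.
STATISTICS = ["mean", "se", "worst"]

# All mean columns first, then all standard-error columns, then all worst columns.
feature_columns = [
    f"{base}_{stat}" for stat in STATISTICS for base in BASE_MEASUREMENTS
]

print(len(feature_columns))
print(feature_columns[:3])
```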

🧹 Data Preprocessing:

Dropped the empty, unneeded column: Unnamed: 32
Encoded diagnosis labels: M → 1, B → 0
Scaled features using StandardScaler
Split data: 75% training, 25% testing
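
The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the notebook's code: it uses scikit-learn's bundled copy of the dataset as a stand-in for the Kaggle CSV, re-creates the `diagnosis` and `Unnamed: 32` columns so the cleaning steps have something to act on, and picks `random_state=0` and `max_iter=1000` arbitrarily. Your accuracy will depend on the split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in for reading the Kaggle CSV (assumption: same underlying data).
bunch = load_breast_cancer(as_frame=True)
df = bunch.data.copy()
# The bundled copy uses 0 = malignant, 1 = benign; rebuild Kaggle-style letters.
df["diagnosis"] = bunch.target.map({0: "M", 1: "B"})
# Mimic the empty trailing column found in the Kaggle CSV.
df["Unnamed: 32"] = float("nan")

# 1. Drop the empty column (the Kaggle CSV also has an id column to drop).
df = df.drop(columns=["Unnamed: 32"])

# 2. Encode the labels: M -> 1, B -> 0.
y = df["diagnosis"].map({"M": 1, "B": 0})
X = df.drop(columns=["diagnosis"])

# 3. Split 75% / 25% (random_state is an arbitrary choice for repeatability).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 4. Scale: fit on the training set only, then apply the same transform to test.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 5. Fit Logistic Regression and score it on the held-out set.
model = LogisticRegression(max_iter=1000)
model.fit(X_train_scaled, y_train)
test_accuracy = model.score(X_test_scaled, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```

Fitting the scaler on the training set only keeps test-set statistics out of the model, so the test score stays an honest estimate.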

🧪 Model Performance

Model Used: Logistic Regression
Evaluation Metrics:
Accuracy Score: 97.9%
Confusion Matrix:

[[52  3]
 [ 2 90]]

✅ Results Summary:

Out of 147 test samples:
52 Benign correctly predicted
90 Malignant correctly predicted
Only 3 Benign misclassified as Malignant
Only 2 Malignant misclassified as Benign
These results suggest that the model distinguishes benign from malignant tumors well on the held-out test set, based on the features computed from the FNA images.
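
For readers who want to see how the summary numbers follow from the confusion matrix, here is a small arithmetic sketch. It assumes scikit-learn's layout (rows are true classes, columns are predicted classes, class 0 = Benign, class 1 = Malignant). Note that the matrix as printed gives 142 correct out of 147, about 96.6%, which differs from the 97.9% accuracy quoted above; the sketch only does the arithmetic and does not settle which figure reflects the notebook run.

```python
# Confusion matrix as printed in this README.
# Rows = true class, columns = predicted class (0 = Benign, 1 = Malignant).
confusion = [[52, 3],
             [2, 90]]

(tn, fp), (fn, tp) = confusion
total = tn + fp + fn + tp

accuracy = (tn + tp) / total      # share of all predictions that are correct
sensitivity = tp / (tp + fn)      # malignant cases correctly flagged (recall)
specificity = tn / (tn + fp)      # benign cases correctly cleared
precision = tp / (tp + fp)        # predicted-malignant cases that are malignant

print(f"samples:     {total}")
print(f"accuracy:    {accuracy:.3f}")
print(f"sensitivity: {sensitivity:.3f}")
print(f"specificity: {specificity:.3f}")
print(f"precision:   {precision:.3f}")
```

In a screening setting, sensitivity is usually the number to watch, since a malignant tumor misclassified as benign is the costlier error.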
