This project implements a machine learning pipeline that classifies cancer cell samples as malignant or benign. It compares a baseline classifier trained on all features against two metaheuristic feature-selection methods, Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), to show how feature selection affects classifier performance. The dataset contains several hundred human cell samples, each described by nine cytological features.
- Baseline Classification: Implements standard classification techniques without optimization.
- Feature Selection via GA and PSO: Reduces dimensionality and enhances accuracy.
- Performance Comparison: Evaluates the results before and after optimization.
The dataset consists of records of human cell samples, with each record containing the following features:
- Clump Thickness
- Uniformity of Cell Size
- Uniformity of Cell Shape
- Marginal Adhesion
- Single Epithelial Cell Size
- Bare Nuclei
- Bland Chromatin
- Normal Nucleoli
- Mitoses
The target variable indicates whether the sample is malignant or benign.
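The schema above and the clean, normalize, and split steps of preprocessing can be sketched as follows. This is a minimal illustration, not the notebook's code: the data is a synthetic stand-in, and the 1 to 10 score range, the `Class` column name, the use of NaN for missing entries, the 80/20 split, and `MinMaxScaler` are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

FEATURES = [
    "Clump Thickness", "Uniformity of Cell Size", "Uniformity of Cell Shape",
    "Marginal Adhesion", "Single Epithelial Cell Size", "Bare Nuclei",
    "Bland Chromatin", "Normal Nucleoli", "Mitoses",
]

# Synthetic stand-in for the real data (assumption: scores between 1 and 10).
rng = np.random.default_rng(42)
values = rng.integers(1, 11, size=(200, len(FEATURES))).astype(float)
df = pd.DataFrame(values, columns=FEATURES)
# Hypothetical label column: 1 = malignant, 0 = benign.
df["Class"] = rng.integers(0, 2, size=len(df))
# Simulate two records with a missing measurement.
df.loc[[3, 17], "Bare Nuclei"] = np.nan

# Clean: drop records that have any missing feature value.
clean = df.dropna().reset_index(drop=True)
X = clean[FEATURES].to_numpy(dtype=float)
y = clean["Class"].to_numpy()

# Split first, keeping the class ratio the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Normalize: fit the scaler on the training part only, then apply it to both.
scaler = MinMaxScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

print(clean.shape, X_train_s.shape, X_test_s.shape)
```

Fitting the scaler on the training split alone keeps information about the test samples out of the model, which matters when the test metrics are later used to compare methods.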
- Data Preprocessing: The dataset is cleaned, normalized, and split into training and testing sets.
- Baseline Model: A classifier is trained on the full feature set without optimization.
- Optimization Methods:
  - Genetic Algorithm: Simulates natural selection to identify the most relevant features.
  - Particle Swarm Optimization: Mimics social behavior of swarms to find optimal feature subsets.
- Evaluation: Models are evaluated on metrics such as accuracy, precision, recall, and F1 score.
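To make the two optimization methods concrete, here is a small sketch of how GA and PSO can each search over feature subsets encoded as boolean masks. It is not the notebook's implementation: the synthetic data, the logistic regression classifier, the cross-validated accuracy used as fitness, and every hyperparameter and function name (`fitness`, `ga_select`, `pso_select`) are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption): 9 features, only some informative.
X, y = make_classification(n_samples=300, n_features=9, n_informative=4,
                           n_redundant=2, random_state=0)
N_FEATURES = X.shape[1]


def fitness(mask):
    """Mean 3-fold CV accuracy using only the columns where mask is True."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():  # an empty subset cannot be scored
        return 0.0
    clf = LogisticRegression(max_iter=1000)
    return float(cross_val_score(clf, X[:, mask], y, cv=3).mean())


def ga_select(pop_size=10, generations=6, mutation_rate=0.1, seed=0):
    """Toy GA: tournament selection, uniform crossover, bit-flip mutation."""
    rng = np.random.default_rng(seed)
    pop = rng.random((pop_size, N_FEATURES)) < 0.5
    best_mask, best_score = pop[0].copy(), fitness(pop[0])
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        if scores.max() > best_score:
            best_score, best_mask = scores.max(), pop[scores.argmax()].copy()

        def pick():
            # Tournament of two: the fitter of two random individuals wins.
            a, b = rng.integers(pop_size, size=2)
            return pop[a] if scores[a] >= scores[b] else pop[b]

        children = []
        for _ in range(pop_size):
            p1, p2 = pick(), pick()
            # Uniform crossover: each bit comes from either parent.
            child = np.where(rng.random(N_FEATURES) < 0.5, p1, p2)
            # Mutation: flip each bit with a small probability.
            child ^= rng.random(N_FEATURES) < mutation_rate
            children.append(child)
        pop = np.array(children)
    return best_mask, float(best_score)


def pso_select(n_particles=10, iterations=6, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Toy binary PSO: a sigmoid of the velocity gives each bit's probability."""
    rng = np.random.default_rng(seed)
    pos = rng.random((n_particles, N_FEATURES)) < 0.5
    vel = rng.uniform(-1, 1, size=(n_particles, N_FEATURES))
    pbest = pos.copy()
    pbest_score = np.array([fitness(p) for p in pos])
    g = pbest_score.argmax()
    gbest, gbest_score = pbest[g].copy(), pbest_score[g]
    for _ in range(iterations):
        posf = pos.astype(float)
        r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
        # Pull each particle toward its own best and the swarm's best mask.
        vel = (w * vel
               + c1 * r1 * (pbest.astype(float) - posf)
               + c2 * r2 * (gbest.astype(float) - posf))
        vel = np.clip(vel, -4.0, 4.0)  # keep the sigmoid away from 0 and 1
        pos = rng.random(vel.shape) < 1.0 / (1.0 + np.exp(-vel))
        scores = np.array([fitness(p) for p in pos])
        improved = scores > pbest_score
        pbest[improved] = pos[improved]
        pbest_score[improved] = scores[improved]
        g = pbest_score.argmax()
        if pbest_score[g] > gbest_score:
            gbest, gbest_score = pbest[g].copy(), pbest_score[g]
    return gbest, float(gbest_score)


baseline_score = fitness(np.ones(N_FEATURES, dtype=bool))
ga_mask, ga_score = ga_select()
pso_mask, pso_score = pso_select()
print(f"all features: {baseline_score:.3f}")
print(f"GA  subset {ga_mask.astype(int)}: {ga_score:.3f}")
print(f"PSO subset {pso_mask.astype(int)}: {pso_score:.3f}")
```

Both searches share the same fitness function, so any difference between them comes from how they explore the space of masks. Scoring subsets with cross-validation on the training data, rather than on the held-out test set, avoids selecting features that merely fit the test samples.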
The project demonstrates:
- Baseline Performance: The initial classifier achieved an accuracy of approximately 85.6% with all features.
- Genetic Algorithm Optimization: After feature selection via GA, accuracy improved to 91.2%.
- Particle Swarm Optimization: PSO further refined the feature selection, achieving a final accuracy of 93.4%.
| Method | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Baseline (No Opt.) | 85.6% | 86.0% | 84.5% | 85.2% |
| Genetic Algorithm | 91.2% | 91.5% | 90.0% | 90.7% |
| Particle Swarm Opt. | 93.4% | 93.8% | 92.5% | 93.1% |
In this comparison, GA-based feature selection raised accuracy from 85.6% to 91.2%, and PSO raised it to 93.4%, with precision, recall, and F1 score improving by similar margins.
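The four metrics in the table can be computed for any feature subset with scikit-learn's metric functions. The sketch below shows one way to do that; the synthetic data, the logistic regression classifier, the `evaluate` helper, and the example mask are hypothetical, and its output will not reproduce the figures reported above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): label 1 plays the role of "malignant".
X, y = make_classification(n_samples=300, n_features=9, n_informative=4,
                           n_redundant=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def evaluate(mask):
    """Train on the selected columns and return the four test-set metrics."""
    mask = np.asarray(mask, dtype=bool)
    clf = LogisticRegression(max_iter=1000).fit(X_train[:, mask], y_train)
    pred = clf.predict(X_test[:, mask])
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
    }


all_features = np.ones(X.shape[1], dtype=bool)
# Hypothetical mask standing in for the output of a GA or PSO search.
example_subset = np.array([1, 1, 0, 1, 0, 1, 0, 0, 1], dtype=bool)

for name, mask in [("all features", all_features), ("subset", example_subset)]:
    metrics = evaluate(mask)
    row = "  ".join(f"{key}={value:.3f}" for key, value in metrics.items())
    print(f"{name:<13} {row}")
```

Evaluating every method on the same held-out split, as done here, is what makes the rows of such a table comparable.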
- Programming Language: Python
- Libraries:
  - scikit-learn
  - NumPy
  - pandas
  - Matplotlib
  - Optimization libraries for GA and PSO
- Clone the repository:
git clone https://github.com/your-username/cancer-cell-classifier.git
- Navigate to the project directory:
cd cancer-cell-classifier
- Install dependencies:
pip install -r requirements.txt
- Open the Jupyter notebook:
jupyter notebook Cell_Classifier_Enhanced_With_PSO.ipynb
- Run the cells sequentially to:
  - Preprocess the dataset.
  - Train the baseline model.
  - Apply optimization methods.
  - Compare the results.
Contributions are welcome! If you have ideas for improving the implementation or extending the project, feel free to fork the repository and submit a pull request.
This project is licensed under the MIT License.
- The dataset used in this project.
- Libraries and tools that made this project possible.
Feel free to reach out with questions or suggestions!