This project demonstrates a feature selection technique using a binary simulated annealing algorithm. The goal is to identify the most relevant features in the "Heart Failure Prediction" dataset to predict heart disease effectively. The project uses a Support Vector Machine (SVM) as the classifier to evaluate the selected features.
- `Notebook/`: This directory contains the core Python source code and the Jupyter notebook for experimentation.
  - `FS.py`: Contains functions for feature selection and evaluation.
  - `SA.py`: Implements the binary simulated annealing algorithm.
  - `testing.ipynb`: A Jupyter notebook that demonstrates the workflow of the project, from data loading and preprocessing to feature selection and evaluation.
- `requirements.txt`: A list of Python dependencies required to run the project.
- `README.md`: This file, providing an overview and instructions.
- Python 3.x
- pip (Python package installer)
To run this project, install the required Python libraries using pip and the `requirements.txt` file:

```bash
pip install -r requirements.txt
```

This project uses the "Heart Failure Prediction" dataset, which should be named `heart.csv` and placed in the root directory of the project. The dataset is not included in this repository, but it can be obtained from sources like Kaggle.
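A quick, optional way to confirm the file is in place is to load it with pandas. This is only a minimal sketch; the exact columns depend on the dataset version you download:

```python
import pandas as pd

# Load the dataset from the project root and take a quick look at it.
df = pd.read_csv("heart.csv")
print(df.shape)
print(df.columns.tolist())
print(df.head())
```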
The primary workflow is demonstrated in the `Notebook/testing.ipynb` Jupyter notebook. To run the project, follow these steps:
- Launch Jupyter Notebook:

  ```bash
  jupyter notebook
  ```

- Open and run the notebook:
  - Navigate to `Notebook/testing.ipynb`.
  - Run the cells in the notebook to see the feature selection process in action.
The notebook will:
- Load and preprocess the `heart.csv` dataset.
- Use the `binary_simulated_annealing` function from `SA.py` to find the best subset of features.
- Evaluate the performance of the selected features using an SVM classifier (a standalone sketch of this evaluation step follows this list).
- Print the best feature combination and the corresponding accuracy score.
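For orientation, here is a minimal, self-contained sketch of that evaluation step: scoring one candidate feature subset with an SVM via cross-validation. It is not the project's code from `FS.py`; the target column name (`HeartDisease`) and the one-hot encoding of categorical columns are assumptions about the dataset version used.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

df = pd.read_csv("heart.csv")
# Target column name is assumed; one-hot encode any categorical features.
X = pd.get_dummies(df.drop(columns=["HeartDisease"]), dtype=float)
y = df["HeartDisease"]

# A candidate solution: one bit per feature column, 1 = selected, 0 = dropped.
mask = np.random.default_rng(0).integers(0, 2, size=X.shape[1]).astype(bool)

# Score the selected subset with a scaled SVM and 5-fold cross-validation.
model = make_pipeline(StandardScaler(), SVC())
accuracy = cross_val_score(model, X.to_numpy()[:, mask], y, cv=5).mean()
print(f"{mask.sum()} features selected, accuracy = {accuracy:.3f}, error = {1 - accuracy:.3f}")
```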
The project employs a binary simulated annealing algorithm to explore the feature space. Each feature is represented by a bit in a binary array (solution), where 1 means the feature is selected, and 0 means it is not. The algorithm iteratively generates new candidate solutions by flipping bits and evaluates them based on the performance of an SVM classifier. The goal is to find the combination of features that minimizes the classifier's error rate (1 - accuracy).
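The core loop can be pictured with the sketch below. It is a generic illustration under stated assumptions, not the exact code in `SA.py`: the function name `binary_simulated_annealing` comes from the notebook, but the signature, the parameters (`n_iter`, `t0`, `cooling`), the geometric cooling schedule, and the cross-validated SVM objective are illustrative choices.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def error_rate(solution, X, y):
    """Objective to minimise: 1 - cross-validated SVM accuracy on the
    features whose bits are set in `solution` (a 0/1 NumPy array)."""
    if solution.sum() == 0:
        return 1.0  # an empty feature set is the worst possible solution
    model = make_pipeline(StandardScaler(), SVC())
    acc = cross_val_score(model, X[:, solution.astype(bool)], y, cv=5).mean()
    return 1.0 - acc

def binary_simulated_annealing(X, y, n_iter=200, t0=1.0, cooling=0.95, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]

    # Start from a random binary mask over the features.
    current = rng.integers(0, 2, size=n_features)
    current_cost = error_rate(current, X, y)
    best, best_cost = current.copy(), current_cost
    temperature = t0

    for _ in range(n_iter):
        # Propose a neighbour by flipping one randomly chosen bit.
        candidate = current.copy()
        flip = rng.integers(n_features)
        candidate[flip] = 1 - candidate[flip]
        candidate_cost = error_rate(candidate, X, y)

        # Always accept improvements; accept worse candidates with a
        # probability that shrinks as the temperature cools (Metropolis rule).
        delta = candidate_cost - current_cost
        if delta < 0 or rng.random() < np.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost

        if current_cost < best_cost:
            best, best_cost = current.copy(), current_cost

        temperature *= cooling  # geometric cooling schedule

    return best, 1.0 - best_cost  # best bit mask and its accuracy
```

With `X` as a NumPy feature matrix and `y` the labels, a call like `best_mask, best_acc = binary_simulated_annealing(X, y)` returns the selected bit mask and its cross-validated accuracy; the notebook then prints the corresponding feature combination and accuracy score.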