A Python project that uses a binary simulated annealing algorithm for feature selection, evaluated with an SVM on the Heart Failure Prediction dataset.

TheMn/feature-selection-simulated-annealing

Feature Selection using Simulated Annealing

This project demonstrates a feature selection technique using a binary simulated annealing algorithm. The goal is to identify the most relevant features in the "Heart Failure Prediction" dataset to predict heart disease effectively. The project uses a Support Vector Machine (SVM) as the classifier to evaluate the selected features.

Project Structure

  • Notebook/: This directory contains the core Python source code and the Jupyter notebook for experimentation.
    • FS.py: Contains functions for feature selection and evaluation.
    • SA.py: Implements the binary simulated annealing algorithm.
    • testing.ipynb: A Jupyter notebook that demonstrates the workflow of the project, from data loading and preprocessing to feature selection and evaluation.
  • requirements.txt: A list of Python dependencies required to run the project.
  • README.md: This file, providing an overview and instructions.

Setup

Prerequisites

  • Python 3.x
  • pip (Python package installer)

Dependencies

Install the required Python libraries with pip, using the requirements.txt file:

pip install -r requirements.txt

Dataset

This project uses the "Heart Failure Prediction" dataset. The file must be named heart.csv and placed in the root directory of the project. The dataset is not included in this repository; it can be downloaded from sources such as Kaggle.

Usage

The primary workflow is demonstrated in the Notebook/testing.ipynb Jupyter notebook. To run it:

  1. Launch Jupyter Notebook:

    jupyter notebook
  2. Open and run the notebook:

    • Navigate to Notebook/testing.ipynb.
    • Run the cells in the notebook to see the feature selection process in action.

The notebook will:

  • Load and preprocess the heart.csv dataset.
  • Use the binary_simulated_annealing function from SA.py to find the best subset of features.
  • Evaluate the performance of the selected features using an SVM classifier.
  • Print the best feature combination and the corresponding accuracy score.
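The evaluation step in the list above can be sketched as a small objective function. This is a minimal illustration, not the code in FS.py: the function name subset_error, the use of 5-fold cross-validation, the RBF kernel, and the feature scaling are all assumptions, and synthetic data stands in for heart.csv so the snippet runs on its own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def subset_error(mask, X, y, cv=5):
    """Hypothetical objective: 1 - mean cross-validated SVM accuracy
    on the columns of X where mask is 1."""
    selected = np.flatnonzero(mask)
    if selected.size == 0:
        # No feature selected: treat as the worst possible score (assumption).
        return 1.0
    # Scaling is done inside the pipeline so it is fitted per fold.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    scores = cross_val_score(model, X[:, selected], y, cv=cv, scoring="accuracy")
    return 1.0 - scores.mean()


# Synthetic stand-in for the preprocessed dataset (not the real heart.csv).
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=3, n_redundant=0, random_state=0
)

mask = [1, 1, 1, 0, 0, 0, 0, 0]  # select the first three columns
error = subset_error(mask, X, y)
print(f"error rate with {sum(mask)} features: {error:.3f}")
```

Returning the error rate rather than the accuracy keeps the function directly usable as a quantity to minimize.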

How It Works

The project uses a binary simulated annealing algorithm to search the space of feature subsets. A candidate solution is a binary array with one bit per feature: 1 means the feature is selected and 0 means it is not. At each iteration the algorithm generates a new candidate by flipping bits and scores it by the performance of an SVM classifier on the selected features. The objective is the classifier's error rate (1 - accuracy), which the search tries to minimize.
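The search described above can be sketched in a few lines of standard-library Python. This is not the binary_simulated_annealing function from SA.py, whose signature and parameters are not documented here. The names anneal_binary and flip_one_bit, the single-bit flip, the Metropolis acceptance rule, and the geometric cooling schedule are assumptions chosen because they are common in simulated annealing. A toy objective replaces the SVM error rate so the example runs quickly.

```python
import math
import random


def flip_one_bit(solution, rng):
    """Return a neighbour that differs from `solution` in exactly one bit."""
    neighbour = list(solution)
    i = rng.randrange(len(neighbour))
    neighbour[i] = 1 - neighbour[i]
    return neighbour


def anneal_binary(objective, n_features, n_iter=500, t_start=1.0,
                  cooling=0.99, seed=0):
    """Minimize `objective` over binary masks (hypothetical sketch)."""
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(n_features)]
    current_cost = objective(current)
    best, best_cost = list(current), current_cost
    temperature = t_start

    for _ in range(n_iter):
        candidate = flip_one_bit(current, rng)
        candidate_cost = objective(candidate)
        delta = candidate_cost - current_cost
        # Metropolis rule: always accept improvements; accept a worse
        # candidate with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        temperature *= cooling  # geometric cooling (assumed schedule)

    return best, best_cost


# Toy stand-in for "1 - accuracy": fraction of bits that differ from a
# made-up target mask. A real run would score an SVM on the selected columns.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]


def toy_error(mask):
    return sum(a != b for a, b in zip(mask, TARGET)) / len(TARGET)


best_mask, best_error = anneal_binary(toy_error, n_features=len(TARGET))
print("best mask:", best_mask, "error:", best_error)
```

Accepting some worse candidates while the temperature is high lets the search leave local minima; as the temperature falls, the loop behaves more and more like a greedy descent.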
