GrahamPellegrini/Feature-Analysis-Multi-Label-Classification

Practical Machine Learning – Feature Analysis & Multi-Label Classification

CCE3503 Course scikit-learn PyTorch

Assignments for CCE3503 – Practical Machine Learning, including feature selection techniques and multi-label neural network classification, supervised by Dr. Trevor Spiteri at the University of Malta.


Repository Contents

.
├── assignment1/
│   ├── feature_analysis.ipynb       # Notebook for feature selection and evaluation
│   └── ...                          # CSV dataset assumed to be downloaded via VLE
│
├── assignment2/
│   ├── multi-label-classification.ipynb   # Full pipeline for multi-label classification
│   └── ...                                # Yeast dataset (not included)
└── README.md

Assignment 1 – Feature Analysis (Communities & Crime Dataset)

Objective

Perform data cleaning and dimensionality reduction using:

  • Filter methods (correlation heatmap + thresholding)
  • Wrapper methods: Sequential Forward/Backward Selection (SFS, SBS)
  • Projection: Principal Component Analysis (PCA)

Each technique was evaluated by training an MLPRegressor to predict the target variable assaultPerPop.
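The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: it assumes scikit-learn's `SequentialFeatureSelector` for SFS/SBS, a filter that thresholds the absolute feature-to-target Pearson correlation, and a synthetic dataset from `make_regression` standing in for the Communities & Crime CSV. The helper names (`make_mlp`, `filter_by_target_correlation`, `evaluate`), the threshold of 0.2, the 3 selected features and the 95% variance target are all hypothetical choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned dataset (assumption: the real CSV is not bundled).
X, y = make_regression(n_samples=150, n_features=6, n_informative=3,
                       noise=5.0, random_state=0)
y = (y - y.mean()) / y.std()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)


def make_mlp():
    # Small network with the lbfgs solver so the sketch runs quickly (assumed settings).
    return MLPRegressor(hidden_layer_sizes=(16,), solver="lbfgs",
                        max_iter=200, random_state=0)


def filter_by_target_correlation(features, target, threshold):
    # Keep features whose |Pearson r| with the target reaches the threshold.
    corr = np.array([abs(np.corrcoef(features[:, j], target)[0, 1])
                     for j in range(features.shape[1])])
    kept = np.flatnonzero(corr >= threshold)
    # Fall back to the single best feature if nothing passes the threshold.
    return kept if kept.size else np.array([int(np.argmax(corr))])


def evaluate(train_feats, test_feats):
    # Train a fresh MLPRegressor on the reduced features and report test MSE.
    model = make_mlp().fit(train_feats, y_train)
    return mean_squared_error(y_test, model.predict(test_feats))


results = {}

kept = filter_by_target_correlation(X_train, y_train, threshold=0.2)
results["Filter"] = evaluate(X_train[:, kept], X_test[:, kept])

for name, direction in [("SFS", "forward"), ("SBS", "backward")]:
    selector = SequentialFeatureSelector(
        make_mlp(), n_features_to_select=3, direction=direction, cv=2)
    selector.fit(X_train, y_train)
    results[name] = evaluate(selector.transform(X_train),
                             selector.transform(X_test))

# Keep enough principal components to explain 95% of the variance.
pca = PCA(n_components=0.95).fit(X_train)
results["PCA"] = evaluate(pca.transform(X_train), pca.transform(X_test))

for name, mse in results.items():
    print(f"{name}: test MSE = {mse:.4f}")
```

The numbers printed for the synthetic data say nothing about the real dataset; the point is only that every reduction feeds the same model and is scored with the same metric.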

Highlights

  • Missing values imputed with the column mean.
  • Features standardised (zero mean, unit variance) using StandardScaler.
  • Train/test split: 80/20
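A minimal sketch of these preprocessing steps, assuming scikit-learn's `SimpleImputer` for the mean imputation and random data in place of the real CSV. Splitting before fitting the imputer and scaler is a choice made here to avoid leaking test statistics into training; the notebook may order the steps differently.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Random stand-in data with roughly 10% missing values (assumption).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 4))
X[rng.random(X.shape) < 0.1] = np.nan
y = rng.normal(size=100)

# 80/20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Mean imputation followed by standardisation, fitted on the training set only.
preprocess = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())
X_train_prep = preprocess.fit_transform(X_train)
X_test_prep = preprocess.transform(X_test)

print(X_train_prep.shape, X_test_prep.shape)
```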

Evaluation

MSE (Mean Squared Error) was used to compare:

  • Filter
  • SFS
  • SBS
  • PCA

Final results showed that PCA and SFS achieved the lowest MSE, with PCA also reducing the feature count significantly.


Assignment 2 – Multi-Label Classification (Yeast Dataset)

Objective

Train and evaluate neural networks for multi-label classification using:

  • Problem transformation (Binary Relevance vs Classifier Chains)
  • Algorithm adaptation with hyperparameter optimisation (HPO)

Dataset

  • Yeast gene expression dataset (2417 samples, 103 features, 14 labels)
  • Preprocessing and label binarisation performed within notebook

Approaches

  • Binary Relevance (BR): independent classifiers per label
  • Classifier Chains (CC): models label dependencies
  • Adapted Neural Network: trained with OneVsRestClassifier and MLPClassifier using HPO
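The difference between the first two approaches can be sketched with scikit-learn alone. This is an illustration under assumptions: the notebook may use scikit-multilearn instead, the data here comes from `make_multilabel_classification` rather than the Yeast dataset, and the network settings in the hypothetical `make_base` helper are arbitrary. `MultiOutputClassifier` fits one independent classifier per label (Binary Relevance), while `ClassifierChain` feeds each classifier the labels earlier in the chain as extra inputs.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import accuracy_score, hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier
from sklearn.neural_network import MLPClassifier

# Synthetic multi-label data standing in for the Yeast dataset (assumption).
X, Y = make_multilabel_classification(
    n_samples=300, n_features=20, n_classes=5, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0)


def make_base():
    # Small MLP used as the per-label base classifier (assumed settings).
    return MLPClassifier(hidden_layer_sizes=(16,), solver="lbfgs",
                         max_iter=200, random_state=0)


models = {
    "Binary Relevance": MultiOutputClassifier(make_base()),
    "Classifier Chain": ClassifierChain(make_base()),
}

scores = {}
for name, model in models.items():
    Y_pred = model.fit(X_train, Y_train).predict(X_test).astype(int)
    scores[name] = {
        "hamming_loss": hamming_loss(Y_test, Y_pred),
        # On multi-label input, accuracy_score is subset accuracy (exact match).
        "exact_match": accuracy_score(Y_test, Y_pred),
    }

for name, metrics in scores.items():
    print(name, metrics)
```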

Hyperparameters Tuned

  • Hidden layer sizes
  • Learning rate
  • Batch size

HPO performed using GridSearchCV with 5-fold cross-validation.
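A hedged sketch of what such a search could look like. The grid values, the `f1_micro` scoring choice, the `max_iter` setting and the synthetic data are all assumptions made for illustration; only the tuned hyperparameters, `GridSearchCV`, and the 5 folds come from the description above. Parameters of the wrapped `MLPClassifier` are addressed through the `estimator__` prefix of `OneVsRestClassifier`.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neural_network import MLPClassifier

# Synthetic multi-label data standing in for the Yeast dataset (assumption).
X, Y = make_multilabel_classification(
    n_samples=150, n_features=20, n_classes=3, random_state=0)

model = OneVsRestClassifier(MLPClassifier(max_iter=100, random_state=0))

# Illustrative grid over the three tuned hyperparameters (values are assumed).
param_grid = {
    "estimator__hidden_layer_sizes": [(16,), (32,)],
    "estimator__learning_rate_init": [0.001, 0.01],
    "estimator__batch_size": [32, 64],
}

# 5-fold cross-validation; micro-averaged F1 is an assumed scoring choice.
search = GridSearchCV(model, param_grid, cv=5, scoring="f1_micro")
search.fit(X, Y)

print("Best parameters:", search.best_params_)
print("Best mean CV score:", round(search.best_score_, 4))
```

With short training runs like this, scikit-learn may emit convergence warnings; they do not stop the search.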

Evaluation Metrics

  • Hamming Loss
  • Exact Match Ratio (EMR)
  • F1 Score (Macro and Micro)
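To make the definitions concrete, the metrics can be written out in plain Python on a tiny hand-made example. This is a teaching sketch, not the notebook's code (which presumably relies on library implementations); the function names are hypothetical, and an F1 with no true or predicted positives is treated as 0 here by convention.

```python
def hamming_loss(y_true, y_pred):
    # Fraction of individual label slots predicted incorrectly.
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for row_t, row_p in zip(y_true, y_pred)
                for t, p in zip(row_t, row_p))
    return wrong / total


def exact_match_ratio(y_true, y_pred):
    # Fraction of samples whose full label vector is predicted exactly.
    hits = sum(list(row_t) == list(row_p)
               for row_t, row_p in zip(y_true, y_pred))
    return hits / len(y_true)


def _f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_macro_micro(y_true, y_pred):
    # Macro: mean of per-label F1. Micro: F1 over pooled counts.
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        tp = sum(t[j] == 1 and p[j] == 1 for t, p in zip(y_true, y_pred))
        fp = sum(t[j] == 0 and p[j] == 1 for t, p in zip(y_true, y_pred))
        fn = sum(t[j] == 1 and p[j] == 0 for t, p in zip(y_true, y_pred))
        counts.append((tp, fp, fn))
    macro = sum(_f1(*c) for c in counts) / n_labels
    micro = _f1(*(sum(c[k] for c in counts) for k in range(3)))
    return macro, micro


y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]

macro, micro = f1_macro_micro(y_true, y_pred)
print(hamming_loss(y_true, y_pred))       # 0.25 (3 wrong slots out of 12)
print(exact_match_ratio(y_true, y_pred))  # 0.25 (1 of 4 rows fully correct)
print(round(macro, 4), round(micro, 4))   # 0.7222 0.7273
```

Note how a Hamming Loss of 0.25 coexists with an exact match ratio of only 0.25: a single wrong label is enough to fail the whole sample under EMR.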

Observations

  • Classifier Chains outperformed Binary Relevance in Exact Match Ratio.
  • The adapted neural network showed the best balance across all metrics.
  • EMR values were low, as expected: with high label cardinality, every one of a sample's 14 labels must be predicted correctly for it to count as a match.
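As a rough illustration of the last observation, label cardinality is the average number of active labels per sample, and the exact-match requirement compounds per-label errors. The sketch below uses a toy label matrix and an idealised assumption that each label is predicted correctly and independently with the same probability; real labels are correlated, so the second figure is only indicative. Both function names are hypothetical.

```python
def label_cardinality(label_matrix):
    # Average number of active (1) labels per sample.
    return sum(sum(row) for row in label_matrix) / len(label_matrix)


def all_labels_correct_probability(per_label_accuracy, n_labels):
    # Idealised: independent labels with equal per-label accuracy.
    return per_label_accuracy ** n_labels


toy_labels = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 1, 0]]
print(round(label_cardinality(toy_labels), 3))                # 2.667
print(round(all_labels_correct_probability(0.9, 14), 4))      # 0.2288
```

Under that idealisation, even 90% accuracy on each of 14 labels leaves fewer than a quarter of samples with every label correct.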

How to Run

  1. Clone the repository:
git clone https://github.com/GrahamPellegrini/cce3503.git
  2. Launch Jupyter Lab or VS Code and open the notebooks:
jupyter lab assignment1/feature_analysis.ipynb
jupyter lab assignment2/multi-label-classification.ipynb
  3. Download the datasets via VLE and place them in the matching assignment folders:
     • Communities & Crime CSV for Assignment 1 (assignment1/)
     • Yeast dataset for Assignment 2 (assignment2/)
  4. Run all notebook cells sequentially. Make sure all dependencies are installed:
pip install numpy pandas matplotlib seaborn scikit-learn scikit-multilearn

Author

Graham Pellegrini
B.Eng. (Hons) Computer Engineering
University of Malta
GitHub: @GrahamPellegrini

About

Feature selection and multi-label classification using neural networks (CCE3503 coursework)
