Practical Machine Learning – Feature Analysis & Multi-Label Classification

Assignments for CCE3503 – Practical Machine Learning at the University of Malta, supervised by Dr. Trevor Spiteri. They cover feature selection techniques and multi-label classification with neural networks.

Repository Contents

.
├── assignment1/
│   ├── feature_analysis.ipynb       # Notebook for feature selection and evaluation
│   └── ...                          # CSV dataset assumed to be downloaded via VLE
│
├── assignment2/
│   ├── multi-label-classification.ipynb   # Full pipeline for multi-label classification
│   └── ...                                # Yeast dataset (not included)
└── README.md

Assignment 1 – Feature Analysis (Communities & Crime Dataset)

Objective

Perform data cleaning and dimensionality reduction using:

Filter methods (correlation heatmap + thresholding)
Wrapper methods: Sequential Forward/Backward Selection (SFS, SBS)
Projection: Principal Component Analysis (PCA)
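
The filter step above can be sketched as follows. This is a minimal illustration on synthetic data, not the notebook's code: the helper name `correlation_filter` and both threshold values are assumptions chosen for the example. It keeps features whose absolute correlation with the target passes a threshold, then drops any feature that nearly duplicates one already kept.

```python
import numpy as np
import pandas as pd


def correlation_filter(X, y, min_target_corr=0.3, max_pair_corr=0.9):
    """Hypothetical helper: keep target-correlated, non-redundant columns."""
    # Absolute Pearson correlation of every feature with the target.
    target_corr = X.corrwith(y).abs()
    # Candidates that pass the target threshold, strongest first.
    candidates = (
        target_corr[target_corr >= min_target_corr]
        .sort_values(ascending=False)
        .index
    )
    selected = []
    for col in candidates:
        # Skip a column that is almost a copy of one already selected.
        if all(abs(X[col].corr(X[other])) < max_pair_corr for other in selected):
            selected.append(col)
    return selected


# Synthetic stand-in data: one useful feature, a near-copy of it, and noise.
rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=n)
X_demo = pd.DataFrame({
    "signal": signal,
    "signal_copy": signal + rng.normal(scale=0.01, size=n),
    "noise": rng.normal(size=n),
})
y_demo = pd.Series(2.0 * signal + rng.normal(scale=0.1, size=n))

kept_features = correlation_filter(X_demo, y_demo)
print(kept_features)
```

Only one of the two near-identical columns survives, and the noise column is dropped because it barely correlates with the target.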

Each technique was evaluated by training an MLPRegressor on the reduced feature set to predict the target variable assaultPerPop.
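
A rough sketch of that evaluation loop is shown below. It assumes scikit-learn's `SequentialFeatureSelector` for SFS/SBS, wraps a `LinearRegression` inside the selector purely to keep the example fast, and uses synthetic data in place of the Communities & Crime CSV. The number of retained features (4) and the network size are illustrative, not values from the notebook.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned dataset.
X, y = make_regression(
    n_samples=300, n_features=10, n_informative=4, noise=5.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)


def evaluate(reducer):
    """Fit the reducer on the training split, then score an MLPRegressor."""
    Z_train = reducer.fit_transform(X_train, y_train)
    Z_test = reducer.transform(X_test)
    model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
    model.fit(Z_train, y_train)
    return Z_train.shape[1], mean_squared_error(y_test, model.predict(Z_test))


reducers = {
    "SFS": SequentialFeatureSelector(
        LinearRegression(), n_features_to_select=4, direction="forward", cv=3
    ),
    "SBS": SequentialFeatureSelector(
        LinearRegression(), n_features_to_select=4, direction="backward", cv=3
    ),
    "PCA": PCA(n_components=4),
}
results = {name: evaluate(reducer) for name, reducer in reducers.items()}
for name, (n_features, mse) in results.items():
    print(f"{name}: {n_features} features, test MSE = {mse:.2f}")
```

Using the same regressor settings and the same split for every reducer keeps the MSE values comparable across techniques.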

Highlights

Missing values imputed with the column mean.
Features standardised (zero mean, unit variance) using StandardScaler.
Train/test split: 80/20
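
The preprocessing steps listed above might look like the sketch below. The tiny data frame and the column names `feature_a` and `feature_b` are made up for illustration; only `assaultPerPop` comes from the assignment. Fitting the imputer and scaler on the training split alone is a choice made here to avoid leaking test statistics, and the notebook may order these steps differently.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Tiny hypothetical frame standing in for the Communities & Crime CSV.
df = pd.DataFrame({
    "feature_a": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "feature_b": [10.0, np.nan, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0],
    "assaultPerPop": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
})
X = df.drop(columns="assaultPerPop")
y = df["assaultPerPop"]

# 80/20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Mean imputation followed by standardisation, fitted on the training data only.
preprocess = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())
X_train_prepared = preprocess.fit_transform(X_train)
X_test_prepared = preprocess.transform(X_test)

print(X_train_prepared.shape, X_test_prepared.shape)
```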

Evaluation

MSE (Mean Squared Error) was used to compare:

Filter
SFS
SBS
PCA

Final results showed that PCA and SFS achieved the lowest MSE, with PCA also reducing the feature count significantly.

Assignment 2 – Multi-Label Classification (Yeast Dataset)

Objective

Train and evaluate neural networks for multi-label classification using:

Problem transformation (Binary Relevance vs Classifier Chains)
Algorithm adaptation with hyperparameter optimisation (HPO)

Dataset

Yeast gene expression dataset (2417 samples, 103 features, 14 labels)
Preprocessing and label binarisation performed within notebook

Approaches

Binary Relevance (BR): independent classifiers per label
Classifier Chains (CC): models label dependencies
Adapted Neural Network: trained with OneVsRestClassifier and MLPClassifier using HPO
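
The difference between the first two approaches can be sketched with scikit-learn's own wrappers. This assumes `MultiOutputClassifier` as a stand-in for Binary Relevance and `sklearn.multioutput.ClassifierChain` for Classifier Chains; the notebook may use scikit-multilearn instead. The synthetic labels are built so that the third label depends on the first two, which is the kind of structure a chain can exploit.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier
from sklearn.neural_network import MLPClassifier

# Synthetic data: label 2 is active only when labels 0 and 1 are both active.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y0 = X[:, 0] > 0
y1 = X[:, 1] > 0
Y = np.column_stack([y0, y1, y0 & y1]).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)


def make_base():
    """Small MLP used as the per-label classifier (illustrative settings)."""
    return MLPClassifier(hidden_layer_sizes=(16,), max_iter=300, random_state=0)


# Binary Relevance: one independent classifier per label column.
br = MultiOutputClassifier(make_base()).fit(X_train, Y_train)

# Classifier Chain: each classifier also receives the earlier labels as inputs.
cc = ClassifierChain(make_base()).fit(X_train, Y_train)

Y_pred_br = br.predict(X_test)
Y_pred_cc = cc.predict(X_test)

# accuracy_score on multi-label input is the exact match ratio.
print("BR exact match:", accuracy_score(Y_test, Y_pred_br))
print("CC exact match:", accuracy_score(Y_test, Y_pred_cc))
```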

Hyperparameters Tuned

Hidden layer sizes
Learning rate
Batch size

HPO performed using GridSearchCV with 5-fold cross-validation.
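
A minimal sketch of such a search is given below, assuming the three tuned hyperparameters map to `hidden_layer_sizes`, `learning_rate_init` and `batch_size` of `MLPClassifier`. The grid values, the `f1_micro` scoring choice and the synthetic data are illustrative and not taken from the notebook.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neural_network import MLPClassifier

# Small synthetic multi-label problem standing in for the Yeast data.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 8))
Y = np.column_stack(
    [X[:, 0] > 0, X[:, 1] > 0, X[:, 0] + X[:, 1] > 0]
).astype(int)

# Parameters of the wrapped MLP are addressed through the "estimator__" prefix.
param_grid = {
    "estimator__hidden_layer_sizes": [(8,), (16,)],
    "estimator__learning_rate_init": [0.001, 0.01],
    "estimator__batch_size": [32, 64],
}

search = GridSearchCV(
    OneVsRestClassifier(MLPClassifier(max_iter=100, random_state=0)),
    param_grid,
    scoring="f1_micro",  # assumed scoring; any multi-label metric could be used
    cv=5,                # 5-fold cross-validation
)
search.fit(X, Y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

The grid grows multiplicatively, so 2 x 2 x 2 settings with 5 folds already mean 40 cross-validation fits plus one final refit.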

Evaluation Metrics

Hamming Loss
Exact Match Ratio (EMR)
F1 Score (Macro and Micro)
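
For reference, all of these metrics are available in `sklearn.metrics`. The toy label matrices below are invented to keep the arithmetic easy to check by hand: 2 of the 12 label entries are wrong, and 2 of the 4 rows match exactly.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss

# Rows are samples, columns are labels (toy values for illustration).
Y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
Y_pred = np.array([[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]])

# Fraction of individual label entries that are wrong.
hl = hamming_loss(Y_true, Y_pred)
# On multi-label input, accuracy_score is the exact match ratio.
emr = accuracy_score(Y_true, Y_pred)
# Micro pools all label decisions; macro averages the per-label F1 scores.
f1_micro = f1_score(Y_true, Y_pred, average="micro")
f1_macro = f1_score(Y_true, Y_pred, average="macro")

print(f"Hamming loss: {hl:.3f}")   # 0.167
print(f"EMR:          {emr:.3f}")  # 0.500
print(f"F1 micro:     {f1_micro:.3f}")  # 0.833
print(f"F1 macro:     {f1_macro:.3f}")  # 0.822
```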

Observations

Classifier Chains outperformed Binary Relevance in Exact Match Ratio.
The adapted neural network showed the best balance across all metrics.
EMR values were low, as expected: with high label cardinality, every one of a sample's labels must be predicted correctly for an exact match.
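
A back-of-the-envelope illustration of why exact matches are rare: label cardinality is the average number of active labels per sample, and if each of 14 labels were predicted correctly with some fixed probability independently of the others, the chance of getting all of them right shrinks geometrically. The helper names, the independence assumption and the 0.8 per-label accuracy are hypothetical, chosen only to show the effect.

```python
import numpy as np


def label_cardinality(Y):
    """Average number of active labels per sample."""
    return float(np.asarray(Y).sum(axis=1).mean())


def exact_match_if_independent(per_label_accuracy, n_labels):
    """Chance that every label is right if labels were independent."""
    return per_label_accuracy ** n_labels


# Toy label matrix with 3, 2 and 4 active labels per row.
Y_demo = np.array([[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 1, 1]])
print(label_cardinality(Y_demo))  # 3.0

# Hypothetical 80% per-label accuracy over 14 labels.
print(round(exact_match_if_independent(0.8, 14), 3))  # 0.044
```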

How to Run

Clone the repository:

git clone https://github.com/GrahamPellegrini/cce3503.git

Launch Jupyter Lab or VSCode and open the notebooks:

jupyter lab assignment1/feature_analysis.ipynb
jupyter lab assignment2/multi-label-classification.ipynb

Download the datasets via VLE and place each one in its assignment folder:

Communities & Crime CSV for Assignment 1
Yeast Dataset for Assignment 2

Run all notebook cells sequentially. Make sure all dependencies are installed:

pip install numpy pandas matplotlib seaborn scikit-learn scikit-multilearn

Author

Graham Pellegrini
B.Eng. (Hons) Computer Engineering
University of Malta
GitHub: @GrahamPellegrini
