Assignments for CCE3503 – Practical Machine Learning, including feature selection techniques and multi-label neural network classification, supervised by Dr. Trevor Spiteri at the University of Malta.
.
├── assignment1/
│ ├── feature_analysis.ipynb # Notebook for feature selection and evaluation
│ └── ... # CSV dataset assumed to be downloaded via VLE
│
├── assignment2/
│ ├── multi-label-classification.ipynb # Full pipeline for multi-label classification
│ └── ... # Yeast dataset (not included)
└── README.md
Perform data cleaning and dimensionality reduction using:
- Filter methods (correlation heatmap + thresholding)
- Wrapper methods: Sequential Forward/Backward Selection (SFS, SBS)
- Projection: Principal Component Analysis (PCA)
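The three families above can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data, not the notebook's code: the column names, the 0.2 correlation threshold, the choice of 3 selected features, and the use of LinearRegression as the wrapper estimator are all assumptions made to keep the example small and fast. The filter step here thresholds the absolute correlation with the target, which is one possible reading of "correlation heatmap + thresholding".

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the cleaned dataset (hypothetical column names f0..f7).
X_arr, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                           noise=5.0, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])

# Filter: keep features whose absolute correlation with the target
# exceeds an (assumed) threshold of 0.2.
corr = X.corrwith(pd.Series(y)).abs()
filter_cols = corr[corr > 0.2].index.tolist()

# Wrapper: sequential forward and backward selection down to 3 features.
sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=3,
                                direction="forward", cv=3).fit(X, y)
sbs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=3,
                                direction="backward", cv=3).fit(X, y)
sfs_cols = X.columns[sfs.get_support()].tolist()
sbs_cols = X.columns[sbs.get_support()].tolist()

# Projection: PCA builds new components instead of picking original columns.
X_pca = PCA(n_components=3).fit_transform(X)

print("Filter:", filter_cols)
print("SFS:", sfs_cols)
print("SBS:", sbs_cols)
print("PCA shape:", X_pca.shape)
```

Filter and wrapper methods return a subset of the original columns, so the result stays interpretable, whereas PCA returns linear combinations of all columns.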
Each technique was evaluated by training an MLPRegressor to predict the target variable assaultPerPop.
- Missing data imputed using mean strategy.
- Normalisation applied using StandardScaler.
- Train/test split: 80/20
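The preprocessing steps listed above could look roughly like this. The sketch uses random data with artificially injected missing values in place of the real CSV. Fitting the imputer and scaler on the training split only is a choice made here to avoid leaking test statistics; the notebook may apply these steps in a different order.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 100 rows, 5 features, about 10% of values missing.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[rng.random(X.shape) < 0.1] = np.nan
y = rng.normal(size=100)

# 80/20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Mean imputation followed by standardisation, fitted on the training split.
imputer = SimpleImputer(strategy="mean")
scaler = StandardScaler()
X_train = scaler.fit_transform(imputer.fit_transform(X_train))
X_test = scaler.transform(imputer.transform(X_test))

print(X_train.shape, X_test.shape)
```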
MSE (Mean Squared Error) was used to compare:
- Filter
- SFS
- SBS
- PCA
The final results showed that PCA and SFS achieved the lowest MSE, with PCA also reducing the feature count significantly.
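A comparison of this kind can be sketched by training the same MLPRegressor on each reduced feature set and recording the test MSE. The example below is illustrative only: it uses synthetic data, compares just "all features" against PCA, and the helper name `evaluate_mse`, the network size, and the number of components are assumptions rather than the notebook's settings.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic regression problem standing in for the real dataset.
X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
y = (y - y.mean()) / y.std()  # standardise the target for stable training

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)


def evaluate_mse(train_features, test_features):
    """Train a small MLPRegressor and return its MSE on the test split."""
    model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
    model.fit(train_features, y_train)
    return mean_squared_error(y_test, model.predict(test_features))


# PCA is fitted on the training split and applied to both splits.
pca = PCA(n_components=5).fit(X_train)
results = {
    "all features": evaluate_mse(X_train, X_test),
    "PCA": evaluate_mse(pca.transform(X_train), pca.transform(X_test)),
}
for name, mse in results.items():
    print(f"{name}: MSE = {mse:.4f}")
```

Keeping the model and the split fixed across feature sets means that differences in MSE can be attributed to the feature selection step rather than to the regressor.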
Train and evaluate neural networks for multi-label classification using:
- Problem transformation (Binary Relevance vs Classifier Chains)
- Algorithm adaptation with hyperparameter optimisation (HPO)
- Yeast gene expression dataset (2417 samples, 103 features, 14 classes)
- Preprocessing and label binarisation performed within notebook
- Binary Relevance (BR): independent classifiers per label
- Classifier Chains (CC): models label dependencies
- Adapted Neural Network: trained with OneVsRestClassifier and MLPClassifier using HPO
- Hidden layer sizes
- Learning rate
- Batch size
HPO performed using GridSearchCV with 5-fold cross-validation.
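A grid search of this shape might be set up as below. This is a sketch on a small synthetic multi-label problem, not the Yeast data. The grid values are placeholders, and mapping "learning rate" to MLPClassifier's `learning_rate_init` parameter and scoring with micro-averaged F1 are assumptions, since the notebook's exact settings are not stated here.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neural_network import MLPClassifier

# Small synthetic multi-label dataset (stand-in for Yeast).
X, Y = make_multilabel_classification(
    n_samples=150, n_features=10, n_classes=3, n_labels=2,
    allow_unlabeled=False, random_state=0)

# One MLP per label; the "estimator__" prefix reaches the wrapped MLPClassifier.
base = OneVsRestClassifier(MLPClassifier(max_iter=200, random_state=0))
param_grid = {
    "estimator__hidden_layer_sizes": [(16,), (32,)],
    "estimator__learning_rate_init": [0.001, 0.01],
    "estimator__batch_size": [16, 32],
}

# 5-fold cross-validation over every combination in the grid.
search = GridSearchCV(base, param_grid, cv=5, scoring="f1_micro")
search.fit(X, Y)

print("Best parameters:", search.best_params_)
print("Best CV micro-F1:", round(search.best_score_, 3))
```

The grid grows multiplicatively (here 2 x 2 x 2 = 8 combinations, each fitted 5 times), so wider grids on the full dataset can take considerably longer.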
- Hamming Loss
- Exact Match Ratio (EMR)
- F1 Score (Macro and Micro)
- Classifier Chains outperformed Binary Relevance in Exact Match Ratio.
- Adapted neural network showed best balance across all metrics.
- As expected, EMR values were low due to high label cardinality: a prediction only counts as correct when every label of a sample matches.
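The comparison and metrics above can be reproduced in outline as follows. This sketch uses scikit-learn's MultiOutputClassifier as a stand-in for Binary Relevance and its ClassifierChain for Classifier Chains; the notebook may use scikit-multilearn instead. The synthetic data and network settings are assumptions. Exact Match Ratio is computed with `accuracy_score`, which on multi-label input returns subset accuracy.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import accuracy_score, f1_score, hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier
from sklearn.neural_network import MLPClassifier

# Synthetic multi-label data (stand-in for the Yeast dataset).
X, Y = make_multilabel_classification(
    n_samples=300, n_features=10, n_classes=4, n_labels=2,
    allow_unlabeled=False, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42)


def make_base():
    """Fresh base network for each problem-transformation wrapper."""
    return MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0)


models = {
    # Binary Relevance: one independent classifier per label.
    "Binary Relevance": MultiOutputClassifier(make_base()),
    # Classifier Chain: each classifier also sees earlier labels in the chain.
    "Classifier Chain": ClassifierChain(make_base(), order="random",
                                        random_state=0),
}

results = {}
for name, model in models.items():
    Y_pred = model.fit(X_train, Y_train).predict(X_test).astype(int)
    results[name] = {
        "hamming_loss": hamming_loss(Y_test, Y_pred),
        "exact_match_ratio": accuracy_score(Y_test, Y_pred),
        "f1_macro": f1_score(Y_test, Y_pred, average="macro", zero_division=0),
        "f1_micro": f1_score(Y_test, Y_pred, average="micro", zero_division=0),
    }

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Hamming Loss is the fraction of individual label slots predicted wrongly, so it can be low even when EMR is low, because EMR gives no credit for partially correct label sets.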
- Clone the repository:
    git clone https://github.com/GrahamPellegrini/cce3503.git
- Launch Jupyter Lab or VSCode and open the notebooks:
    jupyter lab assignment1/feature_analysis.ipynb
    jupyter lab assignment2/multi-label-classification.ipynb
- Download the datasets via VLE and place them in the correct folders:
  - Communities & Crime CSV for Assignment 1
  - Yeast Dataset for Assignment 2
- Run all notebook cells sequentially. Make sure all dependencies are installed:
    pip install numpy pandas matplotlib seaborn scikit-learn scikit-multilearn
Graham Pellegrini
B.Eng. (Hons) Computer Engineering
University of Malta
GitHub: @GrahamPellegrini