This project implements a modular deep learning training pipeline for Multi-Layer Perceptrons (MLPs) using PyTorch. It explores optimization dynamics, weight decay regularization, and evaluation methods. The goal is to demonstrate best practices in model building, training, and analysis.
- Model Architecture
  - Flexible MLP builder with configurable depth and hidden units
  - Modular `nn.Module` design for reuse
- Training Pipeline
  - Custom training loop with `SGD` and `CrossEntropyLoss`
  - Epoch-level loss tracking and optimizer updates
- Regularization
  - Implementation of weight decay applied only to weight parameters
  - Comparison of regularized vs. non-regularized training
- Evaluation
  - Accuracy calculation on validation/test sets
  - Parameter extraction utilities for model introspection
- Experimentation
  - Easy hyperparameter tuning: learning rate, layers, hidden size, weight decay
  - PCA integration for analyzing parameter/feature spaces
- Python
- Jupyter Notebook
- PyTorch (`torch`, `torchvision`)
- scikit-learn
- numpy, matplotlib
- MLP Construction
  - Function `build_mlp()` creates feed-forward models with an arbitrary number of layers
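As a rough illustration of what a builder like `build_mlp()` might look like, here is a minimal sketch. The signature (`in_dim`, `hidden_dim`, `out_dim`, `num_hidden_layers`) and the choice of ReLU activations are assumptions made for this example; the project's actual function may differ.

```python
import torch
from torch import nn


def build_mlp(in_dim, hidden_dim, out_dim, num_hidden_layers=2):
    """Stack Linear+ReLU blocks, then a final Linear that outputs logits.

    Hypothetical signature: the argument names are illustrative only.
    """
    layers = []
    prev = in_dim
    for _ in range(num_hidden_layers):
        layers += [nn.Linear(prev, hidden_dim), nn.ReLU()]
        prev = hidden_dim
    # No softmax here: nn.CrossEntropyLoss expects raw logits.
    layers.append(nn.Linear(prev, out_dim))
    return nn.Sequential(*layers)


model = build_mlp(in_dim=784, hidden_dim=128, out_dim=10, num_hidden_layers=2)
logits = model(torch.randn(4, 784))  # batch of 4 flattened 28x28 inputs
print(logits.shape)  # torch.Size([4, 10])
```

Returning an `nn.Sequential` keeps the model a regular `nn.Module`, so it works unchanged with any optimizer or training loop.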
- Training
  - `train_epoch()` handles forward pass, backpropagation, and optimizer updates
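The forward pass, backpropagation, and optimizer update performed by `train_epoch()` could be sketched roughly as follows. The argument list, the returned mean loss, and the toy data are assumptions for illustration and are not taken from the project code.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_epoch(model, loader, optimizer, loss_fn, device="cpu"):
    """Run one pass over `loader` and return the mean per-sample loss.

    Hypothetical signature, shown only to illustrate the loop structure.
    """
    model.train()
    total_loss, n_samples = 0.0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()          # clear gradients from the previous step
        loss = loss_fn(model(x), y)    # forward pass
        loss.backward()                # backpropagation
        optimizer.step()               # parameter update
        total_loss += loss.item() * x.size(0)
        n_samples += x.size(0)
    return total_loss / n_samples


# Toy, linearly separable data so the example runs without any downloads.
torch.manual_seed(0)
x = torch.randn(64, 20)
y = (x[:, 0] > 0).long()
loader = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

losses = [train_epoch(model, loader, optimizer, loss_fn) for _ in range(20)]
print(f"first epoch loss: {losses[0]:.4f}, last epoch loss: {losses[-1]:.4f}")
```

Weighting each batch loss by its size before averaging keeps the epoch-level figure correct even when the last batch is smaller than the others.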
- Weight Decay
  - `sgd_weight_decay_weights_only()` implements selective weight decay so that bias terms and other non-weight parameters are not penalized
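One common way to apply weight decay to weights only is to use optimizer parameter groups. The sketch below assumes that `sgd_weight_decay_weights_only()` returns a `torch.optim.SGD` instance and that "weights" means tensors with two or more dimensions; both are assumptions for this example, and the project may implement the update differently (for instance, as a manual SGD step).

```python
import torch
from torch import nn


def sgd_weight_decay_weights_only(model, lr=0.1, weight_decay=1e-4):
    """Build an SGD optimizer that decays weight matrices but not biases.

    Hypothetical signature. Parameters with ndim >= 2 are treated as weights;
    1-D parameters (biases in an MLP) get weight_decay=0.
    """
    decay, no_decay = [], []
    for p in model.parameters():
        if not p.requires_grad:
            continue
        (decay if p.ndim >= 2 else no_decay).append(p)
    return torch.optim.SGD(
        [
            {"params": decay, "weight_decay": weight_decay},
            {"params": no_decay, "weight_decay": 0.0},
        ],
        lr=lr,
    )


# Sanity check: with zero gradients, one step should shrink only the weights.
model = nn.Linear(3, 2)
opt = sgd_weight_decay_weights_only(model, lr=0.1, weight_decay=0.5)
w_before = model.weight.detach().clone()
b_before = model.bias.detach().clone()
for p in model.parameters():
    p.grad = torch.zeros_like(p)
opt.step()

# SGD with weight decay computes p <- p - lr * (grad + wd * p),
# so with grad = 0 the weights are scaled by (1 - lr * wd) = 0.95.
print(torch.allclose(model.weight, 0.95 * w_before))  # True
print(torch.equal(model.bias, b_before))              # True
```

Passing `weight_decay=0.0` for every group gives the non-regularized baseline for the comparison.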
- Evaluation
  - `evaluate()` computes classification accuracy
  - `extract_model_params()` flattens all trainable parameters for analysis
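The two evaluation helpers could look roughly like the sketch below. The signatures and the random toy data are assumptions for illustration; only the described behavior (accuracy computation and flattening of trainable parameters) comes from the list above.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def evaluate(model, loader, device="cpu"):
    """Return the fraction of correctly classified samples (hypothetical signature)."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        preds = model(x).argmax(dim=1)  # predicted class = index of largest logit
        correct += (preds == y).sum().item()
        total += y.size(0)
    return correct / total


def extract_model_params(model):
    """Concatenate all trainable parameters into one 1-D tensor."""
    return torch.cat(
        [p.detach().reshape(-1) for p in model.parameters() if p.requires_grad]
    )


# Random toy data: the accuracy itself is meaningless, the point is the API shape.
torch.manual_seed(0)
x = torch.randn(32, 4)
y = torch.randint(0, 3, (32,))
loader = DataLoader(TensorDataset(x, y), batch_size=8)
model = nn.Linear(4, 3)

acc = evaluate(model, loader)
flat = extract_model_params(model)
print(f"accuracy={acc:.3f}, num_params={flat.numel()}")  # num_params is 4*3 + 3 = 15
```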
- Experimentation
  - Torchvision datasets (e.g., MNIST/Fashion-MNIST) used as examples
  - PCA via `sklearn.decomposition` for dimensionality reduction in analysis
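To illustrate the PCA step, the sketch below projects a stack of flattened parameter vectors onto two principal components with `sklearn.decomposition.PCA`. The random matrix is only a stand-in for real snapshots (for example, one flattened parameter vector saved per epoch), and the helper name `project_snapshots` is hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA


def project_snapshots(snapshots, n_components=2):
    """Fit PCA on rows of `snapshots` and return (coordinates, fitted PCA).

    Hypothetical helper: each row is assumed to be one flattened parameter
    (or feature) vector.
    """
    pca = PCA(n_components=n_components)
    coords = pca.fit_transform(snapshots)
    return coords, pca


# Stand-in data: 10 "snapshots" of a 50-dimensional parameter vector.
rng = np.random.default_rng(0)
snapshots = rng.normal(size=(10, 50))

coords, pca = project_snapshots(snapshots)
print(coords.shape)  # (10, 2)
print(pca.explained_variance_ratio_)
```

With real training snapshots, plotting `coords[:, 0]` against `coords[:, 1]` gives a 2-D view of the path the parameters take during optimization.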
- Understand MLP construction and training loops in PyTorch
- Apply regularization techniques like weight decay correctly
- Practice writing reusable training utilities for deep learning
- Explore model parameter analysis with PCA and evaluation metrics
- Add support for other optimizers (Adam, RMSprop)
- Implement early stopping and learning rate scheduling
- Extend evaluation with confusion matrices and precision/recall
- Add visualization of training dynamics (loss/accuracy curves)
This project uses datasets from torchvision (e.g., MNIST, Fashion-MNIST).
They are downloaded automatically on first use through the `torchvision.datasets` API (with `download=True`).