This repository contains research investigating the relationship between optimizer choice and energy consumption in neural network training. The study examines performance and environmental impact across multiple datasets and optimization algorithms.
Modern machine learning training consumes significant computational resources, raising concerns about environmental impact. This research systematically evaluates how different optimization algorithms affect both model performance and energy consumption through controlled experiments.
Key Question: How do different optimizers balance training performance with energy efficiency?
Title: An Analysis of Optimizer Choice on Energy Efficiency and Performance in Neural Network Training
Author: Tom Almog, University of Waterloo
Abstract: As machine learning models grow increasingly complex and computationally demanding, understanding the environmental impact of training decisions becomes critical for sustainable AI development. This paper presents an empirical study investigating the relationship between optimizer choice and energy efficiency in neural network training through 360 controlled experiments across three benchmark datasets using eight popular optimizers with robust statistical validation.
- Tom Almog. An Analysis of Optimizer Choice on Energy Efficiency and Performance in Neural Network Training.
- Submitted to Sustainable Computing: Informatics and Systems (Manuscript ID: SUSCOM-D-25-02415), 2025.
- Preprint available on arXiv.
Our analysis of 360 experiments reveals several important insights:
- AdamW consistently efficient: Best balance of performance and low energy consumption across datasets
- Dataset complexity matters: Simple vs complex tasks show different optimizer efficiency patterns
- SGD excels on complex tasks: Achieves 20.61% accuracy on CIFAR-100 vs <10% for most others, but at higher energy cost
- Direct measurement essential: Training time and energy consumption are not perfectly correlated
- Statistical significance confirmed: Differences between optimizers are statistically significant (p < 0.01)
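The paper describes the exact statistical procedure; as a rough sanity check, a one-way ANOVA over the 15 seed-level accuracies per optimizer can be run for each dataset. A minimal sketch with SciPy, assuming `comprehensive_results.csv` exposes `dataset`, `optimizer`, and `accuracy` columns (the column names are assumptions, not verified against the file):

```python
# Hypothetical sketch: one-way ANOVA on accuracy across optimizers, per dataset.
# Column names (dataset, optimizer, accuracy) are assumptions -- adjust to the CSV.
import pandas as pd
from scipy import stats

results = pd.read_csv("data/experimental_data/comprehensive_results.csv")

for dataset, group in results.groupby("dataset"):
    # One sample of 15 seed-level accuracies per optimizer.
    samples = [g["accuracy"].to_numpy() for _, g in group.groupby("optimizer")]
    f_stat, p_value = stats.f_oneway(*samples)
    print(f"{dataset}: F = {f_stat:.2f}, p = {p_value:.4f}")
```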
- MNIST: 60,000 handwritten digits (421,642 model parameters)
- CIFAR-10: 50,000 natural images, 10 classes (3,249,994 parameters)
- CIFAR-100: 50,000 natural images, 100 classes (3,296,164 parameters)
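All three benchmarks are available through torchvision; a minimal loading sketch (paths and transforms here are illustrative, not the repository's exact configuration):

```python
# Illustrative only: standard torchvision loaders for the three benchmarks.
import torchvision
from torchvision import transforms

to_tensor = transforms.ToTensor()

mnist    = torchvision.datasets.MNIST("data/", train=True, download=True, transform=to_tensor)
cifar10  = torchvision.datasets.CIFAR10("data/", train=True, download=True, transform=to_tensor)
cifar100 = torchvision.datasets.CIFAR100("data/", train=True, download=True, transform=to_tensor)

print(len(mnist), len(cifar10), len(cifar100))  # 60000, 50000, 50000 training samples
```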
SGD, Adam, AdamW, RMSprop, Adagrad, Adadelta, Adamax, NAdam
- 360 total experiments: 3 datasets × 8 optimizers × 15 random seeds
- Robust statistics: 15 seeds per configuration for statistical validation
- Comprehensive metrics: accuracy, training time, CO2 emissions, memory usage
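A minimal sketch of that 3 × 8 × 15 grid, mapping optimizer names to their `torch.optim` classes. The model and learning rate below are stand-ins, not the settings used in `src/experiment_runner.py`:

```python
# Sketch of the experiment grid (3 datasets x 8 optimizers x 15 seeds = 360 runs).
# The tiny linear model and lr are placeholders for the real training setup.
import torch
import torch.nn as nn

OPTIMIZERS = {
    "SGD": torch.optim.SGD,
    "Adam": torch.optim.Adam,
    "AdamW": torch.optim.AdamW,
    "RMSprop": torch.optim.RMSprop,
    "Adagrad": torch.optim.Adagrad,
    "Adadelta": torch.optim.Adadelta,
    "Adamax": torch.optim.Adamax,
    "NAdam": torch.optim.NAdam,
}
DATASETS = ["MNIST", "CIFAR-10", "CIFAR-100"]
SEEDS = range(15)

runs = []
for dataset in DATASETS:
    for name, opt_cls in OPTIMIZERS.items():
        for seed in SEEDS:
            torch.manual_seed(seed)
            model = nn.Linear(10, 2)                       # stand-in model
            optimizer = opt_cls(model.parameters(), lr=1e-3)
            runs.append((dataset, name, seed))

print(len(runs))  # 360
```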
This structure is subject to change.
```
optimizer-energy-study/
├── README.md
├── LICENSE
├── requirements.txt
├── paper/                         # LaTeX manuscript
│   ├── optimizer_energy_efficiency.tex
│   ├── optimizer_energy_efficiency.pdf
│   └── references.bib
├── src/                           # Experiment code
│   └── experiment_runner.py
├── data/                          # Raw experimental data
│   └── experimental_data/
│       ├── comprehensive_results.csv
│       ├── epoch_details.csv
│       └── emissions/
└── results/                       # Analysis outputs
    └── plots/
        ├── accuracy_vs_emissions.png
        ├── training_duration_boxplots.png
        ├── emissions_rate_heatmap.png
        └── statistical_significance.png
```
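The figures under `results/plots/` are derived from the raw CSVs. A hypothetical sketch of the accuracy-vs-emissions scatter, assuming `optimizer`, `accuracy`, and `emissions_kg` columns (names not verified against the file):

```python
# Hypothetical sketch of the accuracy-vs-emissions scatter in results/plots/.
# Column names (optimizer, accuracy, emissions_kg) are assumptions.
import pandas as pd
import matplotlib.pyplot as plt

results = pd.read_csv("data/experimental_data/comprehensive_results.csv")

fig, ax = plt.subplots()
for optimizer, group in results.groupby("optimizer"):
    ax.scatter(group["emissions_kg"], group["accuracy"], label=optimizer, s=12)
ax.set_xlabel("CO2 emissions (kg)")
ax.set_ylabel("Test accuracy (%)")
ax.legend()
fig.savefig("results/plots/accuracy_vs_emissions.png", dpi=200)
```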
| Dataset | Highest Accuracy | Most Energy-Efficient | Fastest Training |
|---|---|---|---|
| MNIST | Adadelta (98.29%) | NAdam | AdamW |
| CIFAR-10 | Adamax (66.53%) | AdamW | Adadelta |
| CIFAR-100 | SGD (20.61%) | AdamW | NAdam |
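This summary can be recomputed from `comprehensive_results.csv`; a sketch assuming `dataset`, `optimizer`, `accuracy`, `emissions_kg`, and `duration_s` columns (again, assumed names, check the CSV header):

```python
# Hypothetical sketch for rebuilding the per-dataset summary from the raw results.
# Column names (accuracy, emissions_kg, duration_s) are assumptions.
import pandas as pd

results = pd.read_csv("data/experimental_data/comprehensive_results.csv")
means = results.groupby(["dataset", "optimizer"], as_index=False).mean(numeric_only=True)

for dataset, group in means.groupby("dataset"):
    best_acc = group.loc[group["accuracy"].idxmax()]
    greenest = group.loc[group["emissions_kg"].idxmin()]
    fastest  = group.loc[group["duration_s"].idxmin()]
    print(f"{dataset}: accuracy={best_acc['optimizer']}, "
          f"efficiency={greenest['optimizer']}, speed={fastest['optimizer']}")
```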
- AdamW: Consistently high efficiency across problem types
- NAdam: Excellent for simpler tasks
- SGD: High performance on complex problems but energy-intensive
- Tool: CodeCarbon 3.0.4 with macOS powermetrics
- Metrics: CPU/GPU power, CO2 emissions, memory usage
- Carbon intensity: Ontario, Canada grid factor
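CodeCarbon wraps a training run with an `EmissionsTracker`; a minimal sketch of typical usage (the project name and output directory here are illustrative, see `src/experiment_runner.py` for the actual setup):

```python
# Illustrative CodeCarbon usage; the sleep is a stand-in for a real training run.
import time
from codecarbon import EmissionsTracker

tracker = EmissionsTracker(
    project_name="optimizer-energy-study",          # illustrative name
    output_dir="data/experimental_data/emissions",  # matches the repo layout above
)
tracker.start()
try:
    time.sleep(5)  # stand-in for the training loop
finally:
    emissions_kg = tracker.stop()  # estimated kg CO2-eq for the tracked block
    print(f"Estimated emissions: {emissions_kg:.6f} kg CO2-eq")
```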
If you use this work, please cite:
```bibtex
@article{almog2025optimizer,
  title={An Analysis of Optimizer Choice on Energy Efficiency and Performance in Neural Network Training},
  author={Almog, Tom},
  institution={University of Waterloo},
  note={Submitted to Sustainable Computing: Informatics and Systems},
  year={2025}
}
```

Contributions welcome:
- Bug reports and fixes
- Extension to additional optimizers or datasets
- Replication studies on different hardware
- Methodology improvements
Tom Almog
University of Waterloo
talmog@uwaterloo.ca
MIT License - see LICENSE file for details.
- University of Waterloo for computational resources
- Open-source ML community for development tools
- CodeCarbon team for energy measurement framework
Based on experimental evidence:
- Default choice: Use AdamW when environmental impact matters
- Research settings: SGD's higher emissions may be justified by its accuracy gains on challenging datasets
- Simple tasks: Prioritize efficiency over minor accuracy differences
- Complex tasks: Weigh performance gains against environmental costs
See paper for detailed analysis and recommendations.