- Project Overview
- Objectives & Goals
- Installation & Setup
- How to Run the Project
- Methodology
- Results & Key Findings
- User Guides
- API / CLI Documentation
- Potential Next Steps
- Individual Contributions
This project explores how much energy different AI workloads consume and how that relates to broader economic activity and corporate climate reporting. Using the ML.ENERGY benchmark as a starting point, we:
- Build a cleaned dataset of per-run energy measurements for LLM and diffusion workloads on A100 and H100 GPUs.
- Train a first-pass model that predicts whether a workload is Low / Medium / High energy intensity based on configuration and runtime features.
- Set up data pipelines for the Anthropic Economic Index and corporate environmental reports (Google and Microsoft), so that workload-level energy can eventually be connected to economic tasks and organizational emissions.
The repo currently contains:
- A working modeling pipeline for ML.ENERGY (notebooks/llm-energy-output-modeling.ipynb).
- Initial ingestion and data-quality workflows for:
  - Anthropic Economic Index.
  - Corporate carbon disclosure PDFs.
- Ingest and standardize key datasets
  - ML workload-level energy data (ML.ENERGY snapshot).
  - Task-level economic activity (Anthropic Economic Index).
  - Organization-level climate disclosure PDFs (Google & Microsoft 2024).
- Build a baseline energy-intensity model
  - Predict a 3-class energy label (Low / Medium / High) from workload configuration and runtime metrics.
  - Measure how much predictive signal there is without using model names or GPU labels.
- Characterize drivers of energy use
  - Understand which features (batch size, latency, frames, steps, parallelism, throughput, etc.) are most associated with higher energy consumption.
  - Compare behavior across NVIDIA A100 vs H100 GPUs.
- Lay the groundwork for cross-scale linkage
  - Prepare data and scripts so future work can connect:
    - Workload-level energy → economic tasks → corporate emissions and reporting.
- Python: 3.9 or newer.
- Git (for pulling the ML.ENERGY submodule).
- Optional but recommended:
  - conda or venv for virtual environments.
  - jupyter or jupyterlab for running notebooks.
git clone --recurse-submodules https://github.com/<your-org-or-username>/KPMG-1D-repo.git
cd KPMG-1D-repo
If you already cloned without --recurse-submodules, run:
git submodule update --init --recursive
This fetches the external/mlenergy submodule (ML.ENERGY leaderboard repo).
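Before running the export script, it can help to confirm that the submodule was actually fetched. The following is a minimal sketch, assuming the submodule keeps its files under external/mlenergy/data as described in this README; `submodule_ready` is a hypothetical helper and not part of the repo.

```python
from pathlib import Path


def submodule_ready(repo_root: str = ".") -> bool:
    """Return True if external/mlenergy/data exists and holds at least one entry.

    Hypothetical helper: the path is taken from this README, not from the scripts.
    """
    data_dir = Path(repo_root) / "external" / "mlenergy" / "data"
    # An uninitialized submodule typically leaves a missing or empty directory.
    return data_dir.is_dir() and any(data_dir.iterdir())


# Example: call from the repo root before exporting the snapshot.
# if not submodule_ready():
#     print("Run: git submodule update --init --recursive")
```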
Using venv:
python -m venv .venv
source .venv/bin/activate # On macOS/Linux
# .venv\Scripts\activate # On Windows (PowerShell or CMD)
Using conda (alternative):
conda create -n kpmg-1d python=3.10
conda activate kpmg-1d
There is no requirements.txt yet, but you can install the required packages with:
pip install \
numpy \
pandas \
matplotlib \
seaborn \
scikit-learn \
datasets \
jupyter
If you want, you can also create your own requirements.txt:
numpy
pandas
matplotlib
seaborn
scikit-learn
datasets
jupyter
then install via:
pip install -r requirements.txt
All data is stored under the data/ directory, created by the scripts below.
This copies structured ML.ENERGY data from the external/mlenergy submodule into data/mlenergy/raw/:
python scripts/export_mlenergy_snapshot.py
- Input: external/mlenergy/data (from the submodule).
- Output:
  - JSON/CSV/JSONL files in data/mlenergy/raw/.
  - A provenance file data/mlenergy/README.md with the submodule commit hash.
This downloads the Anthropic/EconomicIndex dataset from Hugging Face and stores it as CSV + Parquet:
python scripts/fetch_anthropic_econ_index.py
- Output directory: data/anthropic_econ_index/processed/
  - anthropic_econ_index.train.parquet / .csv
  - anthropic_econ_index.validation.parquet / .csv
  - anthropic_econ_index.test.parquet / .csv (if present)
This downloads selected corporate environmental reports (Google and Microsoft) to the repo:
python scripts/fetch_corporate_carbon.py
- Output directory: data/corporate_carbon/raw/
  - google_2024_environmental_report.pdf
  - microsoft_2024_environmental_sustainability_report.pdf
  - microsoft_2024_env_data_fact_sheet.pdf
These PDFs are not yet parsed into structured tables in this repo, but are staged for future use.
The main modeling pipeline lives in:
notebooks/llm-energy-output-modeling.ipynb
Step-by-step:
- Make sure your environment is activated and dependencies are installed.
- Launch Jupyter:
  jupyter lab
  or
  jupyter notebook
- In the Jupyter UI, open notebooks/llm-energy-output-modeling.ipynb.
- Run the notebook top to bottom. The notebook:
  - Parses ML.ENERGY data into a flat pandas DataFrame (mlenergy_df).
  - Engineers a unified Energy (J) target and a log-scaled version.
  - Creates a 3-class label: Low, Medium, High.
  - Trains a logistic regression classifier.
  - Reports accuracy and a confusion matrix.
The model is trained in memory; no checkpoint is written to disk by default, though you can add one if desired.
The evaluation is integrated into the same notebook:
- Train–test split:
  from sklearn.model_selection import train_test_split
  X_train, X_test, y_train, y_test = train_test_split(
      X, y, test_size=0.3, random_state=42, stratify=y
  )
- Model training and prediction:
  from sklearn.linear_model import LogisticRegression
  from sklearn.metrics import accuracy_score, confusion_matrix
  model = LogisticRegression(
      max_iter=1000, solver='liblinear', class_weight='balanced'
  )
  model.fit(X_train, y_train)
  pred = model.predict(X_test)
- Metrics and confusion matrix (already computed and displayed in the notebook):
  acc = accuracy_score(y_test, pred)
  cm = confusion_matrix(y_test, pred, labels=['Low', 'Medium', 'High'])
You can re-run these cells to reproduce the ~81.5% test accuracy and see the confusion matrix.
The Economic Index analysis lives in:
- notebooks/anthropic-preprocessing-json-file-combining-in-d.ipynb
- notebooks/anthropic-preprocessing-json-file-combining-in-d (1).ipynb
- notebooks/anthropic-preprocessing-json-file-combining-in-d (2).ipynb
- notebooks/data_quality.ipynb
Typical workflow:
- Open notebooks/data_quality.ipynb.
- Run the cells to:
  - Load the processed Economic Index CSV/Parquet files (from data/anthropic_econ_index/processed/).
  - Inspect basic dataset properties: df.shape, df.nunique(), etc.
  - Generate frequency plots of task / interaction types.
- Open one of the anthropic-preprocessing-*.ipynb notebooks for more advanced EDA or baseline modeling (if present).
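The data-quality checks in the workflow above (row counts, cardinality, frequency tables) can be sketched without pandas. This is only an illustration: the toy CSV and its column names (`interaction_type`, `task`) are hypothetical stand-ins, not the actual Economic Index schema, and `column_profile` is not a function from the notebooks.

```python
import csv
import io
from collections import Counter


def column_profile(rows, column):
    """Return (row count, distinct-value count, value frequencies) for one column."""
    values = [row[column] for row in rows]
    counts = Counter(values)
    # most_common() sorts by descending frequency, like a value_counts table.
    return len(values), len(counts), counts.most_common()


# Toy stand-in for one processed split; column names are hypothetical.
sample_csv = io.StringIO(
    "interaction_type,task\n"
    "directive,write code\n"
    "feedback loop,debug code\n"
    "directive,draft email\n"
)
rows = list(csv.DictReader(sample_csv))
n_rows, n_unique, freq = column_profile(rows, "interaction_type")
print(n_rows, n_unique, freq)
```

To run this against a real split, replace `sample_csv` with an open handle on one of the CSV files under data/anthropic_econ_index/processed/ and pass a column that exists in that file.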
ML.ENERGY benchmark (snapshot)
- Source: external/mlenergy/ submodule → data/mlenergy/raw/.
- Contains per-run metrics for:
  - Diffusion / image-to-video models.
  - LLM text-generation workloads.
- GPUs:
  - NVIDIA A100-SXM4-40GB
  - NVIDIA H100 80GB HBM3
- Key columns (after parsing in the notebook):
  - Model, GPU
  - Energy/video (J), Energy/image (J), Energy/req (J)
  - Batch latency (s), Batch size, Denoising steps, Frames
  - TP, PP
  - Avg TPOT (s), Token tput (tok/s)
  - Avg Output Tokens, Avg BS (reqs), Max BS (reqs)
Anthropic Economic Index
- Fetched from Hugging Face with datasets.load_dataset("Anthropic/EconomicIndex").
- Stored as CSV + Parquet under data/anthropic_econ_index/processed/.
- Used for:
  - Data-quality checks.
  - Frequency tables of interaction/task categories.
  - Potential baseline models (depending on the notebook).
Corporate Carbon Disclosure PDFs
- Downloaded via scripts/fetch_corporate_carbon.py.
- Stored under data/corporate_carbon/raw/.
- Currently not parsed into structured tables in this repo.
In llm-energy-output-modeling.ipynb:
- Parse ML.ENERGY data into a pandas DataFrame (mlenergy_df).
- Handle missing values:
  mlenergy_df = mlenergy_df.fillna(0)
- Define a unified energy target:
  mlenergy_df['Energy (J)'] = mlenergy_df[
      ['Energy/video (J)', 'Energy/image (J)', 'Energy/req (J)']
  ].max(axis=1)
- Log-scale the energy:
  mlenergy_df['Energy Log Scaled'] = np.log1p(mlenergy_df['Energy (J)'])
- Create categorical labels using tertiles of log-energy:
  mlenergy_df['Energy Output Label'] = pd.qcut(
      mlenergy_df['Energy Log Scaled'], q=3, labels=['Low', 'Medium', 'High']
  )
- Feature selection:
  - Drop direct energy and text fields; keep numeric configuration/runtime columns:
    X = mlenergy_df.drop(columns=[
        'Energy (J)', 'Energy Log Scaled', 'Energy Output Label',
        'Energy/video (J)', 'Energy/image (J)', 'Energy/req (J)',
        'Model', 'GPU'
    ])
    y = mlenergy_df['Energy Output Label']
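The target-engineering steps above (fill missing energies with 0, take the row-wise maximum, apply log1p, cut at tertiles) can be traced on a handful of rows using only the standard library. This is a sketch, not the notebook's code: `energy_labels` and the toy numbers are hypothetical, and the right-closed bin edges are intended to mirror `pd.qcut` but have not been checked against the notebook (ties at a bin edge may behave differently).

```python
import math
from statistics import quantiles

ENERGY_COLUMNS = ("Energy/video (J)", "Energy/image (J)", "Energy/req (J)")


def energy_labels(rows):
    """Label each run Low / Medium / High by tertiles of log1p(energy)."""
    # Missing measurements count as 0, so max() picks the populated column.
    energy = [max((row.get(c) or 0.0) for c in ENERGY_COLUMNS) for row in rows]
    log_energy = [math.log1p(e) for e in energy]
    # 'inclusive' interpolates linearly between order statistics.
    low_edge, high_edge = quantiles(log_energy, n=3, method="inclusive")
    labels = []
    for value in log_energy:
        if value <= low_edge:
            labels.append("Low")
        elif value <= high_edge:
            labels.append("Medium")
        else:
            labels.append("High")
    return energy, labels


# Toy rows with made-up energies; each run populates only one energy column.
toy_rows = [
    {"Energy/req (J)": 10.0},
    {"Energy/req (J)": 20.0},
    {"Energy/image (J)": 50.0, "Energy/req (J)": None},
    {"Energy/image (J)": 120.0},
    {"Energy/video (J)": 250.0},
    {"Energy/video (J)": 700.0},
]
toy_energy, toy_labels = energy_labels(toy_rows)
print(list(zip(toy_energy, toy_labels)))
```

Because log1p is monotonic, the tertile boundaries fall on the same runs as they would on raw energy; the log scale matters for plots and for any later regression, not for which run lands in which class.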
- Model: Multiclass logistic regression (one-vs-rest, balanced class weights).
- Split:
  X_train, X_test, y_train, y_test = train_test_split(
      X, y, test_size=0.3, random_state=42, stratify=y
  )
- Training:
  from sklearn.linear_model import LogisticRegression
  model = LogisticRegression(
      max_iter=1000, solver='liblinear', class_weight='balanced'
  )
  model.fit(X_train, y_train)
- Evaluation:
  from sklearn.metrics import accuracy_score, confusion_matrix
  pred = model.predict(X_test)
  acc = accuracy_score(y_test, pred)
  cm = confusion_matrix(y_test, pred, labels=['Low', 'Medium', 'High'])
- Anthropic Economic Index:
  - Check missingness, cardinality, and distributions.
  - Create bar plots of interaction/task type frequencies.
- GPU-specific views (ML.ENERGY):
  - Subset to H100 and A100 workloads.
  - Plot histograms of Energy/req (J) per GPU.
  - Draw correlation heatmaps of numeric features within each GPU subset.
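The per-GPU view above amounts to grouping runs by the GPU column and summarizing each group. A minimal standard-library sketch follows; `energy_by_gpu` is a hypothetical helper and the energy values are invented for illustration, not taken from the snapshot.

```python
from collections import defaultdict
from statistics import mean, median


def energy_by_gpu(rows, energy_column="Energy/req (J)"):
    """Group runs by GPU and summarize one energy column per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["GPU"]].append(row[energy_column])
    return {
        gpu: {"runs": len(vals), "mean": mean(vals), "median": median(vals)}
        for gpu, vals in groups.items()
    }


# Made-up rows; only the GPU names come from this README.
toy_runs = [
    {"GPU": "NVIDIA A100-SXM4-40GB", "Energy/req (J)": 100.0},
    {"GPU": "NVIDIA A100-SXM4-40GB", "Energy/req (J)": 300.0},
    {"GPU": "NVIDIA H100 80GB HBM3", "Energy/req (J)": 50.0},
    {"GPU": "NVIDIA H100 80GB HBM3", "Energy/req (J)": 150.0},
    {"GPU": "NVIDIA H100 80GB HBM3", "Energy/req (J)": 250.0},
]
summary = energy_by_gpu(toy_runs)
for gpu, stats in summary.items():
    print(gpu, stats)
```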
- Total runs: 431
- GPUs:
  - H100: 221 runs
  - A100: 210 runs
Energy (J) distribution (overall):
- Mean: ≈ 576 J
- Std: ≈ 1887 J
- Min: 5.6 J
- Median: 119.0 J
- Max: 16,916 J
Label distribution (Low / Medium / High):
- Low: 144 runs
- Medium: 143 runs
- High: 144 runs
By label:
| Label | Mean Energy (J) | Median (J) | Max (J) |
|---|---|---|---|
| Low | 28.5 | 23.2 | 54.9 |
| Medium | 123.5 | 119.0 | 245.8 |
| High | 1574.1 | 711.5 | 16,915.9 |
On average, High-energy workloads use about 13× the energy of Medium runs (1574.1 J vs 123.5 J) and about 55× the energy of Low runs (1574.1 J vs 28.5 J).
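The ratios between classes, and the consistency of the per-label table with the overall mean, can be checked with a few lines of arithmetic. All numbers below are copied from the tables in this README; nothing is recomputed from the raw data.

```python
# Per-label run counts and mean energies, copied from the tables above.
counts = {"Low": 144, "Medium": 143, "High": 144}
means = {"Low": 28.5, "Medium": 123.5, "High": 1574.1}

high_vs_medium = means["High"] / means["Medium"]
high_vs_low = means["High"] / means["Low"]

# The count-weighted average of the class means should match the overall mean.
overall_mean = sum(counts[k] * means[k] for k in counts) / sum(counts.values())

print(f"High / Medium: {high_vs_medium:.1f}x")
print(f"High / Low:    {high_vs_low:.1f}x")
print(f"Overall mean:  {overall_mean:.1f} J over {sum(counts.values())} runs")
```

The weighted mean comes out at about 576 J over 431 runs, which agrees with the overall distribution reported earlier.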
- Task: Predict Low / Medium / High from configuration/runtime features only.
- Model: Logistic Regression (balanced).
- Test accuracy: ~81.5% on a 30% stratified hold-out.
Confusion matrix (rows = true labels, cols = predicted):
| | Pred: Low | Pred: Medium | Pred: High |
|---|---|---|---|
| True Low | 39 | 4 | 0 |
| True Medium | 10 | 28 | 5 |
| True High | 0 | 5 | 39 |
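All 24 errors involve the Medium class: 15 Medium runs are misclassified (10 as Low, 5 as High), and 9 Low or High runs are predicted as Medium. No Low run is predicted as High or vice versa.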
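The headline accuracy and per-class behavior can be derived directly from the confusion matrix. The sketch below uses the matrix values from the table above; the test-set size check assumes scikit-learn rounds a fractional `test_size` up to the next whole sample, which matches the 130 rows in the matrix.

```python
import math

labels = ["Low", "Medium", "High"]
# Rows are true labels, columns are predictions (copied from the table above).
cm = [
    [39, 4, 0],
    [10, 28, 5],
    [0, 5, 39],
]

total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(len(labels)))
accuracy = correct / total

# Recall: share of each true class that was predicted correctly (row-wise).
recall = {labels[i]: cm[i][i] / sum(cm[i]) for i in range(len(labels))}
# Precision: share of each predicted class that was correct (column-wise).
precision = {
    labels[j]: cm[j][j] / sum(cm[i][j] for i in range(len(labels)))
    for j in range(len(labels))
}

# Assumption: a 30% hold-out of 431 runs is rounded up to whole samples.
expected_test_size = math.ceil(0.3 * 431)

print(f"accuracy = {correct}/{total} = {accuracy:.3f}")
for name in labels:
    print(f"{name}: recall={recall[name]:.3f} precision={precision[name]:.3f}")
```

Medium has the lowest recall (28 of 43), which is where a stronger model or better features would have the most room to help.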
Selected correlations with Energy (J):
- Strong positive:
  - Batch latency (s) (~0.81)
  - Frames (~0.72)
- Moderate positive:
  - Denoising steps (~0.33)
- Negative:
  - Token tput (tok/s) (~−0.31)
  - PP (pipeline parallelism) (~−0.26)
  - Avg Output Tokens (~−0.18)
Interpretation:
- Slow, frame-heavy, many-step workloads are high-energy.
- Higher throughput and more parallelism correlate with lower energy per run in this snapshot.
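For readers who want to see what these coefficients measure, here is a small sketch of a correlation computation. It assumes the reported values are Pearson coefficients (the pandas default); this README does not state which coefficient the notebook used. The `pearson` helper and the toy series are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Toy series: energy rises with latency and falls with throughput.
latency = [1.0, 2.0, 3.0, 4.0]
throughput = [40.0, 30.0, 20.0, 10.0]
energy = [10.0, 20.0, 30.0, 40.0]

r_latency = pearson(latency, energy)
r_throughput = pearson(throughput, energy)
print(r_latency, r_throughput)
```

A coefficient near +1 or −1 indicates a strong linear association; it does not by itself show that the feature causes the change in energy.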
- Data preparation
  - Run python scripts/export_mlenergy_snapshot.py.
  - Optionally verify that data/mlenergy/raw/ contains structured energy files.
- Modeling
  - Open and run notebooks/llm-energy-output-modeling.ipynb.
  - Inspect:
    - The summary statistics of Energy (J).
    - The Energy Output Label distribution.
    - Correlation heatmaps.
- Evaluation
  - View the reported accuracy and confusion matrix cells.
  - Experiment with changing:
    - Feature subsets.
    - Train–test split ratio.
    - Logistic regression hyperparameters.
- What you can replicate
  - Reproduce energy labels and distributions.
  - Reproduce the ~81.5% accuracy logistic regression baseline.
  - Compare A100 vs H100 workloads on energy distributions.
- Fetch data
  - Run: python scripts/fetch_anthropic_econ_index.py
- Open notebooks
  - Start Jupyter and open:
    - notebooks/data_quality.ipynb
    - notebooks/anthropic-preprocessing-json-file-combining-in-d*.ipynb
- Run EDA cells
  - Inspect:
    - Dataset sizes, column names, and types.
    - Value counts for key categorical features.
  - Plot:
    - Bar charts of interaction types.
    - Any baseline model performance if defined in the notebook.
- Future extension
  - Add your own models to predict labels in the Economic Index dataset.
  - Connect task categories to energy estimates from ML.ENERGY.
This repo does not expose a formal Python package API yet, but it does provide a small CLI-style interface via scripts/.
Purpose:
- Copy structured ML.ENERGY data from the external/mlenergy submodule into data/mlenergy/raw/ and record the submodule commit hash for provenance.
Usage:
python scripts/export_mlenergy_snapshot.py
Behavior:
- Expects external/mlenergy/data to exist (from the submodule).
- Recursively scans for .csv, .json, .jsonl files.
- Copies them into data/mlenergy/raw/, preserving relative paths.
- Writes data/mlenergy/README.md with the source commit SHA.
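The scan-and-copy behavior described above could look roughly like the following. This is a sketch written from the description, not the contents of scripts/export_mlenergy_snapshot.py; `copy_structured_files` is a hypothetical name, and the provenance step (recording the commit SHA) is left out.

```python
import shutil
from pathlib import Path

STRUCTURED_SUFFIXES = {".csv", ".json", ".jsonl"}


def copy_structured_files(src_root, dst_root):
    """Copy .csv/.json/.jsonl files from src_root to dst_root, keeping relative paths."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    copied = []
    for path in sorted(src_root.rglob("*")):
        if path.is_file() and path.suffix.lower() in STRUCTURED_SUFFIXES:
            target = dst_root / path.relative_to(src_root)
            # Recreate the source directory layout under the destination.
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            copied.append(target)
    return copied


# Example (paths as described in this README):
# copy_structured_files("external/mlenergy/data", "data/mlenergy/raw")
```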
Purpose:
- Download the Anthropic Economic Index dataset from Hugging Face and save it locally in CSV and Parquet formats.
Usage:
python scripts/fetch_anthropic_econ_index.py
Behavior:
- Uses datasets.load_dataset("Anthropic/EconomicIndex").
- For each split (e.g., train, validation, test):
  - Writes anthropic_econ_index.<split>.parquet.
  - Writes anthropic_econ_index.<split>.csv.
- Output directory: data/anthropic_econ_index/processed/.
Purpose:
- Download selected corporate environmental/climate reports from public URLs (Google, Microsoft) and store them locally.
Usage:
python scripts/fetch_corporate_carbon.py
Behavior:
- Creates data/corporate_carbon/raw/ if needed.
- Downloads:
  - Google 2024 Environmental Report (PDF).
  - Microsoft 2024 Environmental Sustainability Report (PDF).
  - Microsoft 2024 Environmental Data Fact Sheet (PDF).
- Skips re-download if the files already exist.
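The create-directory, download, and skip-if-present behavior can be sketched as below. This is an illustration of the pattern rather than the script's actual code: `fetch_if_missing` is a hypothetical helper, and the report URLs are deliberately not reproduced here.

```python
import urllib.request
from pathlib import Path


def fetch_if_missing(url, dest, download=urllib.request.urlretrieve):
    """Download url to dest unless dest already exists.

    Returns True if a download happened, False if it was skipped.
    The download callable is injectable so the logic can be tested offline.
    """
    dest = Path(dest)
    if dest.exists():
        return False
    # Create data/corporate_carbon/raw/ (or any parent folder) on first use.
    dest.parent.mkdir(parents=True, exist_ok=True)
    download(url, str(dest))
    return True


# Example (the URL is a placeholder, not the real report location):
# fetch_if_missing("https://example.com/report.pdf",
#                  "data/corporate_carbon/raw/google_2024_environmental_report.pdf")
```

One limitation of this simple check is that an interrupted download leaves a partial file that later runs will skip; downloading to a temporary name and renaming on success would avoid that.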
- Modeling
  - Upgrade to tree-based or boosting models (Random Forest, XGBoost, LightGBM).
  - Switch from classification to regression on Energy (J).
- Data fusion
  - Align Economic Index tasks with ML.ENERGY workloads where possible.
  - Parse corporate PDFs into tables and link energy to emissions (CO₂e).
- Tooling
  - Factor the modeling notebook into reusable Python modules.
  - Add a requirements.txt or pyproject.toml and a simple CLI (python -m kpmg1d.train).
- Kai – ML.ENERGY Modeling & Analysis
  - Parsed and cleaned ML.ENERGY into mlenergy_df.
  - Engineered Energy (J) and the Low/Medium/High labels.
  - Built and evaluated the logistic regression baseline.
- Amanda – Economic Index Data & Baselines
  - Implemented the Economic Index preprocessing and data-quality notebooks.
  - Performed frequency and distributional analysis over interaction/task types.
  - Prototyped baseline models on Economic Index data (where applicable).
- Masumi – Data Pipelines & Corporate Carbon Reports
  - Wrote fetch_anthropic_econ_index.py, export_mlenergy_snapshot.py, and/or fetch_corporate_carbon.py.
  - Organized the data/ directory structure and documented ML.ENERGY provenance.