This project was carried out in the Laboratory of Complex Systems at École Centrale Casablanca. It provides an interactive web interface using Gradio to experiment with training Andrej Karpathy's nanoGPT model. The interface allows users to configure the model architecture, set training hyperparameters, and apply various optimization techniques such as gradient accumulation, activation recomputation, and data parallelism. Users can then monitor the training process and results—such as loss curves, memory usage, and validation performance—in real-time.
- Interactive Configuration: Easily set nanoGPT model parameters (layers, heads, embedding dimension, etc.) and training hyperparameters (learning rate, batch sizes, iterations) through the Gradio UI.
- Optimization Technique Toggling:
- Enable/disable Gradient Accumulation by setting micro-batch size and gradient accumulation steps.
- Toggle Activation Recomputation (checkpointing) to save memory.
- Toggle Data Parallelism (`nn.DataParallel`) to use multiple available GPUs on a single node.
- UI placeholders for Pipeline Parallelism and Tensor Parallelism (these flags are passed to the training core, but the actual parallelism logic must be implemented manually).
- Live Monitoring: View training progress, loss curves (live update for the current run), and estimated GPU memory usage (if CUDA is used and tracked).
- Comparative Analysis: After multiple runs with different configurations, view comparative plots for loss, training time, and (eventually) GPU memory. An automated analysis provides a summary of each run.
- Modular Core: The training logic is separated into `train_core.py`, which handles the nanoGPT model and training loop, making it easy to adapt.
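The gradient-accumulation bookkeeping mentioned above can be sketched in plain Python. This is illustrative only: `train_step_with_accumulation` is a made-up name, and plain numbers stand in for gradient tensors.

```python
def train_step_with_accumulation(micro_grads, accum_steps):
    """Average per-micro-batch 'gradients' (plain numbers here) over
    accum_steps micro-batches, taking one optimizer step per group.
    The effective batch size is micro_batch_size * accum_steps."""
    optimizer_steps = []
    running = 0.0
    for i, g in enumerate(micro_grads, start=1):
        running += g / accum_steps           # scale so the sum is an average
        if i % accum_steps == 0:
            optimizer_steps.append(running)  # one optimizer update per group
            running = 0.0
    return optimizer_steps

# Four micro-batches with accumulation over 2 -> two optimizer steps.
print(train_step_with_accumulation([1.0, 2.0, 3.0, 4.0], 2))  # [1.5, 3.5]
```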
```
├── nanoGPT/                  # Cloned official nanoGPT repository
│   ├── model.py              # (Potentially modified for activation recomputation flag)
│   ├── data/
│   │   ├── shakespeare_char/
│   │   │   ├── prepare.py    # Example data preparation script
│   │   │   └── ...
│   │   └── ...
│   └── ...
├── your_app_directory/       # Or THIS_STUDIO, where your app files live
│   ├── gradio_app.py         # Main Gradio application script
│   ├── train_core.py         # Core training logic adapted from nanoGPT
│   └── out-gradio-run/       # Example output directory for checkpoints (created by the app)
└── README.md                 # This file
```
- Python 3.8+
- PyTorch: Install from pytorch.org. Ensure you have a version compatible with your CUDA version if using GPUs.
- Cloned nanoGPT Repository: You must have Andrej Karpathy's `nanoGPT` repository cloned. This project assumes it is located such that `train_core.py` can import from it (e.g., as a subdirectory within your main application folder), or that the `nanoGPT` folder is on your `PYTHONPATH`. Clone it with `git clone https://github.com/karpathy/nanoGPT.git`.
- Python Packages:

  ```
  pip install torch numpy gradio matplotlib pandas
  # nanoGPT itself may have other dependencies, such as tiktoken,
  # if you use its data prep scripts extensively:
  # pip install tiktoken datasets tqdm  # e.g. for nanoGPT's openwebtext prepare.py
  ```
- Clone nanoGPT: If you haven't already, clone the official nanoGPT repository into your project structure (e.g., inside your main application directory, as `THIS_STUDIO/nanoGPT/`).
- Modify `nanoGPT/model.py` (for Activation Recomputation):
  - Open `nanoGPT/model.py`.
  - Ensure `import torch.utils.checkpoint` is present at the top.
  - Modify the `forward` method of the `GPT` class to accept a `use_recompute=False` argument and apply `torch.utils.checkpoint.checkpoint(block, x, use_reentrant=False)` to each transformer block when `use_recompute` is true and `model.training` is true.
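A minimal sketch of what that forward-pass change looks like, using a toy stack of `nn.Linear` layers in place of nanoGPT's real transformer blocks (`TinyStack` is illustrative, not project code):

```python
import torch
import torch.nn as nn
import torch.utils.checkpoint

class TinyStack(nn.Module):
    def __init__(self, dim=8, n_blocks=3):
        super().__init__()
        # Stand-in for nanoGPT's list of transformer blocks.
        self.h = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_blocks))

    def forward(self, x, use_recompute=False):
        for block in self.h:
            if use_recompute and self.training:
                # Recompute this block's activations during backward
                # instead of storing them: trades compute for memory.
                x = torch.utils.checkpoint.checkpoint(block, x, use_reentrant=False)
            else:
                x = block(x)
        return x

model = TinyStack().train()
x = torch.randn(2, 8, requires_grad=True)
out = model(x, use_recompute=True)
out.sum().backward()  # gradients flow through the checkpointed blocks
```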
- Place Application Files: Save the provided `gradio_app.py` and `train_core.py` into your main application directory (e.g., `THIS_STUDIO/`).
- Prepare Data:
  - For each dataset you intend to use (e.g., `shakespeare_char`), you must run its `prepare.py` script first. Example for `shakespeare_char`:

    ```
    cd nanoGPT/data/shakespeare_char/
    python prepare.py
    cd ../../..   # navigate back to your app's root, where gradio_app.py is
    ```

  - The application attempts a basic automatic preparation for known small datasets if data files are missing, but manual preparation is strongly recommended, especially for larger datasets like `openwebtext`. The automatic preparation uses `sys.executable` to call `prepare.py`.
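The automatic-preparation fallback can be sketched like this. It is a simplified, hypothetical version of what `train_core.py` does (`ensure_prepared` is an illustrative name; the real script's internals may differ):

```python
import os
import subprocess
import sys

def ensure_prepared(dataset_dir: str) -> bool:
    """Return True if train.bin exists, running prepare.py first if needed."""
    train_bin = os.path.join(dataset_dir, "train.bin")
    if os.path.exists(train_bin):
        return True
    script = os.path.join(dataset_dir, "prepare.py")
    if not os.path.exists(script):
        return False
    # sys.executable avoids relying on a 'python' command being on PATH.
    subprocess.run([sys.executable, "prepare.py"], cwd=dataset_dir, check=True)
    return os.path.exists(train_bin)
```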
- Navigate to the directory containing `gradio_app.py` and `train_core.py` (e.g., `THIS_STUDIO/`).
- Run the Gradio application:

  ```
  python gradio_app.py
  ```

- Open the local URL printed in your terminal (usually `http://127.0.0.1:7860` or `http://localhost:7860`) in your web browser.
- Use the interface to configure your model and training run, then click "Launch nanoGPT Training."
- `gradio_app.py`:
  - Defines the Gradio user interface with sliders, checkboxes, and dropdowns for all configurations.
  - Manages the collection of metrics from training runs for plotting and analysis.
  - Calls `train_core.py` to execute training runs based on UI settings.
  - Updates plots and logs in (near) real time as data is yielded from the training core.
- `train_core.py`:
  - Adapts the training loop from nanoGPT's `train.py`.
  - Initializes the `GPT` model (from the cloned `nanoGPT/model.py`) with the configuration specified through Gradio.
  - Handles data loading, optimizer setup, the training loop, gradient accumulation, and loss calculation.
  - Implements toggles for activation recomputation and `nn.DataParallel`.
  - Includes (non-functional) placeholders for Tensor Parallelism and Pipeline Parallelism flags.
  - Yields progress (loss, iteration, timing, GPU memory) back to `gradio_app.py` for display.
  - Includes a basic automatic data preparation step using `subprocess` to call the dataset's `prepare.py` if binary data files are not found (recommended for small datasets only).
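The yield-based handoff between the training core and the UI can be illustrated in isolation. This is a toy stand-in; the real `train_core.py` yields richer progress data.

```python
def training_core(max_iters):
    """Toy training loop: yields a progress dict after every iteration,
    the same generator pattern gradio_app.py consumes to refresh plots."""
    loss = 4.0
    for it in range(1, max_iters + 1):
        loss *= 0.9                        # stand-in for a real training step
        yield {"iter": it, "loss": loss}

# The UI side iterates the generator and updates the display after each yield.
updates = list(training_core(3))
for u in updates:
    print(f"iter {u['iter']}: loss {u['loss']:.3f}")
```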
- Tensor Parallelism & Pipeline Parallelism: The UI includes flags for these, and `train_core.py` accepts them. However, the actual implementation of these advanced parallelism techniques is not included and would be a significant undertaking. Users wishing to implement them would need to:
  - Choose a library (e.g., DeepSpeed, FairScale, or PyTorch's native TP/PP utilities).
  - Modify `nanoGPT/model.py` extensively to shard layers (for TP) or define pipeline stages (for PP) using the chosen library's APIs.
  - Heavily modify `train_core.py` to initialize the distributed environment, wrap the model with the library's parallel equivalents, and adapt the training loop to the library's specific methods for forward/backward passes and optimizer steps.
  - Likely move away from `nn.DataParallel` once these more advanced techniques are in use.
- `ModuleNotFoundError: No module named 'nanoGPT.model'` (or similar for `train_core`):
  - Ensure your `nanoGPT` directory is placed relative to `train_core.py` and `gradio_app.py` as shown under "Project Structure", and that the `sys.path` modifications at the top of the Python scripts work for your layout. The scripts assume `gradio_app.py` and `train_core.py` live in one directory and that the cloned `nanoGPT` repository is a direct subdirectory of it.
- `Error: Data files not found in .../nanoGPT/data/<dataset_name>`:
  - Run `python prepare.py` inside the specific `nanoGPT/data/<dataset_name>/` directory before selecting that dataset in the Gradio app.
- `Error: 'python' command not found. Cannot run preparation script automatically`:
  - This occurs if the automatic data preparation step in `train_core.py` cannot find your Python executable. The script now uses `sys.executable`, which points to the currently running interpreter, making this unlikely. If it persists, verify that `sys.executable` is valid or prepare the data manually.
- Gradio errors (`AttributeError: Cannot call click outside of a gradio.Blocks context.`):
  - Ensure all Gradio UI component definitions and their event handlers (like `.click()`) are strictly within the `with gr.Blocks() as demo:` context in `gradio_app.py`.