A minimal implementation of Hierarchical Reasoning Models (HRM)
Ensure PyTorch and CUDA are installed, as the repo needs to build CUDA extensions. If they are not already present, run the following commands:

```bash
# Install CUDA 12.6
CUDA_URL=https://developer.download.nvidia.com/compute/cuda/12.6.3/local_installers/cuda_12.6.3_560.35.05_linux.run
wget -q --show-progress --progress=bar:force:noscroll -O cuda_installer.run $CUDA_URL
sudo sh cuda_installer.run --silent --toolkit --override
export CUDA_HOME=/usr/local/cuda-12.6
# Install PyTorch with CUDA 12.6
PYTORCH_INDEX_URL=https://download.pytorch.org/whl/cu126
pip3 install torch torchvision torchaudio --index-url $PYTORCH_INDEX_URL
# Additional packages for building extensions
pip3 install packaging ninja wheel setuptools setuptools-scm
```
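Before building anything on top, it can help to confirm that the CUDA-enabled PyTorch wheel is the one that actually got installed. A quick sanity check (not part of this repo):

```python
# Sanity check: the installed PyTorch build should be CUDA-enabled and
# able to see a GPU before any extensions are compiled against it.
import torch

print(torch.__version__)          # typically a +cu126 build from the index above
print(torch.version.cuda)         # expected: 12.6
print(torch.cuda.is_available())  # expected: True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```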
Then install FlashAttention. For Hopper GPUs, install FlashAttention 3:

```bash
git clone [email protected]:Dao-AILab/flash-attention.git
cd flash-attention/hopper
python setup.py install
```

For Ampere or earlier GPUs, install FlashAttention 2:

```bash
pip3 install flash-attn
```

Then install the remaining Python dependencies:

```bash
pip install -r requirements.txt
```

This project uses Weights & Biases for experiment tracking and metric visualization. Ensure you're logged in:

```bash
wandb login
```
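Before kicking off a run, it can also be worth confirming that the FlashAttention build imports cleanly (shown here for the FlashAttention 2 package; just a sanity check, not part of the repo):

```python
# Sanity check: flash_attn should import cleanly and expose the fused
# attention entry point; an ImportError usually means the build failed.
import flash_attn
from flash_attn import flash_attn_func

print(flash_attn.__version__)
```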
Getting started with training an HRM is as simple as editing the config and running the following command:

```bash
python main.py
```
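For readers new to HRM: the model couples a slow high-level recurrent module (abstract planning) with a fast low-level module (detailed computation), with the low-level state updated several times for every high-level update. The sketch below is purely illustrative; the names, GRU cells, and shapes are stand-ins, not this repo's API:

```python
import torch
import torch.nn as nn

class HRMSketch(nn.Module):
    """Two-timescale recurrence: f_low runs t_steps times per f_high update."""

    def __init__(self, dim: int, n_cycles: int = 2, t_steps: int = 2):
        super().__init__()
        self.n_cycles, self.t_steps = n_cycles, t_steps
        self.f_low = nn.GRUCell(2 * dim, dim)   # fast, detailed computation
        self.f_high = nn.GRUCell(dim, dim)      # slow, abstract planning
        self.readout = nn.Linear(dim, dim)

    def forward(self, x_emb, z_high, z_low):
        for _ in range(self.n_cycles):
            for _ in range(self.t_steps):
                # Low-level state evolves conditioned on the current plan and the input.
                z_low = self.f_low(torch.cat([x_emb, z_high], dim=-1), z_low)
            # High-level state is refreshed once per cycle from the low-level result.
            z_high = self.f_high(z_low, z_high)
        return self.readout(z_high), (z_high, z_low)
```

In the paper's formulation the recurrent cells are transformer blocks and training uses a one-step gradient approximation, but the nested two-timescale loop above is the core idea.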
Coming soon!

- Eval script
- Streamlit application
- Trained checkpoints
- Benchmarks on a variety of datasets and architectures
- Comparisons with the OG repo: sapientinc/HRM
- Mech-Interp
- Ablations
- Blog
- Tips and tricks, analysis, important takeaways