Internal repo for iteration on Diffusion LLMs
If necessary, provision accelerator-enabled VMs with SkyPilot.
For Lambda, e.g., this is all it takes to create a single A100 node for development:
pip install "skypilot[lambda]"
sky launch --cluster dllm --gpus A100 --workdir .
ssh dllm # sky creates ssh configs for you
SkyPilot can also provision clusters, set up environments, manage task execution, and more. See docs/skypilot.md for more details.
Install mamba or conda (mamba is far faster):
# For mamba: https://mamba.readthedocs.io/en/latest/installation/micromamba-installation.html#umamba-install
"${SHELL}" <(curl -L micro.mamba.pm/install.sh)
# For conda: https://docs.conda.io/projects/conda/en/stable/user-guide/install/linux.html
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh && \
bash miniconda.sh -b -p /opt/conda
Set up a conda environment and install dependencies using:
micromamba env create -y -f requirements.yaml --channel-priority flexible
Activate the environment:
conda activate dllm-dev
# OR micromamba activate dllm-dev
We also include a setup_env.sh script that can be used to set up the
environment on a new machine.
Run the script using:
source setup_env.sh
You can also include this snippet in shell / slurm scripts to set up the environment on a compute node.
In this script, we set up WandB and HuggingFace tokens by sourcing a script which is
expected to be in the /home/<YOUR_USER_NAME>/ directory.
Copy the contents below into a shell script /home/<YOUR_USER_NAME>/setup_discdiff.sh
and replace the placeholder tokens with your own:
# W&B / HF Setup
export WANDB__SERVICE_WAIT=600
export _WANDB_STARTUP_DEBUG="true"
export WANDB_ENTITY="kuleshov-group"
export WANDB_API_KEY="<WANDB_API_KEY>"
echo "Logging into W&B as '${WANDB_ENTITY}'."
# HF Setup
export HUGGINGFACE_TOKEN="<HF_TOKEN>"
huggingface-cli login --token ${HUGGINGFACE_TOKEN} --add-to-git-credential
We will try to use GitHub issues to track bugs, features, and todos. To contribute to the repo, please create a new issue and assign it to yourself. Then create a new branch from the issue and open a pull request.
We use pre-commit to run linters and formatters on the code. To install the pre-commit hooks, run:
pre-commit install
On every git commit, the pre-commit hooks will run automatically and report any issues / automatic fixes that
were applied.
- bash_scripts: These shell scripts can be used to reproduce the experiments from our work.
- configs: We use hydra config files to organize experiments.
  - config.yaml: This config is the entry point for launching training experiments.
  - eval_config.yaml: This config is the entry point for evaluations.
- scripts: The main training and evaluation scripts.
  - scripts/composer_scripts/train_discrete_denoiser.py: This script is the main training entry point.
  - scripts/evals: These scripts run the evaluation for the translation, summarization, and math reasoning datasets, as well as any likelihood evaluation.
- src:
  - src/denoiser: During training, denoisers take in "noisy" inputs and predict clean signals. At inference, starting from a purely noisy signal, these classes produce samples that resemble data through iterative denoising.
    - AR: We can view autoregressive models within this paradigm. Noise is applied by masking tokens one at a time from right to left. Denoising is done one token at a time, left to right.
    - Diffusion: We implement masked diffusion models:
      - MDLM: Standard masked diffusion.
      - BD3LM: Block diffusion models.
      - E2D2: Our encoder-decoder implementation.
  - src/backbone: These are the underlying neural networks that take in noisy inputs and produce logits. Each denoiser is parameterized by a backbone. The denoiser can optionally post-process the logit outputs of the backbone to produce log-probs over the clean sequence.
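To make the denoiser and backbone descriptions concrete, here is a minimal, dependency-free sketch of the two kinds of noise and of the logit post-processing step. It is an illustration only, not the code in src/denoiser or src/backbone. The function names, the `<mask>` symbol, and the use of a single fixed masking probability are all assumptions made for this example; real masked diffusion models draw the masking rate from a noise schedule.

```python
import math
import random

MASK = "<mask>"  # hypothetical mask symbol, not the repo's actual token


def ar_noise(tokens, num_masked):
    """AR-style noise: mask the last `num_masked` tokens (right to left)."""
    keep = len(tokens) - num_masked
    return tokens[:keep] + [MASK] * num_masked


def masked_diffusion_noise(tokens, mask_prob, rng):
    """Masked-diffusion-style noise: mask each token independently."""
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]


def log_softmax(logits):
    """Turn one position's logits into log-probs (numerically stable form)."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


tokens = ["the", "cat", "sat", "down"]
print(ar_noise(tokens, 2))  # ['the', 'cat', '<mask>', '<mask>']
print(masked_diffusion_noise(tokens, 0.5, random.Random(0)))

# Stand-in for a backbone's output at one position over a 3-token vocabulary.
log_probs = log_softmax([2.0, 0.5, -1.0])
print(sum(math.exp(lp) for lp in log_probs))  # approximately 1.0
```

In this picture, the AR case always leaves a clean prefix followed by a masked suffix, whereas the masked diffusion case can mask any subset of positions. In both cases the denoiser's job is to predict the clean tokens at the masked positions from the log-probs.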