This directory provides scripts to perform inference on a V-diffusion model and is tested and maintained by Habana.
For more information on training and inference of deep learning models using Gaudi, refer to developer.habana.ai.
- Model-References
- Model Overview
- Setup
- Model Checkpoint
- Inference and Examples
- Performance
- Supported Configuration
- Changelog
Denoising Diffusion Probabilistic Models are trained to reverse a gradual noising process so that they can generate samples from the learned data distribution, starting from random noise. The models used here follow the 'v' objective from Progressive Distillation for Fast Sampling of Diffusion Models. The scripts provided in this repository were tested with the PRK and PLMS sampling methods. The model generates images based on a text prompt.
Please follow the instructions provided in the Gaudi Installation Guide to set up the environment, including the $PYTHON environment variable. To achieve the best performance, please follow the methods outlined in the Optimizing Training Platform guide.
The guides will walk you through the process of setting up your system to run the model on Gaudi.
In the docker container, clone this repository and switch to the branch that matches your SynapseAI version.
You can run the hl-smi utility to determine the SynapseAI version.
git clone -b [SynapseAI version] https://github.com/HabanaAI/Model-References
- In the docker container, go to the model directory:
cd Model-References/PyTorch/generative_models/v-diffusion
- Install the required packages using pip:
$PYTHON -m pip install -r requirements.txt
Our example uses a CC12M_1 CFG 256x256 model.
You can use the following set of commands to create a checkpoints/ directory and download the model there.
cd Model-References/PyTorch/generative_models/v-diffusion
mkdir checkpoints
wget https://the-eye.eu/public/AI/models/v-diffusion/cc12m_1_cfg.pth && mv cc12m_1_cfg.pth checkpoints/
This is a 602M-parameter, CLIP-conditioned model trained on Conceptual 12M for 3.1M steps and then fine-tuned for classifier-free guidance for 250K additional steps.
SHA-256 for the CC12M_1 CFG 256x256 file: 4fc95ee1b3205a3f7422a07746383776e1dbc367eaf06a5b658ad351e77b7bda
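The downloaded checkpoint can be compared against the SHA-256 value above before running inference. The following is a minimal sketch that assumes the file was saved as checkpoints/cc12m_1_cfg.pth, as in the commands above; the helper names `sha256_of_file` and `checkpoint_is_valid` are illustrative and are not part of this repository.

```python
import hashlib
from pathlib import Path

# Digest quoted in this README for the CC12M_1 CFG 256x256 checkpoint.
EXPECTED_SHA256 = "4fc95ee1b3205a3f7422a07746383776e1dbc367eaf06a5b658ad351e77b7bda"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checkpoint_is_valid(path, expected=EXPECTED_SHA256):
    """True when the file exists and its digest matches the expected one."""
    p = Path(path)
    return p.is_file() and sha256_of_file(p) == expected.lower()


if __name__ == "__main__":
    # Assumed location, following the download commands in this README.
    ckpt = "checkpoints/cc12m_1_cfg.pth"
    print("OK" if checkpoint_is_valid(ckpt) else "MISSING OR MISMATCH")
```

Reading in chunks keeps memory use flat regardless of the checkpoint size.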
Consider the following command:
./cfg_sample.py "the rise of consciousness":5 -n 8 -bs 4 --seed 0 --device 'hpu' --hmp
It generates 2 batches of 4 images each (the batch size is set by the -bs parameter), for a total of 8 images (set by the -n parameter).
The :5 suffix in the example above specifies the weight associated with the text prompt.
A weight of 1 samples images that match the prompt roughly as well as images in the training set typically match similar prompts.
The default weight is 3.
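To make the relationship between the prompt weight, -n and -bs concrete, here is a small sketch. It is not the parsing code used by cfg_sample.py; `split_prompt` and `batch_sizes` are hypothetical helpers that only mirror the behaviour described above (a trailing :weight, a default weight of 3, and n images produced bs at a time). Note that the shell joins "the rise of consciousness":5 into the single argument `the rise of consciousness:5`.

```python
def split_prompt(spec, default_weight=3.0):
    """Split 'text:weight' into (text, weight); use the default if no weight is given."""
    text, sep, tail = spec.rpartition(":")
    if sep:
        try:
            return text, float(tail)
        except ValueError:
            pass  # the text after the last colon is not a number
    return spec, default_weight


def batch_sizes(n, bs):
    """Sizes of the batches needed to produce n images, at most bs per batch."""
    return [min(bs, n - start) for start in range(0, n, bs)]


if __name__ == "__main__":
    print(split_prompt("the rise of consciousness:5"))
    print(batch_sizes(8, 4))
```

With -n 8 and -bs 4 this gives two batches of four images, which matches the example command.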
For a more detailed description of the parameters, please see the help message:
./cfg_sample.py -h
The first batch of images incurs a performance penalty; all subsequent batches are generated much faster. For example, the following command generates 4 batches of 4 images, and the first batch takes significantly longer than each of the remaining 3.
./cfg_sample.py "the rise of consciousness":5 -n 16 -bs 4 --seed 0 --device 'hpu' --hmp
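Because the first batch is slower, it can be useful to report steady-state timing separately from the warm-up batch. The sketch below is one way to do that and is not how cfg_sample.py reports its figures; `run_batch` is a hypothetical callable standing in for the sampling of one batch, and it is assumed to return only once that batch has actually finished.

```python
import time


def time_batches(run_batch, num_batches):
    """Call run_batch(i) for each batch index and return the per-batch durations in seconds."""
    durations = []
    for i in range(num_batches):
        start = time.perf_counter()
        run_batch(i)
        durations.append(time.perf_counter() - start)
    return durations


def steady_state_mean(durations, warmup=1):
    """Mean duration after discarding the first `warmup` batches."""
    rest = durations[warmup:]
    if not rest:
        raise ValueError("need more batches than warm-up batches")
    return sum(rest) / len(rest)


if __name__ == "__main__":
    # Stand-in workload: the first call sleeps longer to imitate the warm-up cost.
    timings = time_batches(lambda i: time.sleep(0.05 if i == 0 else 0.01), 4)
    print(f"first batch: {timings[0]:.3f}s, steady state: {steady_state_mean(timings):.3f}s")
```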
| Device | SynapseAI Version | PyTorch Version |
|---|---|---|
| Gaudi | 1.7.1 | 1.13.0 |
Removed PT_HPU_ENABLE_SPLIT_INFERENCE environment variable.
Initial release
Major changes made to the original model from the crowsonkb/v-diffusion-pytorch repository:
- Changed README.
- Removed jupyter notebooks.
- Added HPU support.
- Added BF16 mixed precision logic.
- Replaced GroupNorm (with num_groups=1) with mathematically equivalent LayerNorm for better performance.
- Changed some code that originally used variables, in order to avoid graph recompilations.
- Added htcore.mark_step() in relevant places.
- Changed weights conversion in CLIP model so that weights are converted to bf16 on HPU and to fp16 otherwise.
- Moved randn operator execution to CPU.
- Changed the way script performance figures are reported.
- Replaced torch.atan2 with torch.atan in diffusion/utils.py.
- Changed repeat_interleave logic in cfg_model_fn.
- Set PT_HPU_ENABLE_SPLIT_INFERENCE environment variable.
- Removed torch.cuda.amp.autocast when running on HPU.
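As background for the atan2 replacement listed above: for a positive first coordinate, atan2(y, x) equals atan(y / x), so the two forms agree whenever the cosine term of a timestep is positive. The sketch below checks this with Python's math module rather than torch. The function names are illustrative, and the cosine/sine parameterisation of the timestep is an assumption made for this example; it is not copied from diffusion/utils.py.

```python
import math


def t_to_alpha_sigma(t):
    """Map a timestep t in [0, 1) to a (cos, sin) pair on the unit circle (assumed form)."""
    return math.cos(t * math.pi / 2), math.sin(t * math.pi / 2)


def t_from_atan2(alpha, sigma):
    """Recover t using the two-argument arctangent."""
    return math.atan2(sigma, alpha) / math.pi * 2


def t_from_atan(alpha, sigma):
    """Recover t using the one-argument arctangent; valid only for alpha > 0."""
    return math.atan(sigma / alpha) / math.pi * 2


if __name__ == "__main__":
    for t in (0.0, 0.25, 0.5, 0.9):
        alpha, sigma = t_to_alpha_sigma(t)
        print(t, t_from_atan2(alpha, sigma), t_from_atan(alpha, sigma))
```

The one-argument form needs care at alpha = 0, where plain Python raises a division error; this sketch simply stays below t = 1.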