Nunchaku SANA Models
This interactive Gradio application generates an image from your text prompt. The base model is SANA-1.6B.

    python run_gradio.py

- By default, the Gemma-2B model is loaded as a safety checker. To disable this feature and save GPU memory, use `--no-safety-checker`.
- By default, only the INT4 DiT is loaded. Use `-p int4 bf16` to add a BF16 DiT for side-by-side comparison, or `-p bf16` to load only the BF16 model.
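
The flag behavior described above can be illustrated with a small `argparse` sketch. This is not the code in `run_gradio.py`; the long option name `--precisions` and the helper `build_parser` are hypothetical, and only the `-p` and `--no-safety-checker` spellings and their defaults come from the text.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the two demo flags described above."""
    parser = argparse.ArgumentParser(description="Sketch of the demo flags (assumed layout)")
    # -p accepts one or more precisions; the default loads only the INT4 DiT.
    # The long name --precisions is an assumption made for this sketch.
    parser.add_argument(
        "-p",
        "--precisions",
        nargs="+",
        choices=["int4", "bf16"],
        default=["int4"],
        help="DiT precisions to load",
    )
    # The safety checker is on unless this flag is passed.
    parser.add_argument(
        "--no-safety-checker",
        action="store_true",
        help="skip loading the safety checker to save GPU memory",
    )
    return parser


# Example: request both models for a side-by-side comparison.
args = build_parser().parse_args(["-p", "int4", "bf16"])
print(args.precisions, args.no_safety_checker)
```

Using `nargs="+"` is one way to let a single flag carry either one precision or both, which matches the `-p int4 bf16` and `-p bf16` forms.
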
We provide a script, `generate.py`, that generates an image from a text prompt directly from the command line, similar to the demo. Simply run:

    python generate.py --prompt "Your Text Prompt"

- The generated image is saved as `output.png` by default. You can specify a different path with the `-o` or `--output-path` option.
- By default, the script uses our INT4 model. To use the BF16 model instead, specify `-p bf16`.
- You can adjust the number of inference steps and the classifier-free guidance scale with `-t` and `-g`, respectively. The defaults are 20 steps and a guidance scale of 5.
- In addition to classifier-free guidance, you can adjust the PAG guidance scale with `--pag-scale`. The default is 2.
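
To give a feel for what the two scales control, here is a minimal sketch of how classifier-free guidance and PAG (perturbed-attention guidance) are commonly combined into one noise prediction. It assumes the usual formulation found in PAG-enabled diffusion pipelines; the function `combine_guidance` and its argument names are hypothetical, and this is not taken from `generate.py`.

```python
from typing import List


def combine_guidance(
    uncond: List[float],
    cond: List[float],
    perturbed: List[float],
    guidance_scale: float = 5.0,  # default of -g in the text above
    pag_scale: float = 2.0,       # default of --pag-scale in the text above
) -> List[float]:
    """Combine three noise predictions element-wise (assumed formulation).

    uncond:    prediction without the text prompt
    cond:      prediction with the text prompt
    perturbed: prompt-conditioned prediction with perturbed self-attention
    """
    return [
        u + guidance_scale * (c - u) + pag_scale * (c - p)
        for u, c, p in zip(uncond, cond, perturbed)
    ]


# Toy one-element example with the default scales.
print(combine_guidance([0.0], [1.0], [0.5]))
```

Under this formulation, a larger `-g` pushes the result further from the unconditional prediction, while a larger `--pag-scale` pushes it further from the perturbed-attention prediction.
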
To measure the latency of our INT4 models, use the following command:

    python latency.py

- Adjust the number of inference steps and the guidance scale using `-t` and `-g`, respectively. The defaults are 20 steps and a guidance scale of 5.
- You can also adjust the PAG guidance scale with `--pag-scale`. The default is 2.
- By default, the script measures the end-to-end latency for generating a single image. To measure the latency of a single DiT forward step instead, use the `--mode step` flag.
- Specify the number of warmup and test runs using `--warmup-times` and `--test-times`. The defaults are 2 warmup runs and 10 test runs.
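
The warmup-then-measure pattern behind `--warmup-times` and `--test-times` can be sketched as follows. This is an illustrative harness, not the contents of `latency.py`; `measure_latency` and the stand-in workload are hypothetical names.

```python
import statistics
import time
from typing import Callable, List


def measure_latency(
    fn: Callable[[], object],
    warmup_times: int = 2,  # default of --warmup-times in the text above
    test_times: int = 10,   # default of --test-times in the text above
) -> List[float]:
    """Run fn untimed for warmup, then return per-run wall-clock seconds."""
    # Warmup runs absorb one-time costs such as lazy initialization and caching.
    for _ in range(warmup_times):
        fn()
    timings = []
    for _ in range(test_times):
        start = time.perf_counter()
        fn()
        # For GPU work, the device must be synchronized before reading the
        # clock (e.g. torch.cuda.synchronize()); omitted in this CPU-only sketch.
        timings.append(time.perf_counter() - start)
    return timings


# Stand-in workload; a real benchmark would call the pipeline or one DiT step.
timings = measure_latency(lambda: sum(range(10_000)))
print(f"mean latency: {statistics.mean(timings) * 1e3:.3f} ms over {len(timings)} runs")
```

Passing a callable keeps the same harness usable for both an end-to-end generation and a single forward step, which is the distinction `--mode step` draws.
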
