This refers to the work in the alibaba_fp16 branch of this repository.

From the fp16-model directory, with an IREE environment active, compile and run the controlled IP-adapted UNet module as follows:
iree-compile \
  --iree-hal-target-backends=rocm \
  --iree-hip-target=gfx942 \
  --iree-preprocessing-pass-pipeline='builtin.module(iree-preprocessing-transpose-convolution-pipeline, iree-preprocessing-pad-to-intrinsics)' \
  --iree-hal-force-indirect-command-buffers=1 \
  --iree-stream-resource-memory-model=discrete \
  --iree-hip-legacy-sync=0 \
  --iree-hal-memoization=1 \
  --iree-opt-strip-assertions \
  --iree-opt-outer-dim-concat=1 \
  --iree-hip-waves-per-eu=2 \
  --iree-llvmgpu-enable-prefetch=1 \
  --iree-codegen-gpu-native-math-precision=1 \
  --iree-dispatch-creation-enable-aggressive-fusion=0 \
  --iree-codegen-transform-dialect-library=specs/attention_and_matmul_spec_control.mlir \
  base_ir/stable_diffusion_xl_base_1_0_controlled_unet_bs1_64_1024x960_fp16.mlir \
  -o stable_diffusion_xl_base_1_0_controlled_unet_bs1_64_1024x960_fp16_amdgpu_gfx942.vmfb
iree-benchmark-module \
  --module=stable_diffusion_xl_base_1_0_controlled_unet_bs1_64_1024x960_fp16_amdgpu_gfx942.vmfb \
  --device=hip://0 \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_0.npy \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_1.npy \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_2.npy \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_3.npy \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_4.npy \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_5.npy \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_6.npy \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_7.npy \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_8.npy \
  --device_allocator=caching \
  --parameters=model=splat/controlled_unet.irpa \
  --function=run_forward \
  --benchmark_repetitions=3
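The nine --input flags above follow a fixed naming pattern, so a small helper snippet can generate them instead of typing each path by hand. This is a sketch of my own, not part of the repo; the variable name INPUT_FLAGS is hypothetical, and the paths assume the sample_inputs layout shown in the benchmark command.

```shell
# Build the nine --input flags for iree-benchmark-module in a loop.
# Paths match the sample_inputs layout used above; INPUT_FLAGS is a
# hypothetical variable name, not something the repo defines.
INPUT_FLAGS=""
for i in 0 1 2 3 4 5 6 7 8; do
  INPUT_FLAGS="$INPUT_FLAGS --input=@sample_inputs/controlled_unet_npys/cunet_in_${i}.npy"
done
echo "$INPUT_FLAGS"
```

The resulting string can then be spliced into the iree-benchmark-module invocation as $INPUT_FLAGS in place of the nine explicit flags.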
Note the compiler flags. Once a distributed context bug (tracked in iree-org/iree#19688) is fixed, we will want to turn on --iree-dispatch-creation-enable-aggressive-fusion=1.
The attention spec differs only in that one attention shape must be commented out of the tunings.
@MaheshRavishankar noted that this command was also missing a flag for matmul generalization.
Real weights for the controlled unet module are publicly available here: https://sharkpublic.blob.core.windows.net/sharkpublic/sdxl/weights/stable_diffusion_xl_base_1_0_controlled_unet_dataset_fp16.irpa
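To benchmark against the real weights rather than the splat parameters, one possible sketch is the following. The URL is the one given above; the local filename choice and the curl invocation are my own assumptions, not repo conventions.

```shell
# Sketch: download the public real-weight IRPA and point --parameters at it.
# WEIGHTS_URL is taken from the note above; the local filename is arbitrary.
WEIGHTS_URL="https://sharkpublic.blob.core.windows.net/sharkpublic/sdxl/weights/stable_diffusion_xl_base_1_0_controlled_unet_dataset_fp16.irpa"
WEIGHTS_FILE="$(basename "$WEIGHTS_URL")"
# Uncomment to actually fetch (large download):
# curl -L -o "$WEIGHTS_FILE" "$WEIGHTS_URL"
echo "$WEIGHTS_FILE"
```

After downloading, rerun the benchmark command with --parameters=model="$WEIGHTS_FILE" in place of --parameters=model=splat/controlled_unet.irpa.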