(topic/tracker) faceswap pipeline performance #112

@monorimet

Description

This refers to the work in the alibaba_fp16 branch of this repository.

From the fp16-model directory, with an IREE environment set up, compile and benchmark the controlled ip-adapted UNet module as follows:

```shell
iree-compile \
  --iree-hal-target-backends=rocm \
  --iree-hip-target=gfx942 \
  --iree-preprocessing-pass-pipeline='builtin.module(iree-preprocessing-transpose-convolution-pipeline, iree-preprocessing-pad-to-intrinsics)' \
  --iree-hal-force-indirect-command-buffers=1 \
  --iree-stream-resource-memory-model=discrete \
  --iree-hip-legacy-sync=0 \
  --iree-hal-memoization=1 \
  --iree-opt-strip-assertions \
  --iree-opt-outer-dim-concat=1 \
  --iree-hip-waves-per-eu=2 \
  --iree-llvmgpu-enable-prefetch=1 \
  --iree-codegen-gpu-native-math-precision=1 \
  --iree-dispatch-creation-enable-aggressive-fusion=0 \
  --iree-codegen-transform-dialect-library=specs/attention_and_matmul_spec_control.mlir \
  base_ir/stable_diffusion_xl_base_1_0_controlled_unet_bs1_64_1024x960_fp16.mlir \
  -o stable_diffusion_xl_base_1_0_controlled_unet_bs1_64_1024x960_fp16_amdgpu_gfx942.vmfb
```
```shell
iree-benchmark-module \
  --module=stable_diffusion_xl_base_1_0_controlled_unet_bs1_64_1024x960_fp16_amdgpu_gfx942.vmfb \
  --device=hip://0 \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_0.npy \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_1.npy \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_2.npy \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_3.npy \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_4.npy \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_5.npy \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_6.npy \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_7.npy \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_8.npy \
  --device_allocator=caching \
  --parameters=model=splat/controlled_unet.irpa \
  --function=run_forward \
  --benchmark_repetitions=3
```
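If the sample_inputs directory is unavailable, placeholder .npy inputs can be generated for smoke-testing the benchmark invocation. A minimal sketch, with hypothetical shapes — the bs1, 64-token, 1024x960 (so 128x120 latent) dimensions are read off the module filename, and only three of the nine inputs are shown; the authoritative shapes come from the module's run_forward signature:

```python
# Sketch: write zero-filled fp16 .npy placeholders for a few of the benchmark
# inputs. All shapes here are illustrative assumptions, NOT taken from the
# module signature; check the run_forward signature in the MLIR for the truth.
import numpy as np
from pathlib import Path

ASSUMED_SHAPES = {
    "cunet_in_0": (1, 4, 128, 120),  # latent sample: 1024/8 x 960/8 (assumed)
    "cunet_in_1": (1,),              # timestep (assumed)
    "cunet_in_2": (2, 64, 2048),     # prompt embeddings, 64 tokens (assumed)
}

def write_placeholder_inputs(out_dir: str) -> list:
    """Write one zero-filled fp16 array per entry and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, shape in ASSUMED_SHAPES.items():
        path = out / f"{name}.npy"
        np.save(path, np.zeros(shape, dtype=np.float16))
        paths.append(path)
    return paths

if __name__ == "__main__":
    for p in write_placeholder_inputs("placeholder_inputs"):
        print(p)
```

Each resulting file can then be passed with `--input=@placeholder_inputs/cunet_in_N.npy` in place of the sample inputs.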

Note the compiler flags: we will want to enable `--iree-dispatch-creation-enable-aggressive-fusion=1` once a distributed-context bug is fixed; tracked in iree-org/iree#19688.

The attention spec differs only in that one attention shape must be commented out of the tunings.

@MaheshRavishankar noted that this command was also missing a flag for matmul generalization.

Real weights for the controlled unet module are publicly available here: https://sharkpublic.blob.core.windows.net/sharkpublic/sdxl/weights/stable_diffusion_xl_base_1_0_controlled_unet_dataset_fp16.irpa
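To benchmark against the real weights rather than the splat parameters, the IRPA can be downloaded and substituted into the `--parameters` flag. A sketch — the weights/ directory is an arbitrary choice of mine, and the download line is left commented out because the file is large; everything else mirrors the benchmark command above:

```shell
# Sketch: fetch the public real-weights IRPA and point the benchmark at it
# instead of splat/controlled_unet.irpa. Uncomment the curl line to download.
WEIGHTS_URL="https://sharkpublic.blob.core.windows.net/sharkpublic/sdxl/weights/stable_diffusion_xl_base_1_0_controlled_unet_dataset_fp16.irpa"
WEIGHTS_FILE="weights/$(basename "$WEIGHTS_URL")"
mkdir -p weights
# curl -L -o "$WEIGHTS_FILE" "$WEIGHTS_URL"
# Then rerun iree-benchmark-module, replacing the splat parameters with:
echo "--parameters=model=$WEIGHTS_FILE"
```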
