(topic/tracker) faceswap pipeline performance #112

@monorimet

Description

This refers to the work in the alibaba_fp16 branch of this repository.

From the fp16-model directory, with an IREE environment set up, compile and benchmark the controlled ip-adapted UNet module as follows:

```shell
iree-compile \
  --iree-hal-target-backends=rocm \
  --iree-hip-target=gfx942 \
  --iree-preprocessing-pass-pipeline='builtin.module(iree-preprocessing-transpose-convolution-pipeline, iree-preprocessing-pad-to-intrinsics)' \
  --iree-hal-force-indirect-command-buffers=1 \
  --iree-stream-resource-memory-model=discrete \
  --iree-hip-legacy-sync=0 \
  --iree-hal-memoization=1 \
  --iree-opt-strip-assertions \
  --iree-opt-outer-dim-concat=1 \
  --iree-hip-waves-per-eu=2 \
  --iree-llvmgpu-enable-prefetch=1 \
  --iree-codegen-gpu-native-math-precision=1 \
  --iree-dispatch-creation-enable-aggressive-fusion=0 \
  --iree-codegen-transform-dialect-library=specs/attention_and_matmul_spec_control.mlir \
  base_ir/stable_diffusion_xl_base_1_0_controlled_unet_bs1_64_1024x960_fp16.mlir \
  -o stable_diffusion_xl_base_1_0_controlled_unet_bs1_64_1024x960_fp16_amdgpu_gfx942.vmfb
```
```shell
iree-benchmark-module \
  --module=stable_diffusion_xl_base_1_0_controlled_unet_bs1_64_1024x960_fp16_amdgpu_gfx942.vmfb \
  --device=hip://0 \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_0.npy \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_1.npy \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_2.npy \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_3.npy \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_4.npy \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_5.npy \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_6.npy \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_7.npy \
  --input=@sample_inputs/controlled_unet_npys/cunet_in_8.npy \
  --device_allocator=caching \
  --parameters=model=splat/controlled_unet.irpa \
  --function=run_forward \
  --benchmark_repetitions=3
```
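If the sample_inputs directory is unavailable, placeholder .npy inputs can be generated for smoke-testing the benchmark invocation. A minimal sketch, with hypothetical shapes — the bs1, 64-token, 1024x960 (so 128x120 latent) dimensions are read off the module filename, and only three of the nine inputs are shown; the authoritative shapes come from the module's run_forward signature:

```python
# Sketch: write zero-filled fp16 .npy placeholders for a few of the benchmark
# inputs. All shapes here are illustrative assumptions, NOT taken from the
# module signature; check the run_forward signature in the MLIR for the truth.
import numpy as np
from pathlib import Path

ASSUMED_SHAPES = {
    "cunet_in_0": (1, 4, 128, 120),  # latent sample: 1024/8 x 960/8 (assumed)
    "cunet_in_1": (1,),              # timestep (assumed)
    "cunet_in_2": (2, 64, 2048),     # prompt embeddings, 64 tokens (assumed)
}

def write_placeholder_inputs(out_dir: str) -> list:
    """Write one zero-filled fp16 array per entry and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, shape in ASSUMED_SHAPES.items():
        path = out / f"{name}.npy"
        np.save(path, np.zeros(shape, dtype=np.float16))
        paths.append(path)
    return paths

if __name__ == "__main__":
    for p in write_placeholder_inputs("placeholder_inputs"):
        print(p)
```

Each resulting file can then be passed with `--input=@placeholder_inputs/cunet_in_N.npy` in place of the sample inputs.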

Note the compiler flags: we will want to enable `--iree-dispatch-creation-enable-aggressive-fusion=1` once a distributed-context bug is fixed; tracked in iree-org/iree#19688.

The attention spec differs only in that one attention shape must be commented out of the tunings.

@MaheshRavishankar noted that this command was also missing a flag for matmul generalization.

Real weights for the controlled unet module are publicly available here: https://sharkpublic.blob.core.windows.net/sharkpublic/sdxl/weights/stable_diffusion_xl_base_1_0_controlled_unet_dataset_fp16.irpa
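To benchmark against the real weights rather than the splat parameters, the IRPA can be downloaded and substituted into the `--parameters` flag. A sketch — the weights/ directory is an arbitrary choice of mine, and the download line is left commented out because the file is large; everything else mirrors the benchmark command above:

```shell
# Sketch: fetch the public real-weights IRPA and point the benchmark at it
# instead of splat/controlled_unet.irpa. Uncomment the curl line to download.
WEIGHTS_URL="https://sharkpublic.blob.core.windows.net/sharkpublic/sdxl/weights/stable_diffusion_xl_base_1_0_controlled_unet_dataset_fp16.irpa"
WEIGHTS_FILE="weights/$(basename "$WEIGHTS_URL")"
mkdir -p weights
# curl -L -o "$WEIGHTS_FILE" "$WEIGHTS_URL"
# Then rerun iree-benchmark-module, replacing the splat parameters with:
echo "--parameters=model=$WEIGHTS_FILE"
```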
