I'm a seismology researcher who wants to use PixArt to generate earthquake data. I noticed that you use xformers as a substitute for torch.nn.functional.scaled_dot_product_attention. Why? In my experiments, SDPA in torch is much faster than xformers.
Thanks in advance for any replies.
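For context, this is roughly how I compared the two (a minimal sketch; the tensor shapes below are arbitrary placeholders, not PixArt's actual configuration, and the xformers path is skipped if the package isn't installed):

```python
import time

import torch
import torch.nn.functional as F


def bench(fn, *args, iters=10):
    """Average wall-clock time per call, with warm-up and CUDA sync."""
    for _ in range(3):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters


device = "cuda" if torch.cuda.is_available() else "cpu"
# Placeholder shapes (batch, heads, seq_len, head_dim) -- not PixArt's real config.
B, H, L, D = 2, 8, 256, 64
q = torch.randn(B, H, L, D, device=device)
k = torch.randn_like(q)
v = torch.randn_like(q)

sdpa_t = bench(F.scaled_dot_product_attention, q, k, v)
print(f"torch SDPA: {sdpa_t * 1e3:.2f} ms/iter")

try:
    import xformers.ops as xops

    # xformers expects (batch, seq_len, heads, head_dim) layout.
    qx, kx, vx = (t.transpose(1, 2).contiguous() for t in (q, k, v))
    xf_t = bench(xops.memory_efficient_attention, qx, kx, vx)
    print(f"xformers memory_efficient_attention: {xf_t * 1e3:.2f} ms/iter")
except ImportError:
    print("xformers not installed; skipping that side of the comparison")
```

Relative timings will depend heavily on hardware, dtype, and sequence length, which is partly why I'm asking whether there's a specific reason xformers was chosen here.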