I'm a seismology researcher who wants to use PixArt to generate earthquake data. I noticed that you use xformers as a substitute for torch.nn.functional.scaled_dot_product_attention. Why? In my experiments, SDPA in torch is much faster than xformers.
Thanks in advance for any replies.
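For context, this is roughly how I compared the two (a minimal sketch; the tensor shapes below are arbitrary placeholders, not PixArt's actual configuration, and the xformers path is skipped if the package isn't installed):

```python
import time

import torch
import torch.nn.functional as F


def bench(fn, *args, iters=10):
    """Average wall-clock time per call, with warm-up and CUDA sync."""
    for _ in range(3):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters


device = "cuda" if torch.cuda.is_available() else "cpu"
# Placeholder shapes (batch, heads, seq_len, head_dim) -- not PixArt's real config.
B, H, L, D = 2, 8, 256, 64
q = torch.randn(B, H, L, D, device=device)
k = torch.randn_like(q)
v = torch.randn_like(q)

sdpa_t = bench(F.scaled_dot_product_attention, q, k, v)
print(f"torch SDPA: {sdpa_t * 1e3:.2f} ms/iter")

try:
    import xformers.ops as xops

    # xformers expects (batch, seq_len, heads, head_dim) layout.
    qx, kx, vx = (t.transpose(1, 2).contiguous() for t in (q, k, v))
    xf_t = bench(xops.memory_efficient_attention, qx, kx, vx)
    print(f"xformers memory_efficient_attention: {xf_t * 1e3:.2f} ms/iter")
except ImportError:
    print("xformers not installed; skipping that side of the comparison")
```

Relative timings will depend heavily on hardware, dtype, and sequence length, which is partly why I'm asking whether there's a specific reason xformers was chosen here.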