Trainable radial attention?

@Radioheading @lmxyy Thanks for your awesome work!

Could this function be used directly during training? https://github.com/mit-han-lab/radial-attention/blob/main/radial_attn/attn_mask.py#L150-L206

If not, do you have a timeline for releasing a trainable version (as indicated in the paper)?

Regards,
Toyota