@Radioheading @lmxyy Thanks for your awesome work!
I wonder whether this function can be used directly during training: https://github.com/mit-han-lab/radial-attention/blob/main/radial_attn/attn_mask.py#L150-L206
If not, do you have a schedule for releasing a trainable version (as mentioned in the paper)?
Regards,
Toyota