Hi,

Would it be possible to apply masking only in the decoder's single-head attention (SHA)? As far as I can tell, masking is currently applied in both the multi-head attention (MHA) and the SHA in the decoder.

Best,
Shaghayegh
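For illustration, here is a minimal sketch of a decoder block where causal masking can be switched on or off separately for the MHA and the SHA. It rests on several assumptions. It uses NumPy only and has no learned Q/K/V projections. Both layers are treated as self-attention over the decoder's own sequence. The names `decoder_block`, `mask_mha`, and `mask_sha` are hypothetical and are not this project's API.

```python
import numpy as np


def causal_mask(seq_len):
    # True above the diagonal: position i must not see positions j > i.
    return np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)


def attention(q, k, v, mask=None):
    # Single-head scaled dot-product attention; q, k, v have shape (seq_len, d).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        # Masked (future) positions get -inf, so their softmax weight is 0.
        scores = np.where(mask, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


def multi_head_attention(x, num_heads, mask=None):
    # Simplification (assumption): heads are contiguous slices of the feature
    # dimension, with no learned projections.
    seq_len, d = x.shape
    head_dim = d // num_heads
    outs = []
    for h in range(num_heads):
        xs = x[:, h * head_dim:(h + 1) * head_dim]
        out, _ = attention(xs, xs, xs, mask)
        outs.append(out)
    return np.concatenate(outs, axis=-1)


def decoder_block(x, num_heads=2, mask_mha=True, mask_sha=True):
    # Hypothetical decoder block: MHA then SHA, each with a residual connection.
    # mask_mha / mask_sha choose which attention layer gets the causal mask.
    mask = causal_mask(x.shape[0])
    h = x + multi_head_attention(x, num_heads, mask if mask_mha else None)
    out, weights = attention(h, h, h, mask if mask_sha else None)
    return h + out, weights
```

One caveat, assuming the MHA is self-attention over the decoder inputs: if it is left unmasked, every position already mixes in information from future tokens before reaching the SHA, so masking only the SHA does not by itself keep the decoder autoregressive. If the MHA is instead cross-attention over encoder outputs, a causal mask there is usually unnecessary.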