CUTLASS Ex 77 FMHA Softmax Instruction Interleaving (FMA, EX2, Dtype conversion) #2593
manishucsd started this conversation in General
Replies: 1 comment
The idea is that we want only one softmax warpgroup (wg) to execute at a time, but we also want to minimize the gap between the two barriers executing. Barrier synchronization has latency, so the arrive will not immediately release the wait. As such, we try to pull the arrive forward so that it happens a little earlier; the value 10 is an attempt at tuning this so that the gap is minimal. Note that in practice, synchronizing code like this is tricky, since ptxas often moves instructions around, so the tuning is somewhat empirical.
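To make the "pull the arrive forward" idea concrete, here is a minimal host-side C++ sketch. It is not the CUTLASS kernel code. It only models a software-pipelined loop in which the exp2 stage trails the FMA stage by `kReleasePipeCount` elements, and it assumes the arrive is issued `kReleasePipeCount` elements before the end of the main loop. `OrderBarrierSketch` and `softmax_rows_sketch` are hypothetical names invented for illustration. The data type conversion stage (the magic number 6 from the question) is left out for brevity.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for the ordering barrier: it performs no real
// synchronization and only records when arrive() was called.
struct OrderBarrierSketch {
  int arrive_count = 0;
  int arrived_at = -1;  // loop index at which arrive() was issued
  void arrive(int i) { ++arrive_count; arrived_at = i; }
};

constexpr int kReleasePipeCount = 10;  // must be a multiple of 2 (the loop step)

// Computes s[i] = exp2(scale * s[i] - offset), two elements per step.
// The exp2 stage trails the FMA stage by kReleasePipeCount elements.
// Assumption: arrive() is issued kReleasePipeCount elements before the end of
// the main loop, so that its latency overlaps with the remaining work.
int softmax_rows_sketch(std::vector<float>& s, float scale, float offset,
                        OrderBarrierSketch& order_s) {
  const int n = static_cast<int>(s.size());
  assert(n % 2 == 0 && n >= kReleasePipeCount);

  for (int i = 0; i < n; i += 2) {
    // Stage 1: FMA on the current pair of elements.
    s[i + 0] = scale * s[i + 0] - offset;
    s[i + 1] = scale * s[i + 1] - offset;

    // Stage 2: exp2 on the pair that went through the FMA
    // kReleasePipeCount elements earlier.
    if (i >= kReleasePipeCount) {
      s[i - kReleasePipeCount + 0] = std::exp2(s[i - kReleasePipeCount + 0]);
      s[i - kReleasePipeCount + 1] = std::exp2(s[i - kReleasePipeCount + 1]);
    }

    // Early release: signal the other warpgroup before this loop is finished.
    if (i == n - kReleasePipeCount) {
      order_s.arrive(i);
    }
  }

  // Drain: exp2 for the last kReleasePipeCount elements.
  for (int i = n - kReleasePipeCount; i < n; i += 2) {
    s[i + 0] = std::exp2(s[i + 0]);
    s[i + 1] = std::exp2(s[i + 1]);
  }
  return order_s.arrived_at;
}
```

For a 128-element input, this sketch records the arrive at loop index 118, i.e. 10 elements before the end of the main loop. On real hardware the best distance depends on barrier latency and on how ptxas schedules the instructions, which is why the reply calls the value empirical.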
  
Hi @v0i0, can you please explain in detail what is going on in the code below? It is from here.
Specifically, the magic number `6` and `const int kReleasePipeCount = 10; // must be multiple of 2`. The sequence of code interleaves FFMA, 2xexp2 and 2xF32-to-2xB16 conversions, but I don't understand how the issuance of `order_s.arrive()` is timed based on the magic number `kReleasePipeCount`.