In "sm100_mma_warpspecialized.hpp", why does load() pre-issue producer_try_acquire for the next stage before launching TMA copies for the current stage?
#2761
First of all, thank you for your fantastic work and documentation. While reading this particular load implementation, I have a couple of questions:
Ordering: What’s the rationale for calling producer_try_acquire(state /N+1/) before issuing stage N TMA copies? Is this required for correctness, or a latency-hiding choice (pre-fetching the next barrier token to make the next producer_acquire likely immediate)? Are there any microarchitectural considerations?
PTX “may suspend”: PTX says a try_wait can be “potentially suspended.” How should we interpret this in practice: does it yield the warp/scoreboard briefly, or is it pure non-blocking polling?
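To make the ordering in the first question concrete, here is a minimal host-side C++ sketch of the pattern as described above. `ToyPipeline`, `run_producer`, and `full_waits` are hypothetical names for illustration, not the CUTLASS API, and the toy consumer releasing each slot immediately is an assumption. It only shows the control flow: the probe for stage N+1 is issued before the copies for stage N, and its result is carried into the next iteration's acquire.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a multi-stage producer/consumer pipeline.
// NOT the CUTLASS API: names and semantics are assumptions for illustration.
struct ToyPipeline {
    std::vector<bool> slot_free;  // true when the consumer has released the slot
    int full_waits = 0;           // acquires that could not rely on a successful probe

    explicit ToyPipeline(int stages) : slot_free(stages, true) {
        assert(stages > 0);
    }

    // Non-blocking probe: reports whether the slot is free right now.
    bool try_acquire(int stage) const { return slot_free[stage]; }

    // Acquire: skips the wait when the earlier probe already succeeded.
    void acquire(int stage, bool token) {
        if (!token) {
            ++full_waits;             // a real barrier wait would happen here
            slot_free[stage] = true;  // toy model: assume the wait completed
        }
        slot_free[stage] = false;     // producer now owns the slot
    }

    void consumer_release(int stage) { slot_free[stage] = true; }
};

// Runs the producer loop and returns how many acquires needed a full wait.
int run_producer(int stages, int k_tiles) {
    ToyPipeline pipe(stages);
    int stage = 0;
    bool token = pipe.try_acquire(stage);  // probe for the first stage
    for (int k = 0; k < k_tiles; ++k) {
        pipe.acquire(stage, token);
        int next = (stage + 1) % stages;
        // Probe stage N+1 before issuing the copies for stage N.
        token = pipe.try_acquire(next);
        // ... the copies for `stage` would be issued here (omitted) ...
        pipe.consumer_release(stage);      // toy consumer: releases immediately
        stage = next;
    }
    return pipe.full_waits;
}
```

In this toy, `run_producer(2, 4)` never falls back to a full wait, while `run_producer(1, 4)` does so 3 times, because the only slot is still owned by the producer when it is probed. It says nothing about whether the real ordering is needed for correctness.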
Could you please help clarify these questions? I would appreciate any explanation.
The code in question is at lines 581-628 of sm100_mma_warpspecialized.hpp.