Description
Hi, I have several questions about the copy atoms, so I opened this issue to seek clarification.
1/
From this representation, I understand that this 8x4 block load needs 2 subgroups of 16 threads to operate. However, it is unclear what the indices in the "Subgroup View" represent. If they are the linear indices of the block's elements in memory, I don't see why it is called a Subgroup View.
2/
Are there any utilities to visualize SG-V partitioning, similar to the tools for visualizing TV (thread-value) partitioning in CuTe?
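In the meantime, a minimal stand-in could be a text printer in the spirit of CuTe's TV-layout printers. The Python sketch below is hypothetical and not part of any library: it assumes elements are numbered row-major and that linear index i belongs to lane i % 16 as that lane's value i // 16. That mapping is an assumption for illustration only, not the documented Xe layout; swap in the real mapping once it is known.

```python
def sg_v_grid(height, width, subgroup_size=16):
    """Return a height x width grid of (lane, value) owners.

    ASSUMED mapping (hypothetical, for illustration only): elements are
    numbered row-major, and linear index i belongs to lane
    i % subgroup_size as that lane's value number i // subgroup_size.
    """
    if (height * width) % subgroup_size != 0:
        raise ValueError("block size must be a multiple of the subgroup size")
    grid = []
    for r in range(height):
        row = []
        for c in range(width):
            i = r * width + c  # row-major linear index of element (r, c)
            row.append((i % subgroup_size, i // subgroup_size))
        grid.append(row)
    return grid


def render(grid):
    """Format the grid as text, one 'T<lane>V<value>' label per element."""
    return "\n".join(
        " ".join(f"T{lane}V{value}".ljust(6) for lane, value in row).rstrip()
        for row in grid
    )


if __name__ == "__main__":
    # Example: an 8-row x 16-column block owned by one 16-lane subgroup.
    print(render(sg_v_grid(8, 16)))
```

With width equal to the subgroup size, this assumed mapping gives each lane one column, with its values running down the rows.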
3/
Under the "Subgroup Scope and Thread-Local Data" section of the Xe-rearch documentation:
DPAS and block 2D copy atoms are subgroup operations, meaning that all 16 threads of the subgroup collectively execute these operations, and collectively own all input/output data.
If we take a block 2D copy operation such as XE_LOAD_2D_TRANSPOSE with 32-bit elements, height 32, and width 8, it is not immediately clear to me whether it is executed by a single work-item, a single subgroup, or multiple subgroups. Following the logic of the example above, there are 32x8 = 256 elements to copy, so we would need 256/16 = 16 subgroups, with each work-item loading 16 elements. Is that correct?
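For reference, if the quoted documentation is read literally (one subgroup of 16 work-items executes the atom and collectively owns all of its data), the same 256/16 division yields a per-work-item share rather than a subgroup count. The Python helper below is hypothetical, not a library API, and only restates that arithmetic under that assumption.

```python
def per_work_item_share(height, width, bits, subgroup_size=16):
    """Return (elements, bytes) each work-item holds for one atom.

    ASSUMPTION: one subgroup of `subgroup_size` work-items executes the
    atom collectively (as the quoted documentation states) and the
    block divides evenly among them.
    """
    total = height * width
    if total % subgroup_size != 0:
        raise ValueError(
            f"{total} elements do not split evenly over {subgroup_size} work-items"
        )
    elements = total // subgroup_size
    return elements, elements * bits // 8


if __name__ == "__main__":
    # Parameters from the XE_LOAD_2D_TRANSPOSE example: 32-bit, height 32, width 8.
    elements, nbytes = per_work_item_share(height=32, width=8, bits=32)
    print(f"{elements} elements ({nbytes} bytes) per work-item")
    # prints "16 elements (64 bytes) per work-item"
```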
