[DOC] Need clarity on SGV Layout in new Xe re-architecture

Hi, I have several questions about the copy atoms and opened this issue to seek clarification.

1/
<img width="1433" height="348" alt="Image" src="https://github.com/user-attachments/assets/425e702e-6a83-4b12-a57c-a8a883c6188f" />

From this representation, I understand that this 8x4 block load needs 2 subgroups of 16 threads each. However, it is unclear to me what the indices in the "Subgroup View" represent. If they are the linear indices of that block in memory, I don't see why it is called a Subgroup View.

2/
Are there any utilities to visualize SG-V partitioning similar to the [tools to visualize TV partitioning](https://docs.nvidia.com/cutlass/media/docs/cpp/cute/03_tensor.html#thread-value-partitioning) in CuTe?

3/
Under the "Subgroup Scope and Thread-Local Data" section of the Xe-rearch documentation:
> DPAS and block 2D copy atoms are subgroup operations, meaning that all 16 threads of the subgroup collectively execute these operations, and collectively own all input/output data.

If we take a block 2D copy operation such as `XE_LOAD_2D_TRANSPOSE` with 32-bit elements, height 32, and width 8, it is not clear to me whether it is executed by a single work-item, a single subgroup, or multiple subgroups. Following the logic of the example above, there are 32x8 = 256 elements to copy, so we would need 256/16 = **16 subgroups**, where each work-item handles loading of 16 elements. Is that so?
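To make the two readings I am torn between concrete, here is a minimal sketch of the arithmetic. The helper names are hypothetical and not part of the CUTLASS/CuTe API, and the subgroup size of 16 is taken from the quoted documentation. I am not claiming either reading is correct; that is the question.

```cpp
// Hypothetical helpers for illustration only; none of these exist in CUTLASS/CuTe.
constexpr int kSubgroupSize = 16;  // assumed from the quoted documentation

constexpr int elements_in_block(int height, int width) {
  return height * width;
}

// Reading A: ONE subgroup executes the whole block copy, and each of its
// 16 work-items owns an equal share of the elements.
constexpr int values_per_work_item(int height, int width) {
  return elements_in_block(height, width) / kSubgroupSize;
}

// Reading B: each work-item owns exactly one element, so the block has to be
// spread over several subgroups.
constexpr int subgroups_if_one_value_each(int height, int width) {
  return elements_in_block(height, width) / kSubgroupSize;
}

// Height 32, width 8 (question 3): 256 elements in total.
static_assert(elements_in_block(32, 8) == 256, "32x8 block has 256 elements");
static_assert(values_per_work_item(32, 8) == 16, "reading A: 16 values per work-item");
static_assert(subgroups_if_one_value_each(32, 8) == 16, "reading B: 16 subgroups");

// 8x4 block (question 1): 32 elements in total.
static_assert(values_per_work_item(8, 4) == 2, "reading A: 2 values per work-item");
static_assert(subgroups_if_one_value_each(8, 4) == 2, "reading B: 2 subgroups");
```

Both readings give the same number (16 for the 32x8 case, 2 for the 8x4 case), which is probably why I find it hard to tell them apart from the figure alone. The sentence quoted from the documentation seems to point to reading A, but I would appreciate confirmation.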

cc @petercad @mkumargarg


