We are excited to share an update on the dim order feature in the ExecuTorch stack, which represents a significant step forward in how we handle tensor memory format at the IR level. This post consolidates all previous updates and highlights the progress made so far.
What is a Dim Order?
A dim order is a tensor-level memory format representation that describes the layout of a dense tensor in memory. It serves as the source of truth for understanding the memory layout of input tensors across various components in ExecuTorch, particularly delegates and operators. It aims to replace torch.memory_format in the ExecuTorch stack for underlying memory representation. Its correctness is crucial for ensuring the accurate execution of tensor operations. For more details, refer to the PyTorch core documentation and ExecuTorch documentation.
Supporting Dim Order
Adding torch.tensor.dim_order() in PyTorch
We've added a new API to PyTorch, dim_order(), which returns the dim order of a tensor in memory and provides functionality to detect ambiguity. With this API, you can confidently determine the dim order of your tensors and optimize your model lowering and performance. Read our original post to learn more about how to use dim_order and its benefits for your deep learning projects.
ET Export Flow: torch.memory_format in, Dim Order out
Support for dim order in the Edge dialect export flow has matured significantly. We have integrated dim order into the tensor IR and enabled its export from eager models to ExecuTorch models. Furthermore, we developed passes to replace operators requiring memory format inputs (e.g., to_copy) with our own performant functions taking dim order as input. Additionally, we implemented a verification mechanism to ensure the graph legally supports dim order. These updates enable support for multiple dim orders within a model graph, which is now the default behavior of the ExecuTorch export flow.
Dim Order Portable Operators and Runtime Support
At runtime, the dim order serves as the foundation for determining the memory format of input tensors. The memory format information for each runtime tensor, including strides, is derived from or relies on the dimension order. To ensure compatibility, all portable operators incorporate sanity checks to verify that the input tensor's dimension order aligns with their expectations. Furthermore, select operators now provide specialized support for contiguous and channels_last dimension orders. Additionally, every portable operator that accepts memory format as input has a corresponding variant based on dimension order. Serializing dim order for every ET managed tensor is supported. Tensor utility functions rely on dim order for calculating strides.
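To make the stride derivation concrete, here is a minimal Python sketch of how strides can be computed from a tensor's sizes and dim order. This is illustrative only, not the actual ExecuTorch runtime implementation (which is in C++):

```python
def strides_from_dim_order(sizes, dim_order):
    """Derive element strides from sizes and a dim order.

    The last entry of dim_order is the innermost (fastest-varying)
    dimension and gets stride 1; each earlier entry's stride is the
    product of the sizes of the dimensions that come after it.
    """
    strides = [0] * len(sizes)
    acc = 1
    for d in reversed(dim_order):
        strides[d] = acc
        acc *= sizes[d]
    return tuple(strides)

# Contiguous (NCHW) layout:
print(strides_from_dim_order((2, 3, 4, 5), (0, 1, 2, 3)))  # (60, 20, 5, 1)
# channels_last (NHWC) layout:
print(strides_from_dim_order((2, 3, 4, 5), (0, 2, 3, 1)))  # (60, 1, 15, 3)
```

The channels_last result matches what PyTorch reports via Tensor.stride() for a tensor converted with memory_format=torch.channels_last.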
Delegate Support for Dim Order
Delegate support is critical for dim order functionality. We are thrilled to announce that several major delegates are now dim order compatible, both AoT and at runtime. This includes XNNPACK, CoreML, Arm, QNN, Vulkan, MPS and MTK. This widespread support ensures compatibility and extends the functionality of ExecuTorch with dim order representation and operations. Delegates now have enough building blocks available in ET AoT and runtime to implement dim-order-related graph optimizations.
Example
Enabling/Disabling Dim Order
In ExecuTorch, using dim order is now the default behavior, so no specific configuration is required to enable it. For more information on exporting your model to ExecuTorch, please refer to our example.
If you need to temporarily disable dim order in your graph, you can set _skip_dim_order to True in the EdgeCompileConfig when exporting your model:
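A sketch of what that looks like is below. The toy module and input are placeholders, and the import paths follow the executorch.exir module; verify them against the version of ExecuTorch you have installed:

```python
import torch
from executorch.exir import to_edge, EdgeCompileConfig

class AddOne(torch.nn.Module):  # placeholder model for illustration
    def forward(self, x):
        return x + 1

ep = torch.export.export(AddOne(), (torch.randn(2, 3),))

# _skip_dim_order=True falls back to memory_format-based IR,
# disabling dim order in the exported graph.
edge = to_edge(ep, compile_config=EdgeCompileConfig(_skip_dim_order=True))
```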
Manipulating Dim Order AoT in the Graph
You can easily add custom export passes to modify the dim order of specific parts of your model. For an example of how to do this, see here.
Delegate Support
If you're a delegate owner looking to make your delegate implementation dim order compatible, or trying to avoid permute nodes from your delegate graph, you may find this post helpful.
Next Steps
The overarching goal is to enhance the ExecuTorch (ET) ahead-of-time (AoT) and runtime experience with dim orders. This involves further refining and enforcing Edge IR dim order guarantees, including those for delegates, and ensuring a seamless experience comparable to PyTorch.
To achieve this, first, we will ensure that all portable operators support tensor dim orders, at a minimum, those mapped directly to PyTorch-defined memory formats. Relevant tests will be implemented to validate this functionality.
Additionally, we will provide support to delegate authors in leveraging dim order, particularly in optimizing the graph locally or globally to minimize tensor permutations and copies.
Conclusion
The successful implementation of dim order in ExecuTorch represents a significant milestone in our journey to provide a robust and flexible framework for tensor memory layout representation. This achievement would not have been possible without the collective efforts of the PyTorch and ExecuTorch communities, and we are grateful for their dedication and support. If you have any questions, feedback, or suggestions for improvement, please leave your comment here, or open a discussion on ExecuTorch GitHub.
This discussion was converted from issue #8037 on January 31, 2025 18:15.
Great thanks @digantdesai and @larryliu0820 for continued support and discussion!