Migration to New Tensor + TensorLayout #14364
TT-BrianLiu announced in General announcements
## Background
Our current way of creating tensors on device has two main drawbacks:
- We have two shape classes, `tt::tt_metal::LegacyShape` and `ttnn::Shape`, both of which contain information about a logical shape and a padded shape. In our stack, we mainly treat the padded shape as the actual logical shape and the logical shape as metadata, which is sometimes respected and sometimes not. There is no clear definition of what a created tensor represents.
- Concepts like dtype, layout, and padding are not consolidated anywhere, and their interactions are left implicit (e.g. can a `bfloat8` tensor be in `row_major` layout?). Storing padding in the shape and not having all these concepts consolidated makes it difficult to use and a nightmare to maintain.

## New Tensor + TensorLayout
We should be thinking about tensor creation in terms of logical shape + concepts like sharding, dtype, layout, and memory config. To this end, we are introducing two new concepts, which we will slowly move the codebase over to use:
- `ttnn::SimpleShape`: a shape that carries only logical dimensions. We will replace `tt::tt_metal::LegacyShape` and `ttnn::Shape` with `ttnn::SimpleShape` and eventually rename it back to `ttnn::Shape`.
- `TensorLayout`: consolidates how the logical data is laid out physically (dtype, layout, memory config, and alignment).
  - `Layout` becomes a bit overloaded, since we have the new `TensorLayout` (described above) and the old `Layout` (i.e. just tiny tiles, regular 32x32 tiles, or row_major). We named `TensorLayout` differently from `Layout` for now, but there could be more appropriate names for them in the future (e.g. `MemoryLayout`/`Layout` + `PageConfig`).
  - `alignment` is just an easier way for users to specify padding. The majority of padding is to pad up to the nearest 32 so that the height and width of a tensor are tilizable (e.g. 2 padded to 32 with 30 padding, 33 padded to 64 with 31 padding). Instead of the user specifying the exact padding needed, it is far more convenient to say: "Please align these dims to the nearest 32" (e.g. 2 padded to 32 or 33 padded to 64 are both just an alignment of 32). It more accurately captures what you are trying to do.
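To make the split concrete, here is a minimal conceptual sketch of how the responsibilities could divide between a logical-only shape and a `TensorLayout`-style descriptor. This is not the actual ttnn API; every type and member name below is an illustrative assumption.

```cpp
#include <cstdint>
#include <vector>

// Illustrative stand-ins only -- the real ttnn types differ.
enum class DataType { BFLOAT16, BFLOAT8_B, FLOAT32 };
enum class PageLayout { ROW_MAJOR, TILE };  // the "old" Layout concept

// Logical shape only: no padded shape stored alongside it.
struct SimpleShapeSketch {
    std::vector<uint32_t> dims;
};

// "Align these dims up to the nearest multiple of N" instead of explicit padding.
struct Alignment2D {
    uint32_t height = 1;
    uint32_t width = 1;
};

// Consolidates how the logical data is laid out physically.
struct TensorLayoutSketch {
    DataType dtype;
    PageLayout page_layout;   // tiny tiles, regular 32x32 tiles, or row_major
    Alignment2D alignment;    // applied per shard, after sharding
    // A memory config (interleaved vs. sharded, shard spec, buffer type)
    // would also live here in the real design.
};
```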
## Sharding + Alignment

An interesting point is how sharding interacts with alignment, and it largely boils down to: "Does alignment happen before or after sharding?" We have decided that it MUST happen after sharding: each shard must be individually aligned (e.g. so that every shard is tilizable), which is not generally possible if you align the full tensor first and then cut it into shards.
A side note (if you want to work through some examples of alignment):
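As a small standalone sketch of that arithmetic (plain C++, not ttnn code; `align_up` is just a local helper), the padding examples mentioned earlier are all just alignment to 32:

```cpp
#include <cassert>
#include <cstdint>

// Round `value` up to the nearest multiple of `alignment`.
constexpr uint32_t align_up(uint32_t value, uint32_t alignment) {
    return ((value + alignment - 1) / alignment) * alignment;
}

int main() {
    assert(align_up(2, 32) == 32);   // 2  -> 32 (30 rows of padding)
    assert(align_up(33, 32) == 64);  // 33 -> 64 (31 rows of padding)
    assert(align_up(49, 32) == 64);  // shard height in the Resnet example below
    assert(align_up(30, 32) == 32);  // shard width in the Resnet example below
    return 0;
}
```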
## Tensor Creation
With the new `TensorLayout`, this is how we envision users creating tensors:

- The user provides a logical shape plus a `TensorLayout`; the logical shape is cut into shards, and each shard is aligned.
- Physical height is `aligned_shard_height * number_of_shards_along_height`.
- Physical width is `aligned_shard_width * number_of_shards_along_width`.
- The physical buffer covers `physical_height * physical_width` elements.
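A sketch of that physical-size computation, assuming alignment is applied to each shard after sharding and ignoring partial shards at the end (`compute_physical_size` and `align_up` are local helpers, not ttnn API):

```cpp
#include <cstdint>
#include <utility>

constexpr uint32_t align_up(uint32_t value, uint32_t alignment) {
    return ((value + alignment - 1) / alignment) * alignment;
}

// physical_height = aligned_shard_height * number_of_shards_along_height
// physical_width  = aligned_shard_width  * number_of_shards_along_width
std::pair<uint32_t, uint32_t> compute_physical_size(
    uint32_t shard_height, uint32_t shard_width,
    uint32_t num_shards_along_height, uint32_t num_shards_along_width,
    uint32_t height_alignment, uint32_t width_alignment) {
    const uint32_t physical_height =
        align_up(shard_height, height_alignment) * num_shards_along_height;
    const uint32_t physical_width =
        align_up(shard_width, width_alignment) * num_shards_along_width;
    return {physical_height, physical_width};  // buffer holds height * width elements
}
```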
## Example in Resnet

Let's go through an example in Resnet where a user has a logical shape `[56, 56, 30]` that they want to height shard into 64 pieces of `[49, 30]` shards, with the tensor in `TILE` layout (meaning each shard is tilized):

- Logical shape `[56, 56, 30]` is flattened to `[3136, 30]` for sharding.
- A shard shape of `[49, 30]` cuts the logical 2D shape into 64 pieces (no partial shards at the end here).
- Each `[49, 30]` shard is aligned to `[64, 32]`. This alignment can be user provided or automatically inferred based on `TILE` layout (assuming full `32x32` tiles here).
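Plugging the Resnet numbers into that flow (a standalone check of the arithmetic, not ttnn code):

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t align_up(uint32_t v, uint32_t a) { return ((v + a - 1) / a) * a; }

int main() {
    // Logical [56, 56, 30] flattened to [3136, 30] for height sharding.
    const uint32_t logical_height = 56 * 56;  // 3136
    const uint32_t logical_width  = 30;

    // Height shard into [49, 30] pieces: 3136 / 49 = 64 shards, no partial shard.
    const uint32_t shard_height = 49;
    const uint32_t num_shards   = logical_height / shard_height;
    assert(logical_height % shard_height == 0 && num_shards == 64);

    // Each shard is aligned (after sharding) to full 32x32 tiles: [49, 30] -> [64, 32].
    const uint32_t aligned_shard_height = align_up(shard_height, 32);   // 64
    const uint32_t aligned_shard_width  = align_up(logical_width, 32);  // 32

    // Physical 2D shape: 64 shards * 64 rows = 4096, 1 shard * 32 columns = 32.
    assert(aligned_shard_height * num_shards == 4096);
    assert(aligned_shard_width * 1 == 32);
    return 0;
}
```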
## Implications

With this new flow, we will be able to support alignment of shards, but it also has some important implications on what you CANNOT do now:
- The example above, `[56, 56, 30]` -> cut into 64 x `[49, 30]` shards, then padded up to 64 x `[64, 32]`, has a 2D physical shape of 4096 x 32 but has no 3D representation. The closest representation could be something like a reshape into `[64, 49, 30]` -> `[64, 49[64], 30[32]]`, but you are talking about a different tensor here.
- A padded shape like `[1, 4[32], 1[32], 32]` is not supported, because you cannot describe it as a logical shape + some 2D alignment.
  - For example, if an OP does `transpose_hc` and the input is `[1, 1[32], 4[32], 32]`, the output would request a tensor like `[1, 4[32], 1[32], 32]`, because the OP fundamentally treats the input as `[1, 32, 32, 32]` and the actual logical shape is essentially just metadata... Ignoring how inconsistent this is, the performance is sub-optimal to begin with, since we are doing unnecessary copies and re-tilizes to produce data that we do not even need. Today, we actually end up stripping the extra padding along C after the op anyways.

By imposing these restrictions, we hope to slowly transition our stack to only work with logical shape, which is the logical (pun intended 💩) thing to do. If an OP does not work with the original logical shape, then it does not work. Simple as that.
@TT-BrianLiu @ayerofieiev-tt @sminakov-tt On our side, we will migrate tensor infra to the new `TensorLayout`; on the OPs side, we will start requesting ops to transition to using logical shape. A very trackable outcome (but probably not so easily achieved) is to remove all references to padded shape in the form of:

- `get_padded_shape()`
- `tt::tt_metal::LegacyShape` or `ttnn::Shape` (note: the default accessor on `tt::tt_metal::LegacyShape` returns the padded dim)
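As a sketch of what that transition means inside an op (purely illustrative stand-ins below; these are not the ttnn `Tensor` accessors):

```cpp
#include <cstdint>
#include <vector>

// Illustrative stand-in for tensor metadata; not the ttnn Tensor API.
struct TensorMetaSketch {
    std::vector<uint32_t> logical_shape;  // what the tensor semantically is
    std::vector<uint32_t> padded_shape;   // artifact of tiling / alignment
};

// Before: op logic keyed off padded dims (the pattern we want to remove).
uint32_t output_height_before(const TensorMetaSketch& t) {
    return t.padded_shape[t.padded_shape.size() - 2];  // e.g. 32 for a logical 30
}

// After: op logic keyed off logical dims only; padding/alignment stays the
// responsibility of the tensor infra (TensorLayout), not the op.
uint32_t output_height_after(const TensorMetaSketch& t) {
    return t.logical_shape[t.logical_shape.size() - 2];  // e.g. 30
}
```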