Description
I am creating this issue as a single thread to understand how to pick the right parameters for ttnn.group_norm operation. We (tt-forge) are trying to add ttnn.group_norm operation to the tt-forge compiler in the form of TTNN and TTIR dialects in tt-mlir. The tricky part is picking the right grid size for a given input tensor.
Problem symptoms
For a given tensor, in these examples (N, 1, H*W, C) = (1, 1, 8*8, 480): if the grid_size is too big (e.g. the 8x8 grid in gn-repro1.py), a division by zero occurs (gn-error1.log). On the other hand, for the same input, if the grid is too small (e.g. the 1x1 grid in gn-repro2.py), an L1 OOM occurs because a single Tensix core has insufficient L1 to fit the input (gn-error2.log).
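To make the two failure modes concrete, here is a rough sketch of the arithmetic involved. This is an assumption about how the op shards work across the core grid (H*W across grid rows, C across grid columns, in 32-element tiles); the exact math inside ttnn.group_norm may differ.

```python
# Hypothetical sketch of the two failure modes; the exact sharding
# arithmetic inside ttnn.group_norm may differ.
H_W, C = 8 * 8, 480
TILE = 32  # tile height/width in elements

def whole_tiles_per_core(dim: int, cores: int) -> int:
    # Number of whole 32-element tiles each core receives along one dim.
    return (dim // cores) // TILE

# Too-large grid: 64 rows over 8 cores -> 8 rows/core -> 0 whole tiles,
# so any later "divide by tiles per core" step divides by zero.
print(whole_tiles_per_core(H_W, 8))  # 0

# Too-small grid: a single core must hold the whole tensor in L1.
bytes_needed = H_W * C * 2  # assuming bfloat16 = 2 bytes/element
print(bytes_needed)  # 61440 bytes for the input alone, plus intermediates
```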
Choosing the grid size
Any advice on how to choose the grid size for a given input tensor would be much appreciated! The ideal scenario would be that the user of ttnn.group_norm doesn't have to think about optimal grid placement at all, but in the meantime, any heuristic for choosing the grid size would be very useful.
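For discussion, here is one candidate heuristic, written as plain Python. It is not the official ttnn rule; it assumes H*W shards across grid rows, C across grid columns, tile-aligned 32-element splits, and a guessed usable L1 budget per Tensix (L1_BUDGET below is an assumption to tune per device).

```python
# Hypothetical grid-size heuristic (not the official ttnn rule):
# pick the largest grid such that (a) each core gets at least one
# whole 32-element tile in both dims, (b) H*W and C split evenly
# into tile-aligned shards, and (c) the per-core shard fits in an
# assumed L1 budget.
TILE = 32
L1_BUDGET = 700 * 1024  # assumed usable L1 bytes per Tensix; tune per device

def choose_grid(n, hw, c, dtype_bytes=2, max_grid=(8, 8)):
    best = None
    for rows in range(1, max_grid[0] + 1):
        for cols in range(1, max_grid[1] + 1):
            per_core_hw = hw // rows
            per_core_c = c // cols
            # Each core needs at least one whole tile in both dims,
            # otherwise per-core tile counts round down to zero.
            if per_core_hw < TILE or per_core_c < TILE:
                continue
            # Require even, tile-aligned splits across the grid.
            if hw % (rows * TILE) or c % (cols * TILE):
                continue
            shard_bytes = n * per_core_hw * per_core_c * dtype_bytes
            if shard_bytes > L1_BUDGET:
                continue
            if best is None or rows * cols > best[0] * best[1]:
                best = (rows, cols)
    return best

# For the (1, 1, 8*8, 480) example above this picks a 2x5 grid:
print(choose_grid(1, 8 * 8, 480))  # (2, 5)
```

With hw = 64 only 1 or 2 rows give tile-aligned shards, and with c = 480 only 1, 3, or 5 columns do, so the heuristic lands between the failing 1x1 and 8x8 extremes.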
Choosing num_out_blocks
Additionally, the num_out_blocks parameter defaults to 1. It seems it also needs some hand tuning, so any advice on choosing it would be useful too.
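If num_out_blocks exists to split a too-large per-core working set into multiple passes, a natural rule of thumb would be the smallest value that brings each block under the L1 budget. The budget and the "shard bytes / blocks" model below are assumptions, not the op's documented behavior.

```python
# Hypothetical rule of thumb: raise num_out_blocks until the per-core
# working set (shard split into that many output blocks) fits in an
# assumed L1 budget. The cost model here is a guess.
L1_BUDGET = 700 * 1024  # assumed usable L1 bytes per Tensix; tune per device

def choose_num_out_blocks(shard_bytes: int, budget: int = L1_BUDGET) -> int:
    blocks = 1
    while shard_bytes // blocks > budget:
        blocks += 1
    return blocks

# A small shard needs no splitting:
print(choose_num_out_blocks(61440))           # 1
# A 4 MiB per-core shard would be split into 6 blocks under this budget:
print(choose_num_out_blocks(4 * 1024 * 1024))  # 6
```

The trade-off is extra passes over the data in exchange for a smaller L1 footprint per pass, which matches the symptom that a 1x1 grid with the default num_out_blocks = 1 runs out of L1.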
gn-error1.log
gn-error2.log
gn-repro1.py
gn-repro2.py