Flip default tensor retention on graph inputs and outputs from False to True in MLIR runtime #5465
Replies: 4 comments
-
(current) retain=False on graph inputs

Retain is currently default-set to false on inputs, because we want to rely on the ttnn deallocator to auto-deallocate inputs once they are no longer used, to prevent device OOM. tt-xla currently bypasses this logic and sets retain=True on all input tensors to prevent ttnn from eagerly deallocating them. This is because:
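To make the retain=False behaviour concrete, here is a small illustrative model (not the real tt-mlir or ttnn API; `DeviceTensor`, `run_graph`, and the last-use tracking are hypothetical stand-ins) of a runtime that auto-deallocates non-retained inputs right after their last use, which is what keeps the device from OOMing:

```python
class DeviceTensor:
    """Toy stand-in for a device tensor with a retain flag."""
    def __init__(self, name, retain=False):
        self.name = name
        self.retain = retain
        self.allocated = True

    def deallocate(self):
        self.allocated = False

def run_graph(ops, inputs):
    """ops: list of (op_name, [input tensor names]) in execution order."""
    # Find the index of the last op that reads each input.
    last_use = {}
    for i, (_, used) in enumerate(ops):
        for name in used:
            last_use[name] = i
    for i, (_, used) in enumerate(ops):
        # ... execute the op ...
        for name in used:
            t = inputs[name]
            # retain=False: free the input right after its final use.
            if last_use[name] == i and not t.retain:
                t.deallocate()

a = DeviceTensor("a", retain=False)
b = DeviceTensor("b", retain=True)   # e.g. tt-xla forcing retention
run_graph([("matmul", ["a", "b"]), ("relu", ["b"])], {"a": a, "b": b})
assert not a.allocated   # auto-freed after its last use
assert b.allocated       # retained; the frontend must manage it
```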
-
(change) retain=True on graph inputs

This would allow the FE to not have to call
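With retain=True as the default on inputs, the runtime never frees them, so responsibility for deallocation moves entirely to the frontend, typically tied to the lifetime of its buffer wrapper. A hypothetical sketch of that ownership model (`BufferInstance` here is a toy stand-in, not the actual tt-xla class):

```python
class DeviceTensor:
    """Toy device tensor; with retain=True the runtime leaves it alive."""
    def __init__(self):
        self.allocated = True

class BufferInstance:
    """Frontend-side owner of a device tensor (hypothetical)."""
    def __init__(self, tensor):
        self.tensor = tensor

    def destroy(self):
        # Frontend-driven deallocation replaces the runtime's
        # auto-deallocation pass entirely.
        if self.tensor.allocated:
            self.tensor.allocated = False

t = DeviceTensor()
buf = BufferInstance(t)
# ... submit graphs using t; the runtime leaves it alive (retain=True) ...
assert t.allocated
buf.destroy()
assert not t.allocated
```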
-
(current) retain=False on graph outputs

Previously, all FEs immediately deallocated all output tensors after submit. For tt-xla, this was because we eagerly transferred output tensors toHost, which is not performant but was a workaround for a separate issue. This behaviour will be removed in tenstorrent/tt-xla#1657. Now, calls to

This also means that tensor transfers toHost from the frontend are unbuffered on host and repeatedly toHost()'ed on demand. These could be buffered on host, but some mutation-tracking infrastructure may be required in the frontend. Some outputs (like static caches) must participate in compute graphs as both input and output, but should not be returned to host.
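The host-side buffering plus mutation tracking mentioned above can be sketched as follows. This is a toy model under assumptions: `to_host()` stands in for a device-to-host transfer, and the version counter is a hypothetical piece of the frontend infra, not an existing API:

```python
class DeviceTensor:
    def __init__(self, data):
        self._data = list(data)
        self.version = 0       # bumped on every device-side write
        self.transfers = 0     # counts device->host copies

    def write(self, data):
        self._data = list(data)
        self.version += 1      # mutation tracking

    def to_host(self):
        self.transfers += 1    # stands in for an actual transfer
        return list(self._data)

class HostCache:
    """Frontend-side buffer that avoids repeated toHost transfers."""
    def __init__(self, tensor):
        self.tensor = tensor
        self._cached = None
        self._cached_version = -1

    def read(self):
        # Only re-transfer when the device copy has mutated.
        if self._cached_version != self.tensor.version:
            self._cached = self.tensor.to_host()
            self._cached_version = self.tensor.version
        return self._cached

t = DeviceTensor([1, 2, 3])
cache = HostCache(t)
cache.read(); cache.read()
assert t.transfers == 1   # buffered: one transfer for two reads
t.write([4, 5, 6])        # e.g. a static cache updated by a later graph
cache.read()
assert t.transfers == 2   # mutation invalidated the cached host copy
```

Without the cache, every frontend read would repeat the transfer; the version counter is exactly the mutation-tracking piece the comment says would be required.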
-
(change) retain=True on graph outputs

tt-xla has no logic to intelligently deallocate output tensors, except when BufferInstances are destructed. Although we don't set retain=True on graph outputs (so their retention is false), they are also not auto-deallocated by tt-xla, since the deallocator only runs when a tensor is used as an input to another graph. However, if an output tensor is reused in another graph, its retain flag is set to true for the reasons detailed in the previous comment, so it won't get auto-deallocated by tt-xla either.
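The two escape paths described above can be shown in a toy model (hypothetical `submit` and `feed_back` helpers, not the real runtime API): the auto-deallocator only ever inspects graph inputs, so a non-retained output is never freed unless fed back in, and tt-xla flips retain to true on anything it does feed back, so the tensor escapes the deallocator then too:

```python
class DeviceTensor:
    def __init__(self, retain=False):
        self.retain = retain
        self.allocated = True

def submit(inputs, num_outputs):
    """Toy graph submission: outputs are created with retain=False,
    and only non-retained *inputs* are auto-deallocated after use."""
    outputs = [DeviceTensor(retain=False) for _ in range(num_outputs)]
    for t in inputs:
        if not t.retain:
            t.allocated = False
    return outputs

def feed_back(tensor):
    # tt-xla-style: force retention on any tensor reused as an input.
    tensor.retain = True
    return tensor

(out,) = submit([], num_outputs=1)
assert out.allocated        # never an input, so never auto-freed

reused = feed_back(out)
submit([reused], num_outputs=0)
assert reused.allocated     # retained, so not auto-freed either
```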
-
There are 4 possibilities here:

(set retain = true, set retain = false) x (on input tensors, on output tensors)

and I will try to detail the implications for the frontends and the historical reasons for the current default false retention on both, based on expertise from @jnie-TT and @pilkicTT.