In et_replay, tensor memory is currently allocated per tensor id. However, tensors with different tensor ids may refer to the same memory storage, so the same storage can end up allocated more than once. In an Ads production model, we saw et_replay run out of GPU memory while the original workload ran without problems.
This request is to allocate tensor memory by storage id instead of tensor id, so that tensors sharing a storage are backed by a single allocation and memory is used more efficiently.
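To make the idea concrete, here is a minimal sketch of storage-id-based allocation. It is not et_replay code: `TensorRecord`, its fields (`tensor_id`, `storage_id`, `offset`, `numel`, `itemsize`), and `StorageAllocator` are hypothetical names chosen for illustration, and a plain `bytearray` stands in for GPU memory. The sketch assumes each traced tensor carries a storage id and an element offset into that storage.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class TensorRecord:
    # Hypothetical trace fields, for illustration only.
    tensor_id: int
    storage_id: int
    offset: int    # element offset into the storage
    numel: int     # number of elements
    itemsize: int  # bytes per element


class StorageAllocator:
    """Allocates one buffer per storage id and hands out per-tensor views."""

    def __init__(self, records: Iterable[TensorRecord]) -> None:
        # Size each storage to cover the furthest byte any of its tensors uses.
        self.sizes: Dict[int, int] = {}
        for r in records:
            end = (r.offset + r.numel) * r.itemsize
            self.sizes[r.storage_id] = max(self.sizes.get(r.storage_id, 0), end)
        self.buffers: Dict[int, bytearray] = {}

    def get(self, r: TensorRecord) -> memoryview:
        # Allocate the backing buffer lazily, once per storage id.
        buf = self.buffers.get(r.storage_id)
        if buf is None:
            buf = bytearray(self.sizes[r.storage_id])
            self.buffers[r.storage_id] = buf
        start = r.offset * r.itemsize
        return memoryview(buf)[start:start + r.numel * r.itemsize]

    def total_bytes(self) -> int:
        return sum(len(b) for b in self.buffers.values())


# Three tensors share storage 10; one tensor has its own storage 11.
records = [
    TensorRecord(tensor_id=1, storage_id=10, offset=0, numel=1024, itemsize=4),
    TensorRecord(tensor_id=2, storage_id=10, offset=0, numel=512, itemsize=4),
    TensorRecord(tensor_id=3, storage_id=10, offset=512, numel=512, itemsize=4),
    TensorRecord(tensor_id=4, storage_id=11, offset=0, numel=256, itemsize=4),
]

alloc = StorageAllocator(records)
views = {r.tensor_id: alloc.get(r) for r in records}

# What allocating one buffer per tensor id would cost, for comparison.
per_tensor_bytes = sum(r.numel * r.itemsize for r in records)
per_storage_bytes = alloc.total_bytes()
print(per_tensor_bytes, per_storage_bytes)
```

In this toy trace, per-tensor-id allocation would reserve a separate buffer for each of the four tensors, while per-storage-id allocation reserves only two buffers. A side effect is that tensors sharing a storage also alias each other's data, as they do in the original workload.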