Adaptive Token Pruning explanation

Hi, can you explain how ATP dynamically scales the transformer's width, i.e. the number of tokens, as stated in your paper? From the source code in core/model/transformer, it looks like it can only scale the number of transformer layers, while the number of tokens in each sequence stays the same.