Thanks for your work!
I'd like to ask whether it's possible to optimise the model's inference speed and GPU memory usage.
I saw in previous issues that you were considering this, but I haven't seen any updates since.
Is it feasible? And what would be the recommended way to reduce memory usage and latency significantly?
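
For context, the kind of thing I've been experimenting with on my end is fp16 inference. Here's a minimal sketch of the idea, assuming a standard PyTorch model (the `nn.Sequential` below is just a hypothetical stand-in, not your actual architecture):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the repo's actual model.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1),
).cuda().eval()

model.half()  # cast weights to fp16, roughly halving parameter memory

with torch.inference_mode():  # skip autograd bookkeeping for extra savings
    x = torch.randn(1, 3, 224, 224, device="cuda", dtype=torch.half)
    y = model(x)
```

Would something along these lines be safe for this model, or are there layers that need to stay in fp32?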
cheers