Thanks for your work!
I'd like to ask whether it's possible to optimise the model's inference speed and GPU memory usage.
I saw in previous issues that you were considering this, but I haven't seen any updates since.
Is it feasible? And what would be the recommended way to reduce memory usage and latency significantly?
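
For context, the kind of thing I've been experimenting with on my end is fp16 inference. Here's a minimal sketch of the idea, assuming a standard PyTorch model (the `nn.Sequential` below is just a hypothetical stand-in, not your actual architecture):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the repo's actual model.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1),
).cuda().eval()

model.half()  # cast weights to fp16, roughly halving parameter memory

with torch.inference_mode():  # skip autograd bookkeeping for extra savings
    x = torch.randn(1, 3, 224, 224, device="cuda", dtype=torch.half)
    y = model(x)
```

Would something along these lines be safe for this model, or are there layers that need to stay in fp32?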
cheers