I was delighted to read this paper, which offers valuable insights into accelerated inference. As I understand it, the Turbo-VAED in your experiments is trained with several strategies, but only for video encoding and decoding; it is not involved in fine-tuning the full video models (e.g., the WAN series), right? If so, why can it be reused directly in these models? Wouldn't this lead to incompatibility issues?