It seems that you train a video-generation model directly rather than using distillation?

Hi, according to the code, it seems that you train a video-generation model directly, starting from open-source models, rather than using distillation. I see only one model in the training process; there is no frozen teacher model and no teacher weights. Is that right? Thanks!