Hi authors,
First of all, thank you for your fantastic work on TokenFlow! I really appreciate the idea of enforcing feature consistency in video editing without requiring fine-tuning. Your approach is both elegant and effective. 🚀
I would love to apply TokenFlow to Stable Video Diffusion (SVD) to improve the temporal consistency of generated videos. However, I ran into some challenges when integrating it directly with SVD's UNet3DConditionSVDModel.
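For context on what I'm trying to adapt: my (possibly naive) understanding of TokenFlow's core step is that diffusion features of each frame are replaced by edited keyframe features via nearest-neighbor token correspondences. Here is a minimal single-keyframe sketch of that idea in NumPy (the paper interpolates between two keyframes; the array shapes and function name here are my own assumptions, not your implementation) — my question is essentially how this step should interact with SVD's spatio-temporal attention layers:

```python
import numpy as np

def propagate_features(src_frames, key_idx, edited_key_feats):
    """Simplified TokenFlow-style propagation (single keyframe, my sketch).

    src_frames:       (T, N, D) original per-frame diffusion feature tokens
    key_idx:          index of the keyframe within src_frames
    edited_key_feats: (N, D) edited features of that keyframe
    Returns:          (T, N, D) frames rebuilt from edited keyframe tokens
    """
    key = src_frames[key_idx]  # (N, D) original keyframe tokens
    key_n = key / np.linalg.norm(key, axis=1, keepdims=True)
    out = np.empty_like(src_frames)
    for t in range(src_frames.shape[0]):
        f = src_frames[t]
        f_n = f / np.linalg.norm(f, axis=1, keepdims=True)
        # cosine similarity between this frame's tokens and keyframe tokens
        sim = f_n @ key_n.T                  # (N, N)
        nn = sim.argmax(axis=1)              # nearest keyframe token per token
        # inject the *edited* keyframe feature at each corresponding position
        out[t] = edited_key_feats[nn]
    return out
```

In SD's 2D UNet this swap can be hooked into the self-attention outputs per frame, but SVD's UNet already attends across time, so it's unclear to me whether the propagation should replace or complement its temporal attention.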
Do you have any insights into whether TokenFlow could be adapted to Stable Video Diffusion? If so, what modifications would you recommend to make it compatible?
Any guidance would be greatly appreciated. Thanks again for your incredible contribution to the field!