Thanks for this excellent follow-up to Infinity!
Does the current text-to-image model use weights from the publicly released video model?
Could you share a demo script for text-to-image inference?
Also, could you share the parameter settings used for the T2I evaluation?