The linear projection layer between the Visual Encoder and the face tokens does not seem to be reflected in the SkyReelsA1ImagePoseToVideoPipeline class. Did you define a new linear projection layer in this intermediate step, or directly fine-tune some of the linear layers within the Visual Encoder?