Where is the linear projection reflected in the code?

The linear projection layer between the Visual Encoder and the face tokens does not seem to be reflected in the SkyReelsA1ImagePoseToVideoPipeline class. Did you define a new linear projection layer for this intermediate step, or did you directly fine-tune some of the linear layers within the Visual Encoder?