I checked the shapes of the input x and the output features of the CLIP ViT, and the resolution still appears to be 336, not 448. Is something wrong?
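For reference, here is a rough sketch of how the two resolutions could be told apart from the output feature shape alone. It assumes a patch size of 14 and a single prepended class token, neither of which is stated above, and `expected_vit_tokens` is a hypothetical helper, not part of any library.

```python
def expected_vit_tokens(image_size: int, patch_size: int = 14, cls_token: bool = True) -> int:
    """Return the sequence length a ViT would produce for a square input.

    patch_size=14 and cls_token=True are assumptions for illustration;
    check the actual model config before relying on these numbers.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    patches_per_side = image_size // patch_size
    # One token per patch, plus an optional class token.
    return patches_per_side * patches_per_side + (1 if cls_token else 0)


if __name__ == "__main__":
    for size in (336, 448):
        print(size, expected_vit_tokens(size))
```

Under these assumptions, a sequence length matching the 336 case in the output features would suggest the vision tower is still running at 336, so the observed feature shape can be compared against both values.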