Great work! I want to use the UNI foundation model in my own project, and I have a question about its input size.

As you showed in Extended Data Fig. 4, UNI can take inputs at different resolutions (224x224, 448x448, and so on). How should I feed 448x448 image tiles to the model? Should I use the following code directly?
from torchvision import transforms

transform = transforms.Compose(
    [
        transforms.Resize(224),      # shorter side is resized to 224
        transforms.CenterCrop(224),  # output is 224x224
        transforms.ToTensor(),
        transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ]
)
In other words, will a 448x448 image tile produce 784 patch tokens or 196 patch tokens? Do you have any suggestions?
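
For reference, this is how I arrived at those two numbers. It is only a minimal sketch: the patch size of 16 is my assumption about UNI's ViT backbone, and `output_size` and `num_patch_tokens` are hypothetical helpers I wrote for illustration, not part of UNI or torchvision. The first helper mimics what `Resize(int)` followed by `CenterCrop(int)` does to the tile's spatial size.

```python
def output_size(height, width, resize=None, crop=None):
    """Spatial size after an optional Resize(int) and an optional CenterCrop(int).

    Mimics torchvision semantics for integer arguments: Resize scales the
    shorter side to `resize` (keeping aspect ratio), CenterCrop returns a
    `crop` x `crop` image.
    """
    if resize is not None:
        short, long = min(height, width), max(height, width)
        new_long = int(resize * long / short)
        if height <= width:
            height, width = resize, new_long
        else:
            height, width = new_long, resize
    if crop is not None:
        height, width = crop, crop
    return height, width


def num_patch_tokens(height, width, patch_size=16):
    """Number of non-overlapping patches; patch_size=16 is an assumption."""
    return (height // patch_size) * (width // patch_size)


# With Resize(224) + CenterCrop(224), a 448x448 tile is shrunk to 224x224.
with_resize = num_patch_tokens(*output_size(448, 448, resize=224, crop=224))

# Without any resizing, the tile stays 448x448.
without_resize = num_patch_tokens(*output_size(448, 448))

print(with_resize)     # 196
print(without_resize)  # 784
```

So, if I understand correctly, the transform above would always give 196 patch tokens, and getting 784 would require dropping the `Resize(224)` and `CenterCrop(224)` steps, provided the model accepts 448x448 inputs. Is that right?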