Question about Extended Data Fig. 4: ROI classification across different image resolutions #66

Great work. I want to use the UNI foundation model in my own project, and I have a question about its input size.
<img width="1055" alt="Image" src="https://github.com/user-attachments/assets/e272087f-0b5c-4c17-8f32-241d1cd1fdd0" />

As shown in Extended Data Fig. 4, UNI can take inputs at different resolutions (224x224, 448x448, and so on). How should I feed it 448x448 image tiles? Should I use the following code directly?
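```
from torchvision import transforms

# Note: for a square 448x448 tile, Resize(224) downsamples it to 224x224,
# so CenterCrop(224) then leaves it unchanged.
transform = transforms.Compose(
    [
        transforms.Resize(224),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        # ImageNet mean and std
        transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ]
)
```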
In other words, will a 448x448 image tile produce 784 patch tokens or 196 patch tokens? Could you give me some suggestions?
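For context, here is how I arrived at those two numbers. This is only a sketch of the arithmetic, assuming a patch size of 16 (UNI as a ViT-L/16) and counting patch tokens only, without the class token. The helper name `num_patch_tokens` is my own and not part of the UNI code.

```python
def num_patch_tokens(image_size: int, patch_size: int = 16) -> int:
    """Count non-overlapping patch tokens for a square image (class token excluded).

    patch_size=16 is an assumption based on UNI being a ViT-L/16.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side


# A 448x448 tile passed to the model at full size: 28 x 28 patches.
tokens_at_448 = num_patch_tokens(448)

# The same tile after being resized to 224x224: 14 x 14 patches.
tokens_at_224 = num_patch_tokens(224)

print(tokens_at_448, tokens_at_224)  # prints "784 196"
```

If that assumption holds, the transform above would give 196 patch tokens, because the tile is already 224x224 by the time it reaches the model.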
