Question about Extended Data Fig. 4: ROI classification across different image resolutions #66

@yuxiaokang-source

Great work. I would like to use the UNI foundation model in my own project, and I have a question about its input size.

As shown in Extended Data Fig. 4, UNI can take inputs at different resolutions (224x224, 448x448, and so on). How should I feed 448x448 image tiles to the model? Can I use the following code directly?

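from torchvision import transforms

transform = transforms.Compose(
    [
        # Resize so the shorter side is 224 pixels
        transforms.Resize(224),
        # Crop the central 224x224 region
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        # ImageNet mean and standard deviation
        transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ]
)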

In other words, will a 448x448 image tile produce 784 patch tokens or 196 patch tokens? Any suggestions would be appreciated.
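
For reference, here is how I arrived at the two numbers. This is only a sketch of my understanding, assuming UNI uses a ViT patch size of 16 and that `Resize(224)` followed by `CenterCrop(224)` behaves as documented in torchvision (shorter side scaled to 224, then a central 224x224 crop). The helper names `output_size_after_transform` and `num_patch_tokens` are hypothetical; they only mimic the size arithmetic and do not call the model.

```python
def output_size_after_transform(height, width, resize=None, crop=None):
    """Mimic the size arithmetic of Resize(int) followed by CenterCrop(int)."""
    if resize is not None:
        # Resize(int) scales the shorter side to `resize`, keeping the aspect ratio.
        if height <= width:
            height, width = resize, int(resize * width / height)
        else:
            height, width = int(resize * height / width), resize
    if crop is not None:
        # CenterCrop(int) returns a crop x crop square.
        height, width = crop, crop
    return height, width


def num_patch_tokens(height, width, patch_size=16):
    """Count non-overlapping patches (class token excluded); patch_size=16 is an assumption."""
    return (height // patch_size) * (width // patch_size)


# With Resize(224) + CenterCrop(224), a 448x448 tile is downsampled to 224x224.
h, w = output_size_after_transform(448, 448, resize=224, crop=224)
tokens_with_resize = num_patch_tokens(h, w)

# Without resizing or cropping, the tile stays 448x448.
h, w = output_size_after_transform(448, 448)
tokens_without_resize = num_patch_tokens(h, w)

print(tokens_with_resize, tokens_without_resize)  # 196 784
```

If this is right, the transform above would give 196 patch tokens for a 448x448 tile, and getting 784 would require dropping the resize and crop steps and using a model that accepts 448x448 inputs, which is the part I am unsure about.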
