Skip to content

I got different results using thop and torchinfo #312

Open
@zz-2024

Description

Describe the bug
my code:

import torch 
from PIL import Image
from thop import profile, clever_format
from torchinfo import summary
import cn_clip.clip as clip
from cn_clip.clip import load_from_name, available_models
print("Available models:", available_models())  
# Available models: ['ViT-B-16', 'ViT-L-14', 'ViT-L-14-336', 'ViT-H-14', 'RN50']

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = load_from_name("RN50", device=device, download_root='./')
model.eval()
image = preprocess(Image.open("examples/pokemon.jpeg")).unsqueeze(0).to(device)
text = clip.tokenize(["杰尼龟", "妙蛙种子", "小火龙", "皮卡丘"]).to(device)
print(f"device:{device}")
with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # 对特征进行归一化,请使用归一化后的图文特征用于下游任务
    image_features /= image_features.norm(dim=-1, keepdim=True) 
    text_features /= text_features.norm(dim=-1, keepdim=True)    

    logits_per_image, logits_per_text = model.get_similarity(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)  # [[1.268734e-03 5.436878e-02 6.795761e-04 9.436829e-01]]

# 计算FLOPs
print(f"image:{image.shape}, text:{text.shape}")
flops, params = profile(model, inputs=(image, text), verbose=False)
flops, params = clever_format([flops, params], "%.3f")
print(f"Model FLOPs: {flops}, Model Params:{params}")
print("========seperate==========")
flops, params = profile(model.visual, inputs=(image, ), verbose=False)
flops, params = clever_format([flops, params], "%.3f")
print(f"Model FLOPs: {flops}, Model Params:{params}")
print(f"=========another method=============")
summary(model, input_data=[image, text])

my results:

Model FLOPs: 9.839G, Model Params:44.792M
========seperate==========
Model FLOPs: 5.418G, Model Params:23.527M
=========another method=============
=========================================================================================================
Layer (type:depth-idx)                                  Output Shape              Param #
=========================================================================================================
CLIP                                                    [1, 1024]                 786,433
├─ModifiedResNet: 1-1                                   [1, 1024]                 --
│    └─Conv2d: 2-1                                      [1, 32, 112, 112]         864
│    └─BatchNorm2d: 2-2                                 [1, 32, 112, 112]         64
│    └─ReLU: 2-3                                        [1, 32, 112, 112]         --
│    └─Conv2d: 2-4                                      [1, 32, 112, 112]         9,216
│    └─BatchNorm2d: 2-5                                 [1, 32, 112, 112]         64
│    └─ReLU: 2-6                                        [1, 32, 112, 112]         --
│    └─Conv2d: 2-7                                      [1, 64, 112, 112]         18,432
│    └─BatchNorm2d: 2-8                                 [1, 64, 112, 112]         128
│    └─ReLU: 2-9                                        [1, 64, 112, 112]         --
│    └─AvgPool2d: 2-10                                  [1, 64, 56, 56]           --
│    └─Sequential: 2-11                                 [1, 256, 56, 56]          --
│    │    └─Bottleneck: 3-1                             [1, 256, 56, 56]          75,008
│    │    └─Bottleneck: 3-2                             [1, 256, 56, 56]          70,400
│    │    └─Bottleneck: 3-3                             [1, 256, 56, 56]          70,400
│    └─Sequential: 2-12                                 [1, 512, 28, 28]          --
│    │    └─Bottleneck: 3-4                             [1, 512, 28, 28]          379,392
│    │    └─Bottleneck: 3-5                             [1, 512, 28, 28]          280,064
│    │    └─Bottleneck: 3-6                             [1, 512, 28, 28]          280,064
│    │    └─Bottleneck: 3-7                             [1, 512, 28, 28]          280,064
│    └─Sequential: 2-13                                 [1, 1024, 14, 14]         --
│    │    └─Bottleneck: 3-8                             [1, 1024, 14, 14]         1,512,448
│    │    └─Bottleneck: 3-9                             [1, 1024, 14, 14]         1,117,184
│    │    └─Bottleneck: 3-10                            [1, 1024, 14, 14]         1,117,184
│    │    └─Bottleneck: 3-11                            [1, 1024, 14, 14]         1,117,184
│    │    └─Bottleneck: 3-12                            [1, 1024, 14, 14]         1,117,184
│    │    └─Bottleneck: 3-13                            [1, 1024, 14, 14]         1,117,184
│    └─Sequential: 2-14                                 [1, 2048, 7, 7]           --
│    │    └─Bottleneck: 3-14                            [1, 2048, 7, 7]           6,039,552
│    │    └─Bottleneck: 3-15                            [1, 2048, 7, 7]           4,462,592
│    │    └─Bottleneck: 3-16                            [1, 2048, 7, 7]           4,462,592
│    └─AttentionPool2d: 2-15                            [1, 1024]                 14,789,632
├─BertModel: 1-2                                        [4, 52, 768]              --
│    └─BertEmbeddings: 2-16                             [4, 52, 768]              --
│    │    └─Embedding: 3-17                             [4, 52, 768]              16,226,304
│    │    └─Embedding: 3-18                             [4, 52, 768]              393,216
│    │    └─Embedding: 3-19                             [4, 52, 768]              1,536
│    │    └─LayerNorm: 3-20                             [4, 52, 768]              1,536
│    │    └─Dropout: 3-21                               [4, 52, 768]              --
│    └─BertEncoder: 2-17                                [4, 52, 768]              --
│    │    └─ModuleList: 3-22                            --                        21,263,616
=========================================================================================================
Total params: 76,989,537
Trainable params: 76,989,537
Non-trainable params: 0
Total mult-adds (G): 5.52
=========================================================================================================
Input size (MB): 0.60
Forward/backward pass size (MB): 123.19
Params size (MB): 122.93
Estimated Total Size (MB): 246.73
=========================================================================================================

Why the number of parameters varies so much? (44.792M VS 77M)
Why are mult-adds such different? (9.839G VS 5.52G)

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: [e.g. iOS]
  • Browser [e.g. chrome, safari]
  • Version [e.g. 22]

Smartphone (please complete the following information):

  • Device: [e.g. iPhone6]
  • OS: [e.g. iOS8.1]
  • Browser [e.g. stock browser, safari]
  • Version [e.g. 22]

Additional context
Add any other context about the problem here.

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions