RuntimeError: Tensors must have same number of dimensions: got 2 and 1 #10

@yachuchi

Hi, I get an error when I run Inference.py. I think it is just a dimension problem.
However, I don't understand lines 46 and 49 in Inference.py, since audio_file.shape[0] for the wav file is 1.
You then zero-pad if "audio_file.shape[0] < (SAMPLE_RATE * set_length)".
I can't tell what this part is meant to handle. Could you explain it?

The error I see is as follows:
root@c4f3aefb5d65:/workspace/Prefix_AAC_ICASSP2023# python3 Inference.py 2 1 ./AudioCaps/test/c3nlaAkv9bA.wav
/opt/conda/lib/python3.10/site-packages/torchvision/datapoints/__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: pytorch/vision#6753, and you can also check out pytorch/vision#7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
warnings.warn(_BETA_TRANSFORMS_WARNING)
/opt/conda/lib/python3.10/site-packages/torchvision/transforms/v2/__init__.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: pytorch/vision#6753, and you can also check out pytorch/vision#7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
warnings.warn(_BETA_TRANSFORMS_WARNING)
/opt/conda/lib/python3.10/site-packages/torchlibrosa/stft.py:686: UserWarning: Empty filters detected in mel frequency basis. Some channels will produce empty responses. Try increasing your sampling rate (and fmax) or reducing n_mels.
self.melW = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels,
use GPT2 Tokenizer
temporal feature ver's mapping network : num_head = 8 num_layers = 4 prefix_vector_length = 15
global feature ver's mapping network : num_head = 8 num_layers = 4 prefix_vector_length = 11
Encoder freezing
GPT2 freezing
header trainable!
Traceback (most recent call last):
  File "/workspace/Prefix_AAC_ICASSP2023/Inference.py", line 63, in <module>
    audio_file = torch.cat((audio_file, pad_val), dim=0)
RuntimeError: Tensors must have same number of dimensions: got 2 and 1
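
For reference, here is a minimal sketch that seems to reproduce the same error outside the repository, together with one possible workaround. It does not use the actual code from Inference.py. It assumes the waveform is loaded as a 2-D tensor shaped (channels, samples), which would explain why audio_file.shape[0] is 1, and that pad_val is a 1-D tensor of zeros. The SAMPLE_RATE and set_length values and the pad_to_length helper are hypothetical, chosen only for illustration.

```python
import torch

SAMPLE_RATE = 16000  # assumed value, for illustration only
set_length = 10      # assumed clip length in seconds, for illustration only


def pad_to_length(audio: torch.Tensor, target_len: int) -> torch.Tensor:
    """Hypothetical helper: return a 1-D waveform of exactly target_len samples."""
    if audio.dim() == 2:
        # Collapse (channels, samples) to (samples,) by averaging the channels.
        audio = audio.mean(dim=0)
    if audio.shape[0] < target_len:
        # Both tensors are 1-D here, so concatenating along dim=0 is valid.
        pad_val = torch.zeros(target_len - audio.shape[0], dtype=audio.dtype)
        audio = torch.cat((audio, pad_val), dim=0)
    # Truncate in case the clip is longer than the target length.
    return audio[:target_len]


target_len = SAMPLE_RATE * set_length

# Stand-in for a loaded mono wav file: 2-D, shaped (1, samples).
audio_file = torch.randn(1, 5 * SAMPLE_RATE)

# With a 2-D input, shape[0] is the channel count (1), not the sample count,
# so the length check is always true and the 1-D padding cannot be concatenated.
try:
    pad_val = torch.zeros(target_len - audio_file.shape[0])
    torch.cat((audio_file, pad_val), dim=0)
except RuntimeError as err:
    print("reproduced:", err)

padded = pad_to_length(audio_file, target_len)
print(padded.shape)
```

If the model expects a mono 1-D waveform, squeezing or averaging the channel dimension before the length check might be enough. Whether that matches what the script intends is a question for the authors.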
