RuntimeError: Tensors must have same number of dimensions: got 2 and 1

Hi, I get an error when I run Inference.py. I think it is just a dimension problem.
However, I don't understand line 46 and line 49 in Inference.py, since audio_file.shape[0] for the wav file will be 1.
You then zero-pad if "audio_file.shape[0] < (SAMPLE_RATE * set_length)".
I cannot tell what this part is meant to handle. Can you explain it?


The error I see is as follows.
root@c4f3aefb5d65:/workspace/Prefix_AAC_ICASSP2023# python3 Inference.py 2 1 ./AudioCaps/test/c3nlaAkv9bA.wav
/opt/conda/lib/python3.10/site-packages/torchvision/datapoints/__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
  warnings.warn(_BETA_TRANSFORMS_WARNING)
/opt/conda/lib/python3.10/site-packages/torchvision/transforms/v2/__init__.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
  warnings.warn(_BETA_TRANSFORMS_WARNING)
/opt/conda/lib/python3.10/site-packages/torchlibrosa/stft.py:686: UserWarning: Empty filters detected in mel frequency basis. Some channels will produce empty responses. Try increasing your sampling rate (and fmax) or reducing n_mels.
  self.melW = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels,
use GPT2 Tokenizer
temporal feature ver's mapping network : num_head = 8 num_layers = 4 prefix_vector_length = 15
global feature ver's mapping network : num_head = 8 num_layers = 4 prefix_vector_length = 11
Encoder freezing
GPT2 freezing
header trainable!
Traceback (most recent call last):
  File "/workspace/Prefix_AAC_ICASSP2023/Inference.py", line 63, in <module>
    audio_file = torch.cat((audio_file, pad_val), dim=0)
RuntimeError: Tensors must have same number of dimensions: got 2 and 1
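
For reference, here is a minimal sketch of what I think is happening. It is not the actual code from Inference.py. I am assuming the loaded waveform has shape (1, num_samples), so shape[0] is the channel count rather than the number of samples, while pad_val is 1-D. The SAMPLE_RATE and set_length values and the pad_or_trim helper are placeholders I made up for illustration.

```python
import torch

SAMPLE_RATE = 16000  # assumed value, not taken from Inference.py
set_length = 10      # assumed clip length in seconds


def pad_or_trim(audio: torch.Tensor, target_len: int) -> torch.Tensor:
    """Hypothetical helper: return a 1-D waveform of exactly target_len samples."""
    if audio.dim() == 2:
        # (channels, samples) -> (samples,) by averaging the channels
        audio = audio.mean(dim=0)
    if audio.shape[0] < target_len:
        # Zero-pad at the end so both tensors passed to torch.cat are 1-D
        pad_val = torch.zeros(target_len - audio.shape[0], dtype=audio.dtype)
        audio = torch.cat((audio, pad_val), dim=0)
    return audio[:target_len]


# A 5-second mono clip stored as (1, num_samples), as I assume the loader returns
audio_file = torch.randn(1, 5 * SAMPLE_RATE)

# Here shape[0] is 1 (channels), so the length check passes and pad_val is 1-D
pad_val = torch.zeros(SAMPLE_RATE * set_length - audio_file.shape[0])
try:
    torch.cat((audio_file, pad_val), dim=0)  # 2-D tensor with 1-D tensor
except RuntimeError as err:
    print(err)

# Flattening to 1-D before padding avoids the mismatch
fixed = pad_or_trim(audio_file, SAMPLE_RATE * set_length)
print(fixed.shape)
```

If the waveform is supposed to be 1-D at that point, then squeezing or averaging the channel dimension before the length check would make both the comparison and the concatenation work on the sample axis.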
