Unexpected "UNK" captions with single video prediction #62

@MarcosRodrigoT

Hello, Vladimir.

First of all, congratulations on such a fantastic project. I was introduced to this work through the many papers that cite it and build upon it. I enjoyed your video presentation, and I think you are doing a very good job of keeping up with all the repo issues.

I ran the sample code single_video_prediction.py on the given example (women_long_jump.mp4) without major issues (I only had to change the CUDA and PyTorch versions in the conda environment, as reported in #45).

However, when I tried the code on a custom video, let's call it my_video.mp4, I got some errors.

The VGGish feature extractor was unable to produce a .wav file because the audio track was not AAC-encoded (I checked with ffprobe my_video.mp4, and the audio uses the Opus codec instead of AAC). So I changed these two lines in BMT/submodules/video_features/models/vggish/utils/utils.py to the following, which resolved the issue:

mp4_to_acc = f'{which_ffmpeg()} -hide_banner -loglevel panic -y -i {video_path} {audio_aac_path}'
aac_to_wav = f'{which_ffmpeg()} -hide_banner -loglevel panic -y -i {video_path} {audio_wav_path}'
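
For reference, the codec check and the direct .mp4-to-.wav extraction described above can be sketched as follows. This is a minimal sketch, not the repository's code: the function names `probe_audio_codec_cmd` and `video_to_wav_cmd` are hypothetical, and it assumes `ffprobe` and `ffmpeg` are on the PATH (the repository resolves the binary through `which_ffmpeg()` instead). Only standard ffprobe/ffmpeg options are used.

```python
import shlex
from typing import List


def probe_audio_codec_cmd(video_path: str, ffprobe: str = "ffprobe") -> List[str]:
    """Build an ffprobe command that prints the codec name of the first audio stream."""
    return [
        ffprobe, "-v", "error",
        "-select_streams", "a:0",              # first audio stream only
        "-show_entries", "stream=codec_name",  # print just the codec name
        "-of", "default=noprint_wrappers=1:nokey=1",
        video_path,
    ]


def video_to_wav_cmd(video_path: str, wav_path: str, ffmpeg: str = "ffmpeg") -> List[str]:
    """Build an ffmpeg command that decodes the audio track straight to a .wav file."""
    return [
        ffmpeg, "-hide_banner", "-loglevel", "panic", "-y",
        "-i", video_path,
        "-vn",  # drop the video stream, keep audio only
        wav_path,
    ]


def as_shell(cmd: List[str]) -> str:
    """Render a command list as a safely quoted shell string (for logging)."""
    return " ".join(shlex.quote(part) for part in cmd)


if __name__ == "__main__":
    print(as_shell(probe_audio_codec_cmd("my_video.mp4")))
    print(as_shell(video_to_wav_cmd("my_video.mp4", "my_video.wav")))
```

Passing the list form to `subprocess.run(cmd, check=True)` avoids quoting problems with paths that contain spaces, which the f-string commands above do not handle.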

After obtaining the I3D and VGGish features, I tried running BMT on the video using the following command:

python ./sample/single_video_prediction.py \
--prop_generator_model_path ./sample/best_prop_model.pt \
--pretrained_cap_model_path ./sample/best_cap_model.pt \
--vggish_features_path ./sample/my_video_vggish.npy \
--rgb_features_path ./sample/my_video_rgb.npy \
--flow_features_path ./sample/my_video_flow.npy \
--duration_in_secs 148.121 \
--device_id 0 \
--max_prop_per_vid 100 \
--nms_tiou_thresh 0.4

This produced the following output:

Contructing caption_iterator for "train" phase
Using vanilla Generator
initialization: xavier
Glove emb of the same size as d_model_caps
Pretrained caption path:
 ./sample/best_cap_model.pt
Traceback (most recent call last):
  File "./sample/single_video_prediction.py", line 313, in <module>
    cap_model, feature_paths, train_dataset, cap_cfg, args.device_id, proposals, args.duration_in_secs
  File "./sample/single_video_prediction.py", line 219, in caption_proposals
    for start, end, conf in proposals.squeeze():
  File "/home/mrt/miniconda3/envs/bmt/lib/python3.7/site-packages/torch/tensor.py", line 456, in __iter__
    raise TypeError('iteration over a 0-d tensor')
TypeError: iteration over a 0-d tensor
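
One possible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: if NMS leaves only a single proposal, `proposals` may have a shape such as (1, 1, 3), and `squeeze()` then removes the proposal axis as well, leaving a 1-D tensor of three values. Iterating over that tensor yields 0-d tensors, and unpacking one of them into `start, end, conf` raises exactly this error. The sketch below reproduces the behaviour with hypothetical tensors (the shapes and values are illustrative, not taken from the script) and shows `reshape(-1, 3)` as one way to keep the proposal axis.

```python
import torch

# Hypothetical proposal tensors of shape (batch, num_proposals, 3),
# where the last axis holds (start, end, confidence).
many = torch.tensor([[[0.0, 10.0, 0.9], [5.0, 20.0, 0.8]]])  # shape (1, 2, 3)
single = torch.tensor([[[0.0, 148.1, 0.9]]])                  # shape (1, 1, 3)

print(many.squeeze().shape)    # torch.Size([2, 3])
print(single.squeeze().shape)  # torch.Size([3])


def unpack_with_squeeze(proposals):
    """Mirrors the loop in the traceback; fails when only one proposal is left."""
    return [(s.item(), e.item(), c.item()) for s, e, c in proposals.squeeze()]


def unpack_with_reshape(proposals):
    """reshape(-1, 3) always keeps a (num_proposals, 3) layout."""
    return [(s.item(), e.item(), c.item()) for s, e, c in proposals.reshape(-1, 3)]


try:
    unpack_with_squeeze(single)
    squeeze_failed = False
except TypeError as err:
    squeeze_failed = True
    print("squeeze() path failed:", err)

print(unpack_with_reshape(single))
```

If this is indeed the cause, the error depends on how many proposals survive NMS rather than on the features themselves.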

Since the loop was iterating over a 0-d tensor, I removed the NMS threshold argument (--nms_tiou_thresh) and ran the script again with:

python ./sample/single_video_prediction.py \
--prop_generator_model_path ./sample/best_prop_model.pt \
--pretrained_cap_model_path ./sample/best_cap_model.pt \
--vggish_features_path ./sample/my_video_vggish.npy \
--rgb_features_path ./sample/my_video_rgb.npy \
--flow_features_path ./sample/my_video_flow.npy \
--duration_in_secs 148.121 \
--device_id 0 \
--max_prop_per_vid 100

This time the script ran, but every caption in the output consists only of the "unk" token:

Contructing caption_iterator for "train" phase
Using vanilla Generator
initialization: xavier
Glove emb of the same size as d_model_caps
Pretrained caption path:
 ./sample/best_cap_model.pt
[{'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   
unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk  
 unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk 
  unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   
unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk  
 unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk 
  unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' 
unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 
'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk 
  unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   
unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk  
 unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}, {'start': 0.0, 'end': 148.1, 'sentence': ' unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk   unk '}]

I am a bit at a loss here, as I do not have much experience working with text and audio (only with image and video). Could you point me in the right direction? I am unsure what the root cause might be. I suspect it could be one of the following:

  • PyTorch version. I installed torch 1.4.0 instead of 1.2.0, as it was the closest version that could work with my GPU. I kept torchtext at version 0.3.1 (the same as yours). However, the code works for the example video you provide, so it seems unlikely that this is the root cause.
  • VGGish features. As I described above, I changed one script so that it extracts a .wav file directly from the .mp4, skipping the intermediate step of obtaining an .aac file. I do not see any drawback in doing so; in fact, it seems like a more portable option. However, I remain unsure whether you used the intermediate step for a specific reason I am unaware of.
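
One way to narrow down the second suspicion is to compare basic statistics of the custom feature files against those of the sample video. The sketch below is only an illustration under stated assumptions: `summarize_features` is a hypothetical helper, and the arrays are synthetic stand-ins with illustrative shapes (in practice they would come from `np.load` on the .npy files passed to the script).

```python
import numpy as np


def summarize_features(name, feats):
    """Return simple health statistics for a 2-D feature array."""
    feats = np.asarray(feats, dtype=np.float64)
    finite = feats[np.isfinite(feats)]
    return {
        "name": name,
        "shape": feats.shape,
        "num_nan": int(np.isnan(feats).sum()),
        "num_inf": int(np.isinf(feats).sum()),
        # True when there is data but every value is exactly zero.
        "all_zero": bool(feats.size > 0 and not feats.any()),
        # Mean over finite values only; None when there are none.
        "mean": float(finite.mean()) if finite.size else None,
    }


if __name__ == "__main__":
    # Synthetic stand-ins; replace with e.g. np.load("./sample/my_video_vggish.npy").
    healthy = np.ones((10, 128))
    empty = np.zeros((0, 128))
    with_nan = np.array([[1.0, np.nan], [3.0, 5.0]])

    for name, arr in [("healthy", healthy), ("empty", empty), ("with_nan", with_nan)]:
        print(summarize_features(name, arr))
```

An empty array, NaN values, or a shape that differs markedly from the sample video's features would point at the extraction step rather than at the captioning model.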

Desktop (please complete the following information):

  • OS: Ubuntu 22.04
  • GPU: NVidia RTX 4090 24GB

Your conda environment

# packages in environment at /home/mrt/miniconda3/envs/bmt:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main    conda-forge
_pytorch_select           0.2                       gpu_0    anaconda
absl-py                   0.8.1                    py37_0    conda-forge
asn1crypto                1.3.0                    py37_0    conda-forge
blas                      1.0                         mkl    conda-forge
ca-certificates           2020.1.1                      0    anaconda
certifi                   2020.4.5.1               py37_0    anaconda
cffi                      1.14.0           py37h2e261b9_0    anaconda
chardet                   3.0.4                 py37_1003    conda-forge
cryptography              2.8              py37h1ba5d50_0    anaconda
cudatoolkit               10.1.243             h6bb024c_0    anaconda
cudnn                     7.6.5.32             hc0a50b0_1    conda-forge
cymem                     1.31.2           py37h6bb024c_0    anaconda
cytoolz                   0.9.0.1          py37h14c3975_1    anaconda
dill                      0.2.9                    py37_0    conda-forge
en-core-web-sm            2.0.0                    pypi_0    pypi
future                    0.17.1                   py37_0    anaconda
idna                      2.9                        py_1    conda-forge
intel-openmp              2020.0                      166    anaconda
joblib                    0.14.1                     py_0    conda-forge
ld_impl_linux-64          2.33.1               h53a641e_7    conda-forge
libedit                   3.1.20181209         hc058e9b_0    anaconda
libffi                    3.2.1                hd88cf55_4
libgcc-ng                 9.1.0                hdf63c60_0    anaconda
libgfortran-ng            7.3.0                hdf63c60_0    anaconda
libprotobuf               3.11.4               h8b12597_0    conda-forge
libstdcxx-ng              9.1.0                hdf63c60_0    anaconda
markdown                  3.2.1                      py_0    conda-forge
mkl                       2020.0                      166    anaconda
mkl-service               2.3.0            py37he904b0f_0
mkl_fft                   1.0.15           py37ha843d7b_0
mkl_random                1.1.0            py37hd6b4f25_0
msgpack-numpy             0.4.4.3                    py_0    conda-forge
msgpack-python            0.5.6            py37h6bb024c_1    anaconda
murmurhash                0.28.0           py37hf484d3e_0    anaconda
ncurses                   6.2                  he6710b0_1    anaconda
ninja                     1.9.0            py37hfd86e86_0    anaconda
numpy                     1.15.4           py37h7e9f1db_0
numpy-base                1.15.4           py37hde5b4d6_0
openjdk                   8.0.152              h7b6447c_3    anaconda
openssl                   1.1.1g               h7b6447c_0    anaconda
pandas                    0.24.2           py37he6710b0_0    anaconda
pip                       20.0.2                   py37_1    conda-forge
plac                      0.9.6                    py37_0    anaconda
preshed                   1.0.1            py37he6710b0_0    anaconda
protobuf                  3.11.4           py37h3340039_1    conda-forge
pycparser                 2.20                       py_0    conda-forge
pyopenssl                 19.1.0                   py37_0    conda-forge
pysocks                   1.7.1                    py37_0    conda-forge
python                    3.7.7           hcf32534_0_cpython    anaconda
python-dateutil           2.8.1                      py_0    conda-forge
python_abi                3.7                     1_cp37m    conda-forge
pytorch                   1.4.0           cuda101py37h02f0884_0
pytz                      2020.1                     py_0    anaconda
readline                  8.0                  h7b6447c_0    anaconda
regex                     2018.07.11       py37h14c3975_0    anaconda
requests                  2.23.0                   py37_0    conda-forge
scikit-learn              0.22.1           py37hd81dba3_0
scipy                     1.3.1            py37h7c811a0_0
setuptools                46.1.3                   py37_0    anaconda
six                       1.14.0                   py37_0    conda-forge
spacy                     2.0.12           py37h962f231_0    anaconda
sqlite                    3.31.1               h62c20be_1    anaconda
tensorboard               1.14.0                   py37_0    conda-forge
termcolor                 1.1.0                    py37_1    anaconda
thinc                     6.10.3           py37h962f231_0    anaconda
tk                        8.6.8                hbc83047_0    anaconda
toolz                     0.10.0                     py_0    conda-forge
torchtext                 0.3.1                    pypi_0    pypi
tqdm                      4.46.0                     py_0    anaconda
ujson                     2.0.3            py37he6710b0_0    anaconda
urllib3                   1.25.8                   py37_0    anaconda
werkzeug                  1.0.1              pyh9f0ad1d_0    conda-forge
wheel                     0.34.2                   py37_0    conda-forge
wrapt                     1.10.11          py37h14c3975_2    anaconda
xz                        5.2.5                h7b6447c_0    anaconda
zlib                      1.2.11               h7b6447c_3    anaconda
