Changing model in text-to-video retrieval tutorial leads to poor performance #2493
Unanswered
yiling-chen
asked this question in
Q&A
Hi all,
Has anyone tried this tutorial and replaced the CLIP4Clip model with one of the other models supported by towhee?
https://codelabs.towhee.io/how-to-build-a-text-video-retrieval-engine/index
Following the tutorial, I can reproduce the reported metrics on MSR-VTT with CLIP4Clip.
However, when I tried the FrozenInTime and BridgeFormer models, I only got Recall@1 of 0.007 and 0.003, respectively.
Compared with 0.421 for CLIP4Clip, these numbers are clearly wrong.
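To make clear which metric these numbers refer to, here is a minimal sketch of Recall@K for text-to-video retrieval. It is not the tutorial's evaluation code: the function name `recall_at_k` and the convention that row i of the text matrix matches row i of the video matrix are assumptions made for illustration, and cosine similarity is assumed as the ranking score.

```python
import numpy as np

def recall_at_k(text_vecs, video_vecs, k=1):
    """Fraction of text queries whose matching video ranks in the top k.

    Assumption (illustrative): text_vecs[i] is the caption of video_vecs[i].
    """
    # L2-normalize so that the dot product equals cosine similarity.
    t = text_vecs / np.linalg.norm(text_vecs, axis=1, keepdims=True)
    v = video_vecs / np.linalg.norm(video_vecs, axis=1, keepdims=True)
    sims = t @ v.T  # shape: (num_texts, num_videos)

    # Rank of the true match = number of videos scoring strictly higher.
    true_scores = np.diag(sims)[:, None]
    ranks = (sims > true_scores).sum(axis=1)
    return float((ranks < k).mean())

# Tiny self-check with one-hot vectors as stand-in embeddings.
videos = np.eye(4)
print(recall_at_k(videos, videos, k=1))                      # texts aligned with videos
print(recall_at_k(np.roll(videos, 1, axis=0), videos, k=1))  # texts misaligned by one row
```

If the test split holds 1,000 text-video pairs (an assumption on my side), random ranking would give a Recall@1 of about 1/1000 = 0.001, so 0.007 and 0.003 are close to chance level.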
I made only a few changes to the code. For both FrozenInTime and BridgeFormer, I had to change the embedding dimension to 256. I also modified the video decoding step, following the example code of each operator. The rest of the code is unchanged.
For your reference,
FrozenInTime
dc = (
    towhee.read_csv(test_sample_csv_path)
    .runas_op['video_id', 'id'](func=lambda x: int(x[-4:]))
    .video_decode.ffmpeg['video_path', 'frames'](sample_type='uniform_temporal_subsample', args={'num_samples': 4})
    .runas_op['frames', 'frames'](func=lambda x: [y for y in x])
    .video_text_embedding.frozen_in_time['frames', 'vec'](model_name='frozen_in_time_base_16_244', modality='video', device=device)
    .to_milvus['id', 'vec'](collection=collection, batch=30)
)
BridgeFormer
dc = (
    towhee.read_csv(test_sample_csv_path)
    .runas_op['video_id', 'id'](func=lambda x: int(x[-4:]))
    .video_decode.ffmpeg['video_path', 'frames']()
    .runas_op['frames', 'frames'](func=lambda x: [y for y in x])
    .video_text_embedding.bridge_former['frames', 'vec'](model_name='frozen_model', modality='video')
    .to_milvus['id', 'vec'](collection=collection, batch=30)
)
Can anyone with experience using these operators offer any insight?
Thanks.