In this paper, we address the temporal moment localization task, namely, localizing a video moment described by a natural language query in an untrimmed video. This is a general yet challenging vision-language task, since it requires not only the localization of moments, but also the multimodal comprehension of textual-temporal information (e.g., "first" and "leaving") that helps to distinguish the desired moment from the others, especially those with similar visual content. While existing studies treat the given language queries as a single unit, we propose to decompose them into two components: the relevant cue, which aids the desired moment localization, and the irrelevant one, which carries no localization signal. This allows us to flexibly adapt to arbitrary queries in an end-to-end framework. In our proposed model, a language-temporal attention network is utilized to learn the word attention based on the temporal context information in the video. Therefore, our model can automatically select "what words to listen to" for localizing the desired moment. We evaluate the proposed model on two public benchmark datasets: DiDeMo and Charades-STA. The experimental results verify its superiority over several state-of-the-art methods.
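Since the model code itself is not excerpted in this README, below is a minimal, self-contained PyTorch sketch of one way such a language-temporal attention could be wired up: each query word is scored against the temporal context of a candidate moment, and the resulting weights pool the word embeddings so that words irrelevant to that moment are down-weighted. The module name, the feature dimensions, and the additive (tanh-based) scoring are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LanguageTemporalAttention(nn.Module):
    """Hypothetical sketch: attend over query words conditioned on the
    temporal context of a candidate video moment."""
    def __init__(self, word_dim, video_dim, hidden_dim):
        super().__init__()
        self.word_proj = nn.Linear(word_dim, hidden_dim)
        self.video_proj = nn.Linear(video_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, words, video_ctx):
        # words:     (batch, num_words, word_dim)  query word embeddings
        # video_ctx: (batch, video_dim)            pooled feature of a candidate moment
        h = torch.tanh(self.word_proj(words)
                       + self.video_proj(video_ctx).unsqueeze(1))  # (batch, num_words, hidden_dim)
        alpha = F.softmax(self.score(h).squeeze(-1), dim=1)        # per-word attention weights
        # Attended query representation: irrelevant words receive low weight
        attended = (alpha.unsqueeze(-1) * words).sum(dim=1)        # (batch, word_dim)
        return attended, alpha

# Example usage (dimensions are assumptions, e.g., 300-d word vectors,
# 512-d moment features):
att = LanguageTemporalAttention(word_dim=300, video_dim=512, hidden_dim=256)
query = torch.randn(2, 10, 300)   # 10 word embeddings per query
moment = torch.randn(2, 512)      # one candidate moment feature per sample
attended, weights = att(query, moment)
print(weights.shape)              # torch.Size([2, 10]): one weight per word
```

The returned weights make the "what words to listen to" selection inspectable: for a given candidate moment, they indicate which query words the model treats as relevant cues.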
- Paper: ACM MM 2018
- Code Download: Baidu Netdisk
- Extraction Code: k9ew
Our method achieves competitive or superior results compared with previous state-of-the-art methods on the DiDeMo and Charades-STA benchmarks.
Copyright (C) 2018 Shandong University
This program is licensed under the GNU General Public License v3.0.
You may obtain a copy of the license at:
https://www.gnu.org/licenses/gpl-3.0.html
Any derivative work based on this program that is distributed to a third party must also be licensed under the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
The copyright of this program is owned by Shandong University.
For commercial projects that require distributing this code as part of a program that cannot be released under the GNU General Public License, please contact mengliu.sdu@gmail.com to obtain a commercial license.
If you find this project useful in your research, please consider citing:
@inproceedings{10.1145/3240508.3240549,
  author    = {Liu, Meng and Wang, Xiang and Nie, Liqiang and Tian, Qi and Chen, Baoquan and Chua, Tat-Seng},
  title     = {Cross-modal Moment Localization in Videos},
  year      = {2018},
  booktitle = {Proceedings of the 26th ACM International Conference on Multimedia},
  pages     = {843--851}
}

