mengliu1991/ACM-MM-2018-ROLE

Official Code for Cross-modal Moment Localization in Videos

Introduction

In this paper, we address the temporal moment localization issue, namely, localizing a video moment described by a natural language query in an untrimmed video. This is a general yet challenging vision-language task since it requires not only the localization of moments, but also the multimodal comprehension of textual-temporal information (e.g., "first" and "leaving") that helps to distinguish the desired moment from the others, especially those with similar visual content. While existing studies treat the given language queries as a single unit, we propose to decompose them into two components: the relevant cue related to the desired moment localization and the irrelevant one meaningless to the localization. This allows us to flexibly adapt to arbitrary queries in an end-to-end framework. In our proposed model, a language-temporal attention network is utilized to learn the word attention based on the temporal context information in the video. Therefore, our model can automatically select "what words to listen to" for localizing the desired moment. We evaluate the proposed model on two public benchmark datasets: DiDeMo and Charades-STA. The experimental results verify its superiority over several state-of-the-art methods.
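To make the word-attention idea concrete, here is a minimal NumPy sketch of a language-temporal attention step: each query word is scored against a temporal context feature of the candidate moment, and the resulting weights select "what words to listen to." All names, shapes, and the scoring form (a single-layer additive attention) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def language_temporal_attention(word_embs, temporal_ctx, W_w, W_v, w_a):
    """Attend over query words conditioned on temporal video context.

    word_embs:    (num_words, d_word) embeddings of the query words
    temporal_ctx: (d_video,)          feature of the moment's temporal context
    W_w, W_v, w_a: hypothetical projection parameters of an additive scorer
    Returns the attention-weighted query vector and the word weights.
    """
    # Project words and video context into a shared space, then score each word.
    scores = np.tanh(word_embs @ W_w + temporal_ctx @ W_v) @ w_a  # (num_words,)
    alpha = softmax(scores)           # word attention weights, sum to 1
    attended = alpha @ word_embs      # (d_word,) weighted sum of word embeddings
    return attended, alpha

# Toy example: 4 query words, 8-d word embeddings, 6-d context, 5-d hidden space.
rng = np.random.default_rng(0)
words = rng.normal(size=(4, 8))
ctx = rng.normal(size=(6,))
W_w = rng.normal(size=(8, 5)) * 0.1
W_v = rng.normal(size=(6, 5)) * 0.1
w_a = rng.normal(size=(5,))
attended, alpha = language_temporal_attention(words, ctx, W_w, W_v, w_a)
```

In a full model the weighted query vector would be fused with moment features to score each candidate segment; relevant cue words (e.g., "first", "leaving") would receive higher weights than words irrelevant to localization.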

Method Overview

(Figure: framework overview of the proposed model)

Results

(Figure: experimental results)

Our method achieves competitive or superior results compared with previous state-of-the-art methods on the DiDeMo and Charades-STA benchmarks.

License

Copyright (C) 2018 Shandong University

This program is licensed under the GNU General Public License v3.0.
You may obtain a copy of the license at:
https://www.gnu.org/licenses/gpl-3.0.html

Any derivative work based on this program must also be licensed under the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version, if such derivative work is distributed to a third party.

The copyright of this program is owned by Shandong University.

For commercial projects that require distributing this code as part of a program that cannot be released under the GNU General Public License, please contact mengliu.sdu@gmail.com to obtain a commercial license.

Citation

If you find this project useful in your research, please consider citing:

@inproceedings{10.1145/3240508.3240549,
  author    = {Liu, Meng and Wang, Xiang and Nie, Liqiang and Tian, Qi and Chen, Baoquan and Chua, Tat-Seng},
  title     = {Cross-modal Moment Localization in Videos},
  year      = {2018},
  booktitle = {Proceedings of the 26th ACM International Conference on Multimedia},
  pages     = {843--851}
}

About

This is the official code for Cross-modal Moment Localization in Videos.
