Like traditional long videos, micro-videos are a unity of textual, acoustic, and visual modalities, which sequentially narrate a real-life event from distinct angles. Yet, unlike traditional long videos with rich content, micro-videos are very short, typically lasting 6-15 seconds, and hence usually convey only one or a few high-level concepts. In light of this, we have to characterize and jointly model the sparseness and the multiple sequential structures for better micro-video understanding. To accomplish this, we present an end-to-end deep learning model that packs three parallel LSTMs to capture the sequential structures and a convolutional neural network to learn sparse concept-level representations of micro-videos. We applied our model to micro-video categorization. In addition, we constructed a real-world dataset for sequence modeling and released it to facilitate other researchers. Experimental results demonstrate that our model yields better performance than several state-of-the-art baselines.
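The "three parallel LSTMs" idea can be sketched in pure numpy: one LSTM per modality consumes that modality's sequence, and the final hidden states are concatenated into a joint representation. This is a minimal illustration, not the released implementation; the random weights stand in for trained parameters, and the hidden size of 128 and sequence length of 8 are assumptions chosen for the sketch (the feature dimensions match the released dataset).

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_last_hidden(x_seq, d_hidden):
    """Run a single-layer LSTM over x_seq of shape (T, d_in); return the final hidden state."""
    d_in = x_seq.shape[1]
    # randomly initialised weights stand in for trained parameters
    W = rng.standard_normal((4 * d_hidden, d_in)) * 0.1
    U = rng.standard_normal((4 * d_hidden, d_hidden)) * 0.1
    b = np.zeros(4 * d_hidden)
    h = np.zeros(d_hidden)
    c = np.zeros(d_hidden)
    for x in x_seq:
        gates = W @ x + U @ h + b
        i, f, o, g = np.split(gates, 4)           # input, forget, output gates and cell candidate
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

# one toy sequence per modality (T = 8 steps; feature dims as in the released dataset)
T = 8
visual = rng.standard_normal((T, 4096))
audio = rng.standard_normal((T, 512))
text = rng.standard_normal((T, 100))

# three parallel LSTMs, one per modality
h_v = lstm_last_hidden(visual, 128)
h_a = lstm_last_hidden(audio, 128)
h_t = lstm_last_hidden(text, 128)

# joint representation that would feed the concept-level (sparse) layer
fused = np.concatenate([h_v, h_a, h_t])
print(fused.shape)  # (384,)
```

In the paper's full model this fused sequential representation is passed to a convolutional sparse-coding stage; the sketch stops at the fusion step.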
- Paper: ACM MM 2017
- Code Download: Baidu Netdisk
The code for several baseline methods:
- Code Download: LSTM+DL
- Code Download: LSTMs
- Code Download: CONV
- Code Download: TMDL
- Code Download: DMDL
- Code Download: TRUMANN
Sequence dataset: It is designed for deep learning models based on an LSTM network.
- Code Download: Visual Modality
- Code Download: Audio Modality
- Code Download: Text Modality
Non-sequence dataset: It is generated by averaging the sequences of each modality over time. Each micro-video is represented by a 4096-D visual feature, a 512-D audio feature, and a 100-D text feature.
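The averaging step described above can be reproduced in a few lines of numpy: mean-pool each modality's sequence over the time axis and concatenate the results. The random arrays below are hypothetical stand-ins for the released per-video sequences; only the feature dimensions (4096/512/100) come from the dataset description.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical per-micro-video sequences (8 timesteps each), matching the released dims
visual_seq = rng.standard_normal((8, 4096))
audio_seq = rng.standard_normal((8, 512))
text_seq = rng.standard_normal((8, 100))

# temporal mean pooling per modality, then concatenation into one flat vector
non_seq = np.concatenate([m.mean(axis=0) for m in (visual_seq, audio_seq, text_seq)])
print(non_seq.shape)  # (4708,) = 4096 + 512 + 100
```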
On the micro-video categorization task, our model achieves better performance than several state-of-the-art baselines.
Copyright (C) 2018 Shandong University
This program is licensed under the GNU General Public License v3.0.
You may obtain a copy of the license at:
https://www.gnu.org/licenses/gpl-3.0.html
Any derivative work based on this program must also be licensed under the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version, if such derivative work is distributed to a third party.
The copyright of this program is owned by Shandong University.
For commercial projects that require distributing this code as part of a program that cannot be released under the GNU General Public License, please contact mengliu.sdu@gmail.com to obtain a commercial license.
If you find this project useful in your research, please consider citing:
@inproceedings{10.1145/3123266.3123341,
author = {Liu, Meng and Nie, Liqiang and Wang, Meng and Chen, Baoquan},
title = {Towards Micro-video Understanding by Joint Sequential-Sparse Modeling},
year = {2017},
booktitle = {Proceedings of the 25th ACM International Conference on Multimedia},
pages = {970--978}
}
