SpokenNLP: The official repository for codebases on a wide variety of research projects developed by the SpokenNLP team of Speech Lab, Alibaba Group.
- [2025-05-16]: MMVTS was accepted by ACL 2025 Findings. It introduces a dataset for topic segmentation of Chinese lecture videos and proposes a multimodal video topic segmentation model that integrates text, visual, and audio information to determine topic boundaries.
- [2024-02-05]: SLD was accepted by ICASSP 2024. It introduces SLD, a novel approach that applies a KL divergence loss with smoothed labels on speech tokens for discrete-token-based ASR.
- [2023-10-23]: Ditto was accepted by EMNLP 2023. It introduces Ditto, a learning-free approach that uses model-based importance estimations to weight words and compute sentence embeddings from pre-trained model representations.
- [2023-10-07]: Improving Long Document Topic Segmentation Models With Enhanced Coherence Modeling was accepted by EMNLP 2023. It enhances the pretrained language model's ability to capture coherence from both structure and similarity perspectives to further improve topic segmentation performance.
- [2023-05-22]: PoNet was submitted to the Hugging Face Hub. PoNet can now be used directly through the Transformers library.
- [2022-12-02]: alimeeting4mug released the official baseline system codebase for the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG)!
- [2022-02-24]: MDERank was accepted by Findings of ACL 2022. It is a Masked Document Embedding Rank approach for unsupervised keyphrase extraction, which outperforms state-of-the-art unsupervised keyphrase extraction approaches, especially on long documents.
- [2022-01-21]: PoNet was accepted by ICLR 2022. It is a novel Pooling Network (PoNet) for token mixing in long sequences with linear complexity, which achieves a good balance among transfer learning capability, accuracy, and complexity for long sequence modeling. Models are released at Modelscope (English and Chinese).
- [2021-09-11]: SeqModel was accepted by IEEE ASRU 2021. It is a sequence model with a self-adaptive sliding window for efficient spoken document segmentation. A new Chinese Wikipedia-based document segmentation dataset, Wiki-zh, was released. Models are released at Modelscope (English and Chinese).
- [2019-02-28]: JointBERT was proposed for joint intent classification and slot filling with BERT. A third-party PyTorch implementation of JointBERT is available.
- [2018-10-17]: ESIM ranked top on both datasets in the DSTC7 Noetic End-to-end Response Selection track!
SpokenNLP is released under the Apache License 2.0. This project contains various third-party components under other open source licenses.