Tzeviya Sylvia Fuchs ([email protected])
Yedid Hoshen ([email protected])
GradSeg is an unsupervised approach for word segmentation using pretrained deep self-supervised features. It uses the temporal gradient magnitude of the embeddings (the distance between the embeddings of subsequent frames) to define pseudo-labels for word centers, and trains a linear classifier on these pseudo-labels. The classifier score is then used to predict whether each frame belongs to a word or to a boundary.
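Below is a minimal sketch of this idea, not the repository's implementation (see grad_segmenter.py for the actual code). The use of scikit-learn's Ridge as the linear classifier, and the choice of which gradient-magnitude percentile is treated as word centers versus boundaries, are assumptions made here for illustration; the parameter names target_perc and reg only mirror the command-line options used below.

```python
# Minimal sketch of the GradSeg idea (illustrative; not the repository code).
# Assumes `feats` is a (T, D) numpy array of self-supervised frame embeddings
# for a single utterance (e.g. wav2vec 2.0 features).
import numpy as np
from sklearn.linear_model import Ridge  # assumption: a regularized linear model


def gradseg_scores(feats, target_perc=20, reg=1e7):
    # Temporal gradient magnitude: distance between embeddings of subsequent frames.
    grad = np.linalg.norm(np.diff(feats, axis=0), axis=1)  # shape (T-1,)

    # Pseudo-labels from the extremes of the gradient distribution.
    # Assumption: low-gradient frames are labeled as word centers (1) and
    # high-gradient frames as boundaries (0); the paper fixes the exact assignment.
    lo, hi = np.percentile(grad, [target_perc, 100 - target_perc])
    center_idx = np.where(grad <= lo)[0]
    boundary_idx = np.where(grad >= hi)[0]

    X = np.concatenate([feats[center_idx], feats[boundary_idx]])
    y = np.concatenate([np.ones(len(center_idx)), np.zeros(len(boundary_idx))])

    # Train a linear classifier on the pseudo-labels (heavily L2-regularized, cf. --reg).
    clf = Ridge(alpha=reg).fit(X, y)

    # Per-frame score; thresholding / peak-picking over these scores then
    # decides which frames are words and which are boundaries.
    return clf.predict(feats)
```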
If you find our work useful, please cite:
@inproceedings{fuchs23_icassp,
author={Tzeviya Sylvia Fuchs and Yedid Hoshen},
title={{Unsupervised Word Segmentation Using Temporal Gradient Pseudo-Labels}},
year=2023,
booktitle={ICASSP 2023},
}
- Python 3.8+
- PyTorch 1.10.0
- torchaudio 0.10.0
- numpy
- boltons
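One possible way to install the Python dependencies (an illustrative command, not taken from the repository; pick the torch/torchaudio build that matches your CUDA or CPU setup):

  pip install torch==1.10.0 torchaudio==0.10.0 numpy boltons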
- Download the code:

  git clone https://github.com/MLSpeech/GradSeg.git
In this example, we will demonstrate how to run GradSeg on the Buckeye corpus.
- We use the same experimental setup as in "DSegKNN: Unsupervised Word Segmentation using K Nearest Neighbors (INTERSPEECH 2022)" (Paper, Code; see the README there for data preprocessing).
- Run grad_segmenter.py with the following options:

  python grad_segmenter.py --min_separation 3 --train_n 100 --eval_n -1 --reg 1e7 --target_perc 20 --frames_per_word 15 --train_path datasets/buckeye_split/train/ --val_path datasets/buckeye_split/val/
Result should be:

  Final result: 45.77675489067894 44.84274602637809 45.30493707647628 -2.040356216886474 53.62285114327586

which are the precision, recall, F-score, OS, and R-value (see the sanity check below).
- For comparison, the evaluation script eval_segmentation.py used here is by Herman Kamper.
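As a quick sanity check (not part of the repository), the last three numbers above can be derived from the first two, assuming the standard definitions of F-score, over-segmentation (OS), and R-value (Räsänen et al., 2009):

```python
import math

# Derive F-score, OS, and R-value from the printed precision and recall,
# assuming the standard boundary-evaluation definitions
# (R-value following Rasanen et al., 2009).
P = 45.77675489067894 / 100   # precision
R = 44.84274602637809 / 100   # recall (= hit rate)

f_score = 2 * P * R / (P + R)           # harmonic mean of precision and recall
over_seg = R / P - 1                    # over-segmentation (OS)
r1 = math.sqrt((1 - R) ** 2 + over_seg ** 2)
r2 = (-over_seg + R - 1) / math.sqrt(2)
r_value = 1 - (abs(r1) + abs(r2)) / 2

print(f_score * 100, over_seg * 100, r_value * 100)
# -> approximately 45.30, -2.04, 53.62, matching the last three numbers above.
```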