These files can be used to run cross-linguistic analyses of large speech model embeddings.
Models
- wav2vec 2.0
- HuBERT
- WavLM (TODO)
Datasets
- Hindi: Common Voice (13), AI for Bharat (), Vaani ()
- English: LibriSpeech, Wall Street Journal Corpus
- Korean: Seoul Corpus
-
data cleaning
-
feature extraction
-
model fine-tuning
-
evaluation
- ABX
- 2AFC
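
For readers unfamiliar with ABX discrimination, the idea can be sketched as follows. This is a minimal illustration, not the evaluation code in this repository: it assumes each token has already been reduced to a single fixed-length embedding (for example by mean-pooling frames), it uses cosine distance, and the names `cosine_distance` and `abx_accuracy` are hypothetical. For every triple (A, B, X) where A and X belong to the same category and B to a different one, the trial counts as correct when X is closer to A than to B.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def abx_accuracy(cat_a, cat_b):
    """Fraction of (A, B, X) triples in which X is closer to A than to B.

    A and X are distinct items of cat_a; B is any item of cat_b.
    Ties count as half correct. This one-directional, pooled-vector
    version is an assumption made for illustration.
    """
    correct = 0.0
    total = 0
    for i, a in enumerate(cat_a):
        for j, x in enumerate(cat_a):
            if i == j:
                continue  # A and X must be different tokens
            for b in cat_b:
                d_ax = cosine_distance(a, x)
                d_bx = cosine_distance(b, x)
                if d_ax < d_bx:
                    correct += 1.0
                elif d_ax == d_bx:
                    correct += 0.5
                total += 1
    return correct / total


# Toy embeddings standing in for two phone categories (made-up values).
toy_a = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0]]
toy_b = [[0.1, 1.0], [0.0, 0.9]]
print(abx_accuracy(toy_a, toy_b))  # 1.0 on this toy data
```

An accuracy near 0.5 would indicate that the embeddings do not separate the two categories, while values near 1.0 indicate clear separation. A frame-level variant would typically replace the pooled-vector distance with an alignment-based distance such as dynamic time warping.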