Replies: 3 comments
-
Hi, thanks for your interest in tslearn! From my understanding of your use case, you will probably need some data preprocessing before using tslearn. For audio and music, librosa might fit your needs. Once your audio is transformed into proper time series, it can be used in tslearn, and that's when the fun begins! We would be glad to hear about your experiments in the field! Regards
-
Hi, and thank you for your reply. I recently started looking into librosa as well, and that makes it clearer to me how DTW could be used for real audio work (although some audio-specific aspects of working with DTW still puzzle me, for instance how a mere comparison of time series of amplitude values could help in deducing the actual linguistic/musical "content" of audio files, but that is a subject I will happily study and learn in the coming time! :-)). My interest in DTW is more artistic than scientific. I am a composer of electroacoustic music, and I have been working on adapting the DTW algorithm to create transitional musical structures and transformations of sounds, essentially by feeding DTW with musical features and then creating interpolations between the mappings of the alignment path I get out of DTW. Below are my most recent electroacoustic composition and some more information about my approach to using DTW in my musical works: and here is a stereo mix of the 8-channel original composition:
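One minimal reading of the "interpolating between the mappings of the alignment path" idea can be sketched as follows. This is an illustrative sketch, not the poster's actual implementation: the random feature matrices stand in for whatever musical features are used, and linear interpolation of matched frame pairs is just one possible morphing strategy.

```python
import numpy as np
from tslearn.metrics import dtw_path

# Placeholder feature sequences of different lengths (e.g. 4 musical
# features per frame); replace with real feature extraction in practice.
rng = np.random.default_rng(0)
feat_a = rng.normal(size=(20, 4))
feat_b = rng.normal(size=(30, 4))

# The DTW path pairs up frames of A with frames of B.
path, _ = dtw_path(feat_a, feat_b)

def morph(alpha):
    """Linearly interpolate each matched frame pair along the alignment path.

    alpha = 0 reproduces A's frames (time-warped onto the path),
    alpha = 1 reproduces B's frames, and values in between yield a hybrid.
    """
    return np.array([(1 - alpha) * feat_a[i] + alpha * feat_b[j]
                     for i, j in path])

halfway = morph(0.5)  # hybrid feature sequence, one frame per path step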
-
Bravo for the amazing work!
-
Hi
I am very new to the field of DTW and still very much learning. I have a question about using the tslearn package for actually recognizing spoken words: I couldn't find in the docs how this can be done with real recordings. I would also very much like to understand how DTW handles recorded audio. How could comparing a set of discrete values (e.g. amplitude values in the case of audio files) be used to understand the content of a recording, or is there some frequency analysis (FFT, inverse FFT) involved?
I would specifically be grateful for a practical code example showing how tslearn can be used to apply DTW to two hypothetical audio files containing the same speech pattern.
Pointers to good resources and readings are also very much appreciated.