Replies: 3 comments
-
Hi, thanks for your interest in tslearn! From my understanding of your use case, you will probably need some data preprocessing before using tslearn. For audio and music, librosa might fit your needs. Once your audio is transformed into proper time series, it can be used in tslearn, and that's when the fun begins! We would be glad to hear about your experiments in the field! Regards
-
Hi, and thank you for your reply. I recently started looking into librosa as well, and that makes it clearer to me how DTW could be used for real audio work (although some audio-specific aspects of working with DTW still puzzle me, for instance how a mere comparison of time series of amplitude values could help in deducing the actual linguistic/musical "content" of audio files, but that is a subject I will happily study and learn in the coming time! :-)). My interest in DTW is more artistic than scientific. I am a composer of electroacoustic music, and I have been working on adapting the DTW algorithm to create transitional musical structures and transformations of sounds, essentially by feeding DTW with musical features and then creating interpolations between the mappings of the alignment path I get out of DTW. Below are my most recent electroacoustic composition and some more information about my approach to using DTW in my musical works: and here is a stereo mix of the 8-channel original composition:
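One minimal reading of the "interpolating between the mappings of the alignment path" idea can be sketched as follows. This is an illustrative sketch, not the poster's actual implementation: the random feature matrices stand in for whatever musical features are used, and linear interpolation of matched frame pairs is just one possible morphing strategy.

```python
import numpy as np
from tslearn.metrics import dtw_path

# Placeholder feature sequences of different lengths (e.g. 4 musical
# features per frame); replace with real feature extraction in practice.
rng = np.random.default_rng(0)
feat_a = rng.normal(size=(20, 4))
feat_b = rng.normal(size=(30, 4))

# The DTW path pairs up frames of A with frames of B.
path, _ = dtw_path(feat_a, feat_b)

def morph(alpha):
    """Linearly interpolate each matched frame pair along the alignment path.

    alpha = 0 reproduces A's frames (time-warped onto the path),
    alpha = 1 reproduces B's frames, and values in between yield a hybrid.
    """
    return np.array([(1 - alpha) * feat_a[i] + alpha * feat_b[j]
                     for i, j in path])

halfway = morph(0.5)  # hybrid feature sequence, one frame per path step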
-
Bravo for the amazing work!
-
Hi
I am very new to the field of DTW and still very much learning. I have a question about using the tslearn package for actually recognizing spoken words: I couldn't find in the docs how this can be done with real recordings. I would also very much like to understand how DTW handles recorded audio. How could comparing a set of discrete values (e.g. amplitude values in the case of audio files) be used to understand the content of a recording, or is there some frequency analysis (FFT, inverse FFT) involved?
I would specifically be grateful for a practical code example showing how tslearn can be used to apply DTW to two hypothetical audio files containing the same speech pattern.
Pointers to good resources and readings are also very much appreciated.