This repository provides an advanced-level implementation of speech signal processing techniques, focusing on the Fourier Transform, spectrograms, and Mel-Frequency Cepstral Coefficients (MFCCs).
📌 Features
Fourier Transform & Spectrograms: Convert speech signals from the time domain to the frequency domain.
Mel-Frequency Cepstral Coefficients (MFCCs): Extract meaningful features for speech recognition.
What is the Fourier Transform?
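Sound is usually recorded and stored as a time-domain waveform (amplitude versus time), but many properties of speech, such as pitch and formants, are easier to analyze in terms of frequency. The Fourier Transform (FT) converts a time-domain signal into its frequency components, giving the strength of each frequency present in the signal.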
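As a minimal sketch of this idea (not necessarily how this repository implements it), the example below builds a synthetic 440 Hz tone and uses NumPy's real-input FFT to find its dominant frequency. The sample rate, duration, and tone frequency are assumptions chosen for illustration.

```python
import numpy as np

# Assumed values for illustration only.
sample_rate = 8000   # samples per second
duration = 1.0       # seconds
tone_hz = 440.0      # frequency of the synthetic test tone

# Time-domain signal: a pure sine wave.
t = np.arange(int(sample_rate * duration)) / sample_rate
signal = np.sin(2 * np.pi * tone_hz * t)

# Frequency-domain view: FFT of a real signal, plus the frequency of each bin.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
magnitude = np.abs(spectrum)

# The bin with the largest magnitude is the dominant frequency.
peak_hz = freqs[np.argmax(magnitude)]
print(f"Strongest frequency: {peak_hz:.1f} Hz")
```

With these settings the frequency bins are 1 Hz apart, so the tone falls exactly on a bin and the peak lands at 440 Hz. For real speech the energy is spread across many bins rather than concentrated in one.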
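Speech changes quickly, and a single Fourier Transform over an entire recording shows which frequencies occur but not when. The Short-Time Fourier Transform (STFT), commonly used in speech processing, addresses this by splitting the signal into short, overlapping frames and transforming each frame separately. The result is a spectrogram, which displays how frequencies change over time.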
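Spectrograms: A spectrogram is a visual representation of how a signal's frequency content varies over time, typically with time on the horizontal axis, frequency on the vertical axis, and intensity shown as color or brightness. Unlike a waveform, which shows only amplitude over time, a spectrogram shows which frequencies are present at each moment.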
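The framing-and-transform process behind a spectrogram can be sketched with NumPy alone. This is an illustrative sketch under stated assumptions, not the repository's code: the function name `stft_magnitude`, the frame length of 256 samples, the hop of 128 samples, and the Hann window are all choices made for this example. The test signal switches from 300 Hz to 1200 Hz halfway through, which a single Fourier Transform could not localize in time.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Hypothetical helper: magnitude STFT, shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    # One FFT per windowed frame; each row is the spectrum of one time slice.
    return np.abs(np.fft.rfft(frames * window, axis=1))

# Assumed test signal: 300 Hz for the first half second, 1200 Hz afterwards.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = np.where(
    t < 0.5, np.sin(2 * np.pi * 300 * t), np.sin(2 * np.pi * 1200 * t)
)

spec = stft_magnitude(signal)
spec_db = 20 * np.log10(spec + 1e-10)  # log scale, as usually plotted
freqs = np.fft.rfftfreq(256, d=1.0 / sample_rate)

first_peak_hz = freqs[np.argmax(spec[0])]    # dominant frequency in the first frame
last_peak_hz = freqs[np.argmax(spec[-1])]    # dominant frequency in the last frame
print(first_peak_hz, last_peak_hz)
```

Each row of `spec` is one time slice and each column one frequency bin, so plotting `spec_db.T` as an image (for example with matplotlib's `imshow`) gives the familiar spectrogram. Longer frames give finer frequency resolution at the cost of coarser time resolution, so the frame length is a trade-off rather than a fixed value.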
What are Mel-Frequency Cepstral Coefficients (MFCCs)?
MFCCs are widely used as features in speech recognition because they approximate how humans perceive sound. Human hearing resolves differences between low frequencies more finely than differences between high frequencies, so MFCCs warp the frequency axis onto the Mel scale, which is roughly linear below about 1 kHz and logarithmic above it. The result is a compact description of the speech signal's spectral shape that emphasizes perceptually important detail.
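A common recipe for MFCCs is: frame the signal, take the power spectrum, apply a bank of triangular Mel-spaced filters, take the logarithm, and apply a discrete cosine transform (DCT). The sketch below follows that recipe with NumPy only. It is an illustration under assumptions, not the repository's implementation: the function names are hypothetical, the Mel formula `2595 * log10(1 + f / 700)` is one of several variants in use, and the choices of 26 filters, 13 coefficients, and a 512-sample frame are conventional defaults rather than requirements.

```python
import numpy as np

def hz_to_mel(hz):
    # One common Mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the Mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(
        hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2
    )
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, len(fft_freqs)))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def mfcc(signal, sample_rate, n_fft=512, hop=256, n_filters=26, n_coeffs=13):
    """Hypothetical helper: returns MFCCs of shape (n_frames, n_coeffs)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    # Pool FFT bins into Mel bands, then compress with a logarithm.
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_mel = np.log(mel_energy + 1e-10)
    # Unnormalized DCT-II over the Mel bands; keep the first n_coeffs.
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return log_mel @ dct_basis.T

# Assumed test input: one second of two mixed tones at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 2500 * t)

coeffs = mfcc(signal, sample_rate)
print(coeffs.shape)  # one row of coefficients per frame
```

Libraries such as librosa provide tested MFCC implementations with additional options (pre-emphasis, liftering, DCT normalization), so a hand-written version like this is mainly useful for understanding what each stage does.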