
An end-to-end pipeline for predicting a speaker’s age from short audio samples. This project integrates audio feature extraction (time-frequency statistics, Mel-spectrogram analysis) with machine learning regression (CatBoost, Random Forest). It demonstrates data preprocessing, outlier detection, and hyperparameter tuning.


AndreaLolli2912/speech-based-age-estimation


DSL-Age-Estimation

A machine learning project for the "Data Science Lab: Process and Methods" class that estimates a speaker's age from both tabular and raw audio features. Developed in collaboration with my colleague Daniele Famà, this repository covers the end-to-end process of data loading, feature extraction, preprocessing, and model training with regression techniques.


Highlights

  • AI Techniques: Utilizes CatBoost, Random Forest, and other ML algorithms for robust regression.
  • Audio Feature Extraction: Gathers spectral, Mel-spectrogram, and time-domain statistics (skew, kurtosis, etc.) to capture essential speech cues.
  • Tabular Feature Engineering: Cleans and preprocesses demographic and acoustic-linguistic data (outlier detection, scaling, encoding).
  • Model Selection & Tuning: Employs dimensionality reduction (e.g., LDA) and hyperparameter search for optimal performance.
  • Result: Achieved competitive RMSE scores on the public leaderboard, showing the effectiveness of the combined approach.
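
As an illustration of the time-domain statistics mentioned in the highlights, here is a minimal sketch of how a frame-level feature could be collapsed into a fixed-length row of summary statistics. The function name `summarize_feature`, the `rms` prefix, and the chosen set of statistics are assumptions made for this example, not the repository's actual code, and the input is synthetic.

```python
import numpy as np
from scipy.stats import kurtosis, skew


def summarize_feature(values, prefix):
    """Collapse a 1-D frame-level feature into fixed-length summary statistics.

    Hypothetical helper: the name and the statistics chosen are assumptions.
    """
    values = np.asarray(values, dtype=float)
    return {
        f"{prefix}_mean": float(np.mean(values)),
        f"{prefix}_std": float(np.std(values)),
        f"{prefix}_skew": float(skew(values)),
        # scipy returns excess (Fisher) kurtosis by default: 0 for a normal distribution.
        f"{prefix}_kurtosis": float(kurtosis(values)),
    }


# Synthetic stand-in for a per-frame feature such as RMS energy.
rng = np.random.default_rng(0)
frames = rng.normal(loc=0.0, scale=1.0, size=500)
row = summarize_feature(frames, "rms")
print(row)
```

Each audio clip then yields one row with the same columns regardless of its duration, which is what allows the audio features to be joined with the tabular data.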

For a detailed description of the methods, refer to the accompanying report.


Installation

  1. Clone this repository:
    git clone https://github.com/AndreaLolli2912/DSL-Age-Estimation.git
    cd DSL-Age-Estimation
  2. Install dependencies:
    pip install -r requirements.txt

Folder Structure

DSL-Age-Estimation
├── catboost_info
├── data
│   ├── audios_development
│   ├── audio_evaluation
│   ├── development.csv
│   └── evaluation.csv
├── figs
├── submissions
├── utils
│   └── audio_extraction.py
├── audio_features_preprocessing.ipynb
├── main.ipynb
├── report_figures.ipynb
├── Fama_Lolli_Age_Regression.pdf
├── README.md
└── requirements.txt
  • data: Place all provided .csv files and audio folders here.
  • utils/audio_extraction.py: Extracts raw audio features (adjust lowcut, highcut, top_db as needed).
  • audio_features_preprocessing.ipynb: Converts raw audio features into tabular format (statistical summaries).
  • main.ipynb: Runs the entire pipeline (EDA, model training, evaluation).
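
The cleaning parameters named above (lowcut, highcut, top_db) suggest a band-pass filter followed by silence trimming. The sketch below shows one way such a step might look. It is written under assumptions: the function names, default values, filter order, and the frame-based trimming rule are all hypothetical and are not taken from utils/audio_extraction.py. The trimming is a simplified NumPy stand-in for a decibel-threshold trim, not a specific library's implementation.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(signal, sr, lowcut=80.0, highcut=4000.0, order=4):
    """Zero-phase Butterworth band-pass between lowcut and highcut (Hz).

    Default cutoffs are illustrative assumptions.
    """
    sos = butter(order, [lowcut, highcut], btype="band", fs=sr, output="sos")
    return sosfiltfilt(sos, signal)


def trim_silence(signal, top_db=30.0, frame_length=512):
    """Drop leading/trailing frames more than top_db below the loudest frame."""
    n_frames = len(signal) // frame_length
    if n_frames == 0:
        return signal
    frames = signal[: n_frames * frame_length].reshape(n_frames, frame_length)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Frame level in dB relative to the loudest frame (0 dB).
    db = 20.0 * np.log10(np.maximum(rms, 1e-10) / max(rms.max(), 1e-10))
    keep = np.flatnonzero(db > -top_db)
    return signal[keep[0] * frame_length : (keep[-1] + 1) * frame_length]


# Synthetic clip: 1 s silence, 1 s of a 440 Hz tone, 1 s silence.
sr = 16000
t = np.arange(sr) / sr
signal = np.concatenate([np.zeros(sr), np.sin(2 * np.pi * 440 * t), np.zeros(sr)])

filtered = bandpass(signal, sr)
trimmed = trim_silence(filtered, top_db=30.0)
print(len(signal), len(trimmed))
```

Lowering top_db trims more aggressively, while narrowing the lowcut/highcut band removes more out-of-band noise at the risk of discarding speech content.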

Usage

  1. Extract Raw Audio Features

    python utils/audio_extraction.py

    Adjust cleaning parameters (e.g., lowcut, highcut, top_db) as desired.

  2. Preprocess Features
    Open and run audio_features_preprocessing.ipynb to produce summarized data in .h5 or .csv formats.

  3. Run the Model Pipeline
    Use main.ipynb to orchestrate data loading, EDA, and model training with CatBoost/Random Forest.
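
To make the final step concrete, here is a minimal sketch of training a regressor and scoring it with RMSE. It assumes scikit-learn is installed and uses synthetic data in place of the project's features; the feature matrix, the target formula, and the hyperparameter values are invented for illustration and do not reflect the contents of main.ipynb.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the summarized feature table and the age target.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 5))
age = 40 + 10 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=2.0, size=400)

X_train, X_val, y_train, y_val = train_test_split(
    X, age, test_size=0.25, random_state=42
)

# Hyperparameters here are placeholders, not tuned values.
model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_val)

# Root mean squared error on the held-out split.
rmse = float(np.sqrt(np.mean((y_val - pred) ** 2)))

# Baseline: always predict the mean training age.
baseline_rmse = float(np.sqrt(np.mean((y_val - y_train.mean()) ** 2)))
print(f"model RMSE: {rmse:.2f}, baseline RMSE: {baseline_rmse:.2f}")
```

Comparing against a mean-prediction baseline is a quick sanity check that the features carry signal. CatBoost's `CatBoostRegressor` also follows a fit/predict pattern, so it could likely be swapped in with few changes.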


Credits

Feel free to open issues or submit pull requests for improvements.
