This repository contains the code for our paper: "Music for All: Representational Bias and Cross-Cultural Adaptability of Music Generation Models."

We present a study of datasets and research papers on music generation and quantify the bias and under-representation of genres. We find that only 5.7% of the total hours of existing music datasets come from non-Western genres, which naturally leads to disparate performance of the models across genres. We then investigate the efficacy of Parameter-Efficient Fine-Tuning (PEFT) techniques in mitigating this bias. Our experiments with two popular models (MusicGen and Mustango) on two underrepresented non-Western music traditions (Hindustani Classical and Turkish Makam) highlight the promise as well as the non-triviality of cross-genre adaptation of music through small datasets, implying the need for more equitable baseline music-language models designed for cross-cultural transfer learning.
The CompMusic dataset contains 120+ hours of Turkish Makam and Hindustani Classical recordings.
The MTG-Saraga dataset contains 40+ hours of annotated Hindustani Classical data.
For Hindustani Classical, the dataset covers five instrument types (sarangi, harmonium, tabla, violin, and tanpura) along with voice, and comprises 41 ragas across two laya types, Madhya and Vilambit. The Turkish Makam portion features 16 instruments (oud, tanbur, ney, davul, clarinet, kös, kudüm, yaylı tanbur, tef, kanun, zurna, bendir, darbuka, classical kemençe, rebab, and çevgen) and encompasses 93 different makams and 63 distinct usuls.
To adapt Mustango, a Bottleneck Residual Adapter with convolutional layers is integrated into the up-sampling, middle, and down-sampling blocks of the UNet, positioned just after each cross-attention block. This design enables cultural adaptation while preserving computational efficiency. The adapters reduce the channel dimension by a factor of 8, using a kernel size of 1, with a GeLU activation after the down-projection layer to introduce non-linearity.
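
For illustration, here is a minimal PyTorch sketch of such a convolutional bottleneck residual adapter. The class name is ours, and the zero-initialized up-projection is a common stabilizing choice assumed here rather than taken from the paper:

```python
import torch
import torch.nn as nn

class ConvBottleneckAdapter(nn.Module):
    """Residual adapter inserted after a UNet cross-attention block.

    Projects the channel dimension down by `reduction` (8 in the paper),
    applies a GeLU non-linearity, and projects back up; the input is
    added back through a residual connection.
    """
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        bottleneck = channels // reduction
        self.down = nn.Conv2d(channels, bottleneck, kernel_size=1)
        self.act = nn.GELU()
        self.up = nn.Conv2d(bottleneck, channels, kernel_size=1)
        # Zero-init the up-projection so the adapter starts as an identity
        # mapping and does not disturb the pretrained UNet (an assumed,
        # commonly used initialization, not confirmed by the paper).
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))
```

One such adapter would be instantiated per up-sampling, middle, and down-sampling block and applied to the hidden states right after the cross-attention output.
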
In MusicGen, we add roughly 2 million parameters by integrating a Linear Bottleneck Residual Adapter after the transformer decoder, a placement chosen after thorough experimentation with alternatives.
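
A sketch of the linear variant follows. The hidden size and bottleneck width are illustrative values chosen only to show how a ~2M-parameter budget might be met, and the GeLU activation is assumed by analogy with the convolutional adapter:

```python
import torch
import torch.nn as nn

class LinearBottleneckAdapter(nn.Module):
    """Residual adapter placed after MusicGen's transformer decoder."""
    def __init__(self, d_model: int, bottleneck: int):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) hidden states from the decoder
        return x + self.up(self.act(self.down(x)))

# Illustrative sizing: with d_model = 1536 and bottleneck = 650, the two
# linear layers hold roughly 2 * 1536 * 650 ≈ 2M weights, matching the
# stated parameter budget.
```
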
Both models have a total parameter count of ~2 billion, so the 2M-parameter adapter accounts for only about 0.1% of the model size. For both models, fine-tuning ran on two RTX A6000 GPUs for around 10 hours. Only the adapter block was fine-tuned, using the AdamW optimizer with an MSE (reconstruction) loss.
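
Schematically, the fine-tuning loop freezes the pretrained weights and updates only the adapter with AdamW against an MSE reconstruction loss. In the sketch below, `model`, `model.adapter`, `dataloader`, and the learning rate are placeholders, not the repository's actual names or hyperparameters:

```python
import torch

# Freeze all pretrained weights; only the adapter remains trainable.
for param in model.parameters():
    param.requires_grad = False
for param in model.adapter.parameters():  # hypothetical attribute name
    param.requires_grad = True

optimizer = torch.optim.AdamW(model.adapter.parameters(), lr=1e-4)  # lr is a placeholder
loss_fn = torch.nn.MSELoss()  # MSE reconstruction loss

for batch in dataloader:  # batches from the Hindustani / Makam fine-tuning set
    optimizer.zero_grad()
    prediction = model(batch["input"])
    loss = loss_fn(prediction, batch["target"])
    loss.backward()
    optimizer.step()
```
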
The tables below present the objective evaluation metrics for Hindustani Classical music and Turkish Makam, assessing the quality of the generated music via Fréchet Audio Distance (FAD), Fréchet Distance (FD), Kullback-Leibler Divergence (KLD), and Peak Signal-to-Noise Ratio (PSNR). A sketch of the FAD computation follows the tables.

**Hindustani Classical Music**

| Model | FAD ↓ | FD ↓ | KLD ↓ | PSNR ↑ |
|---|---|---|---|---|
| MusicGen Baseline | 40.05 | 75.76 | 6.53 | 16.23 |
| MusicGen Finetuned | 40.04 | 72.65 | 6.12 | 16.18 |
| Mustango Baseline | 6.36 | 45.31 | 2.73 | 16.78 |
| Mustango Finetuned | 5.18 | 22.03 | 1.26 | 17.70 |

**Turkish Makam**

| Model | FAD ↓ | FD ↓ | KLD ↓ | PSNR ↑ |
|---|---|---|---|---|
| MusicGen Baseline | 39.65 | 57.29 | 7.35 | 14.60 |
| MusicGen Finetuned | 39.68 | 56.71 | 7.21 | 14.46 |
| Mustango Baseline | 8.65 | 75.21 | 6.01 | 16.60 |
| Mustango Finetuned | 2.57 | 20.56 | 4.81 | 16.17 |
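
For reference, FAD is the Fréchet distance between Gaussians fitted to embeddings of the reference and generated audio sets (typically VGGish features). Below is a minimal numpy/scipy sketch, assuming the embeddings have already been extracted; it illustrates the metric, not the evaluation code used for the tables above:

```python
import numpy as np
from scipy import linalg

def frechet_audio_distance(ref_emb: np.ndarray, gen_emb: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two embedding sets.

    Both inputs have shape (num_clips, embedding_dim), e.g. VGGish features.
    """
    mu_r, mu_g = ref_emb.mean(axis=0), gen_emb.mean(axis=0)
    cov_r = np.cov(ref_emb, rowvar=False)
    cov_g = np.cov(gen_emb, rowvar=False)
    # Matrix square root of the covariance product; discard the tiny
    # imaginary parts introduced by numerical error.
    covmean = linalg.sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```
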
The tables below present the human evaluation scores (Elo ratings) for Hindustani Classical music and Turkish Makam, where higher values indicate better performance. A sketch of the Elo update rule follows the metric key.

**Hindustani Classical Music**

| Model | OA ↑ | Inst. ↑ | MC ↑ | RC ↑ | CR ↑ |
|---|---|---|---|---|---|
| MusicGen Baseline | 1525 | 1520 | 1540 | 1552 | 1546 |
| Mustango Baseline | 1449 | 1466 | 1409 | 1470 | 1518 |
| MusicGen Finetuned | 1448 | 1454 | 1428 | 1439 | 1448 |
| Mustango Finetuned | 1577 | 1559 | 1623 | 1538 | 1487 |

**Turkish Makam**

| Model | OA ↑ | Inst. ↑ | MC ↑ | RC ↑ | CR ↑ |
|---|---|---|---|---|---|
| MusicGen Baseline | 1539 | 1562 | 1597 | 1622 | 1603 |
| Mustango Baseline | 1527 | 1531 | 1499 | 1523 | 1560 |
| MusicGen Finetuned | 1597 | 1529 | 1570 | 1570 | 1541 |
| Mustango Finetuned | 1337 | 1377 | 1334 | 1286 | 1297 |

Metric key:
- OA (Overall Accuracy)
- Inst. (Instrumentation)
- MC (Melodic Consistency)
- RC (Rhythmic Consistency)
- CR (Creativity)
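
The Elo ratings above are derived from pairwise human comparisons between systems. As a reminder of how such ratings evolve, here is a small sketch of the standard Elo update rule; the K-factor and the 1500 starting rating are conventional defaults, not necessarily the exact protocol used in the paper:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Update both ratings after one comparison.

    score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a tie.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

# Example: all systems start at 1500; one win nudges the ratings apart.
print(update_elo(1500.0, 1500.0, 1.0))  # -> (1516.0, 1484.0)
```
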
Please consider citing the following article if you find our work useful:
```bibtex
@inproceedings{mehta-etal-2025-music,
    title = "Music for All: Representational Bias and Cross-Cultural Adaptability of Music Generation Models",
    author = "Mehta, Atharva and
      Chauhan, Shivam and
      Djanibekov, Amirbek and
      Kulkarni, Atharva and
      Xia, Gus and
      Choudhury, Monojit",
    editor = "Chiruzzo, Luis and
      Ritter, Alan and
      Wang, Lu",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
    month = apr,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-naacl.258/",
    doi = "10.18653/v1/2025.findings-naacl.258",
    pages = "4569--4585",
    ISBN = "979-8-89176-195-7",
    abstract = "The advent of Music-Language Models has greatly enhanced the automatic music generation capability of AI systems, but they are also limited in their coverage of the musical genres and cultures of the world. We present a study of the datasets and research papers for music generation and quantify the bias and under-representation of genres. We find that only 5.7{\%} of the total hours of existing music datasets come from non-Western genres, which naturally leads to disparate performance of the models across genres. We then investigate the efficacy of Parameter-Efficient Fine-Tuning (PEFT) techniques in mitigating this bias. Our experiments with two popular models {--} MusicGen and Mustango, for two underrepresented non-Western music traditions {--} Hindustani Classical and Turkish Makam music, highlight the promises as well as the non-triviality of cross-genre adaptation of music through small datasets, implying the need for more equitable baseline music-language models that are designed for cross-cultural transfer learning."
}
```

