DeSTA-AQA5M

Name	Response Generated From	HuggingFace ID	Preview
DeSTA-AQA5M	Llama3.1-8B-Instruct	DeSTA-ntu/DeSTA-AQA5M-FROM-Llama3.1-8B-Instruct	🔍

Load Dataset

from datasets import load_dataset

# Load the full dataset from the Hugging Face Hub
dataset = load_dataset("DeSTA-ntu/DeSTA-AQA5M-FROM-Llama3.1-8B-Instruct")

# Alternatively, load only selected chunked data files
# dataset = load_dataset("DeSTA-ntu/DeSTA-AQA5M-FROM-Llama3.1-8B-Instruct", data_files=["audio.0.jsonl", "audio.1.jsonl", "speech.0.jsonl", "speech.1.jsonl"])

# Inspect the splits, features, and row count
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['id', 'dataset', 'seed_description', 'prompt', 'response', 'messages'],
        num_rows: 4963845
    })
})

Core fields (used for dataset generation and training):
- messages: The input messages used for data generation and model training.
- response: The model-generated response (used as the training target).

Auxiliary fields (for display or metadata purposes):
- id: Audio file ID (the relative audio file path).
- dataset: The source dataset.
- seed_description: The textual description constructed from the audio metadata.
- prompt: The sampled prompt from the instruction pool.
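
As a rough illustration of how the auxiliary fields might be used, the sketch below tallies examples per source dataset via the `dataset` field. It assumes only that each record behaves like a dict with the fields listed above. The helper name `count_by_source` and the sample records are hypothetical, and the actual values stored in `dataset` may be spelled differently.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_by_source(records: Iterable[Mapping]) -> Counter:
    """Count records per source dataset using the `dataset` field."""
    return Counter(record["dataset"] for record in records)


# Hypothetical records for illustration; real rows also carry the other fields.
sample_records = [
    {"id": "a.wav", "dataset": "IEMOCAP"},
    {"id": "b.wav", "dataset": "MELD"},
    {"id": "c.wav", "dataset": "IEMOCAP"},
]
source_counts = count_by_source(sample_records)
print(source_counts.most_common())
```

The same function should also accept `dataset["train"]`, since iterating a `datasets.Dataset` yields one dict per row, although a full pass over roughly five million rows will take a while.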

Note: We do not hold the license to redistribute the original audio files. Please download the audio files directly from the original dataset sources.
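
Because `id` is described as a relative audio file path, one plausible way to pair records with locally downloaded audio is to join it onto a local root directory. This is a minimal sketch under that assumption: `resolve_audio_path` and `audio_root` are hypothetical names, and the local layout is assumed to mirror the relative paths stored in `id`, which may not hold for every source dataset.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_audio_path(record: Mapping, audio_root: Path) -> Optional[Path]:
    """Join the record's `id` onto a local audio root.

    Returns the full path if the file exists locally, otherwise None.
    Assumes the local layout mirrors the relative paths stored in `id`.
    """
    candidate = Path(audio_root) / record["id"]
    return candidate if candidate.is_file() else None


# Example usage (audio_root is a hypothetical directory holding the audio
# you downloaded from the original dataset sources):
# path = resolve_audio_path(dataset["train"][0], Path("/path/to/audio"))
```

Returning `None` for missing files makes it straightforward to skip or log records whose audio has not been downloaded yet.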

Dataset Details

Dataset	Example seed description	Paper
IEMOCAP	[00:00-00:02] Excuse me? (Emotion: Neutral, Gender: Female, Pitch: Very high, Volume: Low, Speaking speed: Very slow, Duration: 2s)	link
DailyTalk	[00:00-00:02] I'm figuring out my budget. (Emotion: No emotion, Act: Inform, Gender: Male, Duration: 2s)	link
GLOBE	[00:00-00:04] the belts needed periodic cleaning and conditioning to keep them in good condition. (Accent: United States English, Age: teens, Gender: male, Duration: 4s)	link
VCTK-corpus	[00:00-00:07] She can scoop these things into three red bags, and we will go meet her Wednesday at the train station. (Gender: Female, Pitch:low, Accent: newzealand, Age: 23, Emotion: neutral)	link
MELD	[00:00-00:02] Take it. (Emotion:Neutral, Sentiment:Neutral, Gender: Female, Duration: 2s)	link
PromptTTS	[00:00-00:05] Deeply engrossed in congenial work (Speaking speed: Slow, Volume: Normal, Pitch: Low, Gender: Female, Emotion: cheerful, Duration: 5s)	link
Expresso	[00:00-00:01] Karen's in Switzerland? (Style: confused, Gender: Male, Duration: 1s)	link
AccentDB	[00:00-00:04] The hogs were fed chopped corn and garbage. (Accent: American, Gender: Female, Emotion: Neutral, Duration: 4s)	link
VoxCeleb1	[00:00-00:09] and it's all what side of the coin you're looking at... (Gender: Male, Emotion: Neutral, Duration: 9s)	link
Anispeech	[00:00-00:06] You have no idea how heartless those snuffs are... (Emotion: angry, Speaking speed: Fast, Pitch: normal, Gender: Male, Duration: 6s)	link
MSP-IMPROV	[00:00-00:05] Yeah, well I'm going to go to class... (Gender: Female, Emotion: Angry, Activation: 3.0/5.0, Valence: 2.2/5.0, Dominance: 3.2/5.0, Naturalness: 4.4/5.0)	link
Fair-speech	[00:00-00:05] hi there is something i would like you to see (Gender: male, Age: 31 - 45, First Language: English, Socioeconomic Background: Medium, Ethnicity: Black or African American)	link
CREMA-D	[00:00-00:02] It's eleven o'clock (Emotion: Anger, Gender: Male, Age: 51, Race: Caucasian, Ethnicity: Not Hispanic)	link
CAFE	[00:00-00:04] Trois cygnes aveugles au bord du lac (Emotion: Anger, Gender: Male, Intensity: Low intensity, Age: 46, English Translation: Three blind swans by the lake)	link
EMOVO	[00:00-00:05] Vorrei il numero telefonico del Signor Piatti. (Emotion: Disgust, Gender: Female, Age: 28)	link
Speech accent archive	[00:00-00:22] Please call Stella... (Gender: male, Age: 40, Native Language: afrikaans, Country: south africa)	link
EMNS	[00:00-00:05] He was a plucked instrument instructor... (Emotion: Happy, Gender: Female, Speaker: 3, Age: 20s)	link
KeSpeech	[00:00-00:04] 看重成长质量和竞争优势的成长型基金 (Speaker: 1000048, Dialect: Mandarin)	link
ESD	[00:00-00:02] 我每个月打一次电话。 (Emotion: Angry, Gender: Female)	link
LibriSpeech-c	[00:00-00:05] (Number of Speakers: 0, Duration: 5s)	link
L2Arctic	[00:00-00:03] Lord but I'm glad to see you again Phil (Accent: Arabic, Speaker: ABA)	link
CommonVoice (EN and CN)	[00:00-00:04] The boy swore that... (Gender: male_masculine, Age: thirties, Accent: West Indies and Bermuda, Duration: 4s)	link
EmoV-DB	[00:00-00:05] And you always want to see it in the superlative degree. (Emotion: Amused, Gender: Female)	link
LibriTTS-R	[00:00-00:05] About artists and their work mr Quilter... (Gender: male, Speaker: John Rose, Pitch Type: moderate pitch, Noise Type: balanced in clarity, etc.)	link
Dusha	[00:00-00:06] афина поприкольней было чем джой джой дура (Emotion: angry, Speaker: ..., Speaker_Emotion: angry)	link
MSP-PODCAST	[00:00-00:05] yeah. so we're going to end this... (Emotion: N, Gender: Unknown, Arousal: 3.0, Valence: 3.6, Dominance: 4.0)	link
AliMeeting	Multi-speaker Mandarin conversation samples (multiple timestamps and speakers)	link
CSZS	[00:00-00:03] Juan de Talavera belonged to the so-called escuela toledana. (Gender: male, Language: Spanish-English code-switching)	link
NTUML2021	[00:00-00:02] 好那我們就開始上課吧 (Gender: male)	link
Speech Command	[00:00-00:01] (Command: silence, Duration: 1s)	link
LibriCount	[00:00-00:05] (Number of Speakers: 0, Duration: 5s)	link
Voxlingual	[00:00-00:19] بس عم يلزق بالمدخنين... (Language: Arabic, Duration: 19s)	link
ASVspoof	[00:00-00:02] I'm not worried about the critics. (Gender: Male, Source: Real human)	link
BIIC-Podcast	[00:00-00:03] 這個連結只有在現場才感受的到所以大 (Emotion: Happy, Gender: Female, Sentiment: somewhat positive)	link
CodecFake	[00:00-00:06] There is , according to legend, a boiling pot of gold at one end. (Gender: Female, Accent: england, Source: Synthesis speech)	link link
Paraspeechcaps	[00:00-00:05] So as a person who doesn't live in a bubble... (Gender: male, emotion: disgusted, etc.)	link
VCTK+MUSAN	[00:00-00:03] (4 speakers talking) (Noise Level: Noisy, Signal-to-Noise Ratio: 10db)	link
Dynamic-SUPERB-Train-noise-reverb	[00:00:00 - 00:00:04]Alan_Alda: "I pull the covers up just enough so the next time I look at them, it'll be a little gift to myself." (Gender:Male, Noise Level: Moderate(Signal-to-Noise Ratio: 15db), Reverberation(C50): 60ms, Duration: 4s)	link
Audioset	[00:00-00:10] (speech, gush)	link
AudioCaps	[00:00-00:10] (Plastic crinkling... people talk)	link
Wavcaps	[00:00-00:10] (There is the sound of a truck.)	link
Clotho	[00:00-00:26] (Someone opening and closing a door...)	link
VocalSound	[00:00-00:11] (Sneeze)	link
ESC50	[00:00-00:05] (Audio category: door_wood_knock, Duration: 5.0)	link
FSD50K	[00:00-00:18] (Type: ['water', 'gurgling', 'toilet flush']...)	link
TMHINT-QI	[00:00-00:04] 妈亲来解妄语的关系哦 (gender: male, mos_score: 2, Speech_quality: 2/5(Poor))	link
Nsynth	[00:00-00:04] (Family: bass, Source: electronic, MIDI Note: 022...)	link
OpenSinger	[00:00-00:06] 多少凉薄世态可动荡 (Gender: Male, Song: 一如年少模样)	link
FMA	[00:00-00:30] You can watch the show... (Genre: Rock)	link
GTZAN	[00:00-00:30] (Genre: reggae, Duration: 30s)	link
Mridangam	[00:00-00:01] (Stroke: cha, Tonic: b, Duration: 1s)	link
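
Most of the example seed descriptions above start with a `[MM:SS-MM:SS]` time span followed by the transcript or metadata. The sketch below pulls that leading span out with a regular expression. It is only an illustration based on the examples in the table: `parse_seed_description` and `Segment` are hypothetical names, and descriptions in other shapes (for instance multi-speaker samples or the `[HH:MM:SS - HH:MM:SS]` variant) simply return `None`.

```python
import re
from typing import NamedTuple, Optional

# Matches the common single-segment form "[MM:SS-MM:SS] rest of description".
_SEGMENT = re.compile(r"^\[(\d{2}):(\d{2})-(\d{2}):(\d{2})\]\s*(.*)$", re.DOTALL)


class Segment(NamedTuple):
    start_s: int  # segment start, in seconds
    end_s: int    # segment end, in seconds
    body: str     # everything after the time span, left unparsed


def parse_seed_description(text: str) -> Optional[Segment]:
    """Split a seed description into its time span and remaining text."""
    match = _SEGMENT.match(text.strip())
    if match is None:
        return None  # not in the simple single-segment form
    m1, s1, m2, s2, body = match.groups()
    return Segment(int(m1) * 60 + int(s1), int(m2) * 60 + int(s2), body)


example = "[00:00-00:02] Take it. (Emotion:Neutral, Sentiment:Neutral, Gender: Female, Duration: 2s)"
segment = parse_seed_description(example)
print(segment.start_s, segment.end_s, segment.body)
```

The metadata in parentheses is deliberately left as raw text, since its keys and formatting differ from one source dataset to another.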
