| Name | Response Generated From | HuggingFace ID | Preview |
|---|---|---|---|
| DeSTA-AQA5M | Llama3.1-8B-Instruct | DeSTA-ntu/DeSTA-AQA5M-FROM-Llama3.1-8B-Instruct | 🔍 |
from datasets import load_dataset
dataset = load_dataset("DeSTA-ntu/DeSTA-AQA5M-FROM-Llama3.1-8B-Instruct")
# Load from chunked data files
# dataset = load_dataset("DeSTA-ntu/DeSTA-AQA5M-FROM-Llama3.1-8B-Instruct", data_files=["audio.0.jsonl", "audio.1.jsonl", "speech.0.jsonl", "speech.1.jsonl"])DatasetDict({
train: Dataset({
features: ['id', 'dataset', 'seed_description', 'prompt', 'response', 'messages'],
num_rows: 4963845
})
})
-
Core fields (used for dataset generation and training):
messages: The input messages used for data generation and model training.response: The model-generated response (used as the training target).
-
Auxiliary fields (for display or metadata purposes):
id: Audio file ID(relative audio filepath)dataset: The source dataset.seed_description: The textual description constructed from the audio metadata.prompt: The sampled prompt from the instruction pool.
Note: We do not hold the license to redistribute the original audio files. Please download the audio files directly from the original dataset sources.
| Dataset | Example seed description | Paper | Dataset Link(TBA) |
|---|---|---|---|
| IEMOCAP | [00:00-00:02] Excuse me? (Emotion: Neutral, Gender: Female, Pitch: Very high, Volume: Low, Speaking speed: Very slow, Duration: 2s) | link | |
| DailyTalk | [00:00-00:02] I'm figuring out my budget. (Emotion: No emotion, Act: Inform, Gender: Male, Duration: 2s) | link | |
| GLOBE | [00:00-00:04] the belts needed periodic cleaning and conditioning to keep them in good condition. (Accent: United States English, Age: teens, Gender: male, Duration: 4s) | link | |
| VCTK-corpus | [00:00-00:07] She can scoop these things into three red bags, and we will go meet her Wednesday at the train station. (Gender: Female, Pitch:low, Accent: newzealand, Age: 23, Emotion: neutral) | link | |
| MELD | [00:00-00:02] Take it. (Emotion:Neutral, Sentiment:Neutral, Gender: Female, Duration: 2s) | link | |
| PromptTTS | [00:00-00:05] Deeply engrossed in congenial work (Speaking speed: Slow, Volume: Normal, Pitch: Low, Gender: Female, Emotion: cheerful, Duration: 5s) | link | |
| Expresso | [00:00-00:01] Karen's in Switzerland? (Style: confused, Gender: Male, Duration: 1s) | link | |
| AccentDB | [00:00-00:04] The hogs were fed chopped corn and garbage. (Accent: American, Gender: Female, Emotion: Neutral, Duration: 4s) | link | |
| VoxCeleb1 | [00:00-00:09] and it's all what side of the coin you're looking at... (Gender: Male, Emotion: Neutral, Duration: 9s) | link | |
| Anispeech | [00:00-00:06] You have no idea how heartless those snuffs are... (Emotion: angry, Speaking speed: Fast, Pitch: normal, Gender: Male, Duration: 6s) | link | |
| MSP-IMPROV | [00:00-00:05] Yeah, well I'm going to go to class... (Gender: Female, Emotion: Angry, Activation: 3.0/5.0, Valence: 2.2/5.0, Dominance: 3.2/5.0, Naturalness: 4.4/5.0) | link | |
| Fair-speech | [00:00-00:05] hi there is something i would like you to see (Gender: male, Age: 31 - 45, First Language: English, Socioeconomic Background: Medium, Ethnicity: Black or African American) | link | |
| CREMA-D | [00:00-00:02] It's eleven o'clock (Emotion: Anger, Gender: Male, Age: 51, Race: Caucasian, Ethnicity: Not Hispanic) | link | |
| CAFE | [00:00-00:04] Trois cygnes aveugles au bord du lac (Emotion: Anger, Gender: Male, Intensity: Low intensity, Age: 46, English Translation: Three blind swans by the lake) | link | |
| EMOVO | [00:00-00:05] Vorrei il numero telefonico del Signor Piatti. (Emotion: Disgust, Gender: Female, Age: 28) | link | |
| Speech accent archive | [00:00-00:22] Please call Stella... (Gender: male, Age: 40, Native Language: afrikaans, Country: south africa) | link | |
| EMNS | [00:00-00:05] He was a plucked instrument instructor... (Emotion: Happy, Gender: Female, Speaker: 3, Age: 20s) | link | |
| KeSpeech | [00:00-00:04] 看重成长质量和竞争优势的成长型基金 (Speaker: 1000048, Dialect: Mandarin) | link | |
| ESD | [00:00-00:02] 我每个月打一次电话。 (Emotion: Angry, Gender: Female) | link | |
| LibriSpeech-c | [00:00-00:05] (Number of Speakers: 0, Duration: 5s) | link | |
| L2Arctic | [00:00-00:03] Lord but I'm glad to see you again Phil (Accent: Arabic, Speaker: ABA) | link | |
| CommonVoice (EN and CN) | [00:00-00:04] The boy swore that... (Gender: male_masculine, Age: thirties, Accent: West Indies and Bermuda, Duration: 4s) | link | |
| EmoV-DB | [00:00-00:05] And you always want to see it in the superlative degree. (Emotion: Amused, Gender: Female) | link | |
| LibriTTS-R | [00:00-00:05] About artists and their work mr Quilter... (Gender: male, Speaker: John Rose, Pitch Type: moderate pitch, Noise Type: balanced in clarity, etc.) | link | |
| Dusha | [00:00-00:06] афина поприкольней было чем джой джой дура (Emotion: angry, Speaker: ..., Speaker_Emotion: angry) | link | |
| MSP-PODCAST | [00:00-00:05] yeah. so we're going to end this... (Emotion: N, Gender: Unknown, Arousal: 3.0, Valence: 3.6, Dominance: 4.0) | link | |
| AliMeeting | Multi-speaker Mandarin conversation samples (multiple timestamps and speakers) | link | |
| CSZS | [00:00-00:03] Juan de Talavera belonged to the so-called escuela toledana. (Gender: male, Language: Spanish-English code-switching) | link | |
| NTUML2021 | [00:00-00:02] 好那我們就開始上課吧 (Gender: male) | link | |
| Speech Command | [00:00-00:01] (Command: silence, Duration: 1s) | link | |
| Libricoount | [00:00-00:05] (Number of Speakers: 0, Duration: 5s) | link | |
| Voxlingual | [00:00-00:19] بس عم يلزق بالمدخنين... (Language: Arabic, Duration: 19s) | link | |
| ASVspoof | [00:00-00:02] I'm not worried about the critics. (Gender: Male, Source: Real human) | link | |
| BIIC-Podcast | [00:00-00:03] 這個連結只有在現場才感受的到 所以大 (Emotion: Happy, Gender: Female, Sentiment: somewhat positive) | link | |
| CodecFake | [00:00-00:06] There is , according to legend, a boiling pot of gold at one end. (Gender: Female, Accent: england, Source: Synthesis speech) | link link | |
| Paraspeechcaps | [00:00-00:05] So as a person who doesn't live in a bubble... (Gender: male, emotion: disgusted, etc.) | link | |
| VCTK+MUSAN | [00:00-00:03] (4 speakers talking) (Noise Level: Noisy, Signal-to-Noise Ratio: 10db) | link | |
| Dynamic-SUPERB-Train-noise-reverb | [00:00:00 - 00:00:04]Alan_Alda: "I pull the covers up just enough so the next time I look at them, it'll be a little gift to myself." (Gender:Male, Noise Level: Moderate(Signal-to-Noise Ratio: 15db), Reverberation(C50): 60ms, Duration: 4s) | link | |
| Audioset | [00:00-00:10] (speech, gush) | link | |
| AudioCaps | [00:00-00:10] (Plastic crinkling... people talk) | link | |
| Wavcaps | [00:00-00:10] (There is the sound of a truck.) | link | |
| Clotho | [00:00-00:26] (Someone opening and closing a door...) | link | |
| VocalSound | [00:00-00:11] (Sneeze) | link | |
| ESC50 | [00:00-00:05] (Audio category: door_wood_knock, Duration: 5.0) | link | |
| FSD50K | [00:00-00:18] (Type: ['water', 'gurgling', 'toilet flush']...) | link | |
| THMINT-QI | [00:00-00:04] 妈亲来解妄语的关系哦 (gender: male, mos_score: 2, Speech_quality: 2/5(Poor)) | link | |
| Nsynth | [00:00-00:04] (Family: bass, Source: electronic, MIDI Note: 022...) | link | |
| OpenSinger | [00:00-00:06] 多少凉薄世态可动荡 (Gender: Male, Song: 一如年少模样) | link | |
| FMA | [00:00-00:30] You can watch the show... (Genre: Rock) | link | |
| GTZAN | [00:00-00:30] (Genre: reggae, Duration: 30s) | link | |
| Mridangam | [00:00-00:01] (Stroke: cha, Tonic: b, Duration: 1s) | link |