Skip to content

Latest commit

 

History

History
100 lines (82 loc) · 11 KB

File metadata and controls

100 lines (82 loc) · 11 KB

DeSTA-AQA5M

Name Response Generated From HuggingFace ID Preview
DeSTA-AQA5M Llama3.1-8B-Instruct DeSTA-ntu/DeSTA-AQA5M-FROM-Llama3.1-8B-Instruct 🔍

Load Dataset

from datasets import load_dataset

dataset = load_dataset("DeSTA-ntu/DeSTA-AQA5M-FROM-Llama3.1-8B-Instruct")

# Load from chunked data files
# dataset = load_dataset("DeSTA-ntu/DeSTA-AQA5M-FROM-Llama3.1-8B-Instruct", data_files=["audio.0.jsonl", "audio.1.jsonl", "speech.0.jsonl", "speech.1.jsonl"])
DatasetDict({
    train: Dataset({
        features: ['id', 'dataset', 'seed_description', 'prompt', 'response', 'messages'],
        num_rows: 4963845
    })
})
  • Core fields (used for dataset generation and training):

    • messages: The input messages used for data generation and model training.
    • response: The model-generated response (used as the training target).
  • Auxiliary fields (for display or metadata purposes):

    • id: Audio file ID(relative audio filepath)
    • dataset: The source dataset.
    • seed_description: The textual description constructed from the audio metadata.
    • prompt: The sampled prompt from the instruction pool.

Note: We do not hold the license to redistribute the original audio files. Please download the audio files directly from the original dataset sources.

Dataset Details

Dataset Example seed description Paper Dataset Link(TBA)
IEMOCAP [00:00-00:02] Excuse me? (Emotion: Neutral, Gender: Female, Pitch: Very high, Volume: Low, Speaking speed: Very slow, Duration: 2s) link
DailyTalk [00:00-00:02] I'm figuring out my budget. (Emotion: No emotion, Act: Inform, Gender: Male, Duration: 2s) link
GLOBE [00:00-00:04] the belts needed periodic cleaning and conditioning to keep them in good condition. (Accent: United States English, Age: teens, Gender: male, Duration: 4s) link
VCTK-corpus [00:00-00:07] She can scoop these things into three red bags, and we will go meet her Wednesday at the train station. (Gender: Female, Pitch:low, Accent: newzealand, Age: 23, Emotion: neutral) link
MELD [00:00-00:02] Take it. (Emotion:Neutral, Sentiment:Neutral, Gender: Female, Duration: 2s) link
PromptTTS [00:00-00:05] Deeply engrossed in congenial work (Speaking speed: Slow, Volume: Normal, Pitch: Low, Gender: Female, Emotion: cheerful, Duration: 5s) link
Expresso [00:00-00:01] Karen's in Switzerland? (Style: confused, Gender: Male, Duration: 1s) link
AccentDB [00:00-00:04] The hogs were fed chopped corn and garbage. (Accent: American, Gender: Female, Emotion: Neutral, Duration: 4s) link
VoxCeleb1 [00:00-00:09] and it's all what side of the coin you're looking at... (Gender: Male, Emotion: Neutral, Duration: 9s) link
Anispeech [00:00-00:06] You have no idea how heartless those snuffs are... (Emotion: angry, Speaking speed: Fast, Pitch: normal, Gender: Male, Duration: 6s) link
MSP-IMPROV [00:00-00:05] Yeah, well I'm going to go to class... (Gender: Female, Emotion: Angry, Activation: 3.0/5.0, Valence: 2.2/5.0, Dominance: 3.2/5.0, Naturalness: 4.4/5.0) link
Fair-speech [00:00-00:05] hi there is something i would like you to see (Gender: male, Age: 31 - 45, First Language: English, Socioeconomic Background: Medium, Ethnicity: Black or African American) link
CREMA-D [00:00-00:02] It's eleven o'clock (Emotion: Anger, Gender: Male, Age: 51, Race: Caucasian, Ethnicity: Not Hispanic) link
CAFE [00:00-00:04] Trois cygnes aveugles au bord du lac (Emotion: Anger, Gender: Male, Intensity: Low intensity, Age: 46, English Translation: Three blind swans by the lake) link
EMOVO [00:00-00:05] Vorrei il numero telefonico del Signor Piatti. (Emotion: Disgust, Gender: Female, Age: 28) link
Speech accent archive [00:00-00:22] Please call Stella... (Gender: male, Age: 40, Native Language: afrikaans, Country: south africa) link
EMNS [00:00-00:05] He was a plucked instrument instructor... (Emotion: Happy, Gender: Female, Speaker: 3, Age: 20s) link
KeSpeech [00:00-00:04] 看重成长质量和竞争优势的成长型基金 (Speaker: 1000048, Dialect: Mandarin) link
ESD [00:00-00:02] 我每个月打一次电话。 (Emotion: Angry, Gender: Female) link
LibriSpeech-c [00:00-00:05] (Number of Speakers: 0, Duration: 5s) link
L2Arctic [00:00-00:03] Lord but I'm glad to see you again Phil (Accent: Arabic, Speaker: ABA) link
CommonVoice (EN and CN) [00:00-00:04] The boy swore that... (Gender: male_masculine, Age: thirties, Accent: West Indies and Bermuda, Duration: 4s) link
EmoV-DB [00:00-00:05] And you always want to see it in the superlative degree. (Emotion: Amused, Gender: Female) link
LibriTTS-R [00:00-00:05] About artists and their work mr Quilter... (Gender: male, Speaker: John Rose, Pitch Type: moderate pitch, Noise Type: balanced in clarity, etc.) link
Dusha [00:00-00:06] афина поприкольней было чем джой джой дура (Emotion: angry, Speaker: ..., Speaker_Emotion: angry) link
MSP-PODCAST [00:00-00:05] yeah. so we're going to end this... (Emotion: N, Gender: Unknown, Arousal: 3.0, Valence: 3.6, Dominance: 4.0) link
AliMeeting Multi-speaker Mandarin conversation samples (multiple timestamps and speakers) link
CSZS [00:00-00:03] Juan de Talavera belonged to the so-called escuela toledana. (Gender: male, Language: Spanish-English code-switching) link
NTUML2021 [00:00-00:02] 好那我們就開始上課吧 (Gender: male) link
Speech Command [00:00-00:01] (Command: silence, Duration: 1s) link
Libricoount [00:00-00:05] (Number of Speakers: 0, Duration: 5s) link
Voxlingual [00:00-00:19] بس عم يلزق بالمدخنين... (Language: Arabic, Duration: 19s) link
ASVspoof [00:00-00:02] I'm not worried about the critics. (Gender: Male, Source: Real human) link
BIIC-Podcast [00:00-00:03] 這個連結只有在現場才感受的到 所以大 (Emotion: Happy, Gender: Female, Sentiment: somewhat positive) link
CodecFake [00:00-00:06] There is , according to legend, a boiling pot of gold at one end. (Gender: Female, Accent: england, Source: Synthesis speech) link link
Paraspeechcaps [00:00-00:05] So as a person who doesn't live in a bubble... (Gender: male, emotion: disgusted, etc.) link
VCTK+MUSAN [00:00-00:03] (4 speakers talking) (Noise Level: Noisy, Signal-to-Noise Ratio: 10db) link
Dynamic-SUPERB-Train-noise-reverb [00:00:00 - 00:00:04]Alan_Alda: "I pull the covers up just enough so the next time I look at them, it'll be a little gift to myself." (Gender:Male, Noise Level: Moderate(Signal-to-Noise Ratio: 15db), Reverberation(C50): 60ms, Duration: 4s) link
Audioset [00:00-00:10] (speech, gush) link
AudioCaps [00:00-00:10] (Plastic crinkling... people talk) link
Wavcaps [00:00-00:10] (There is the sound of a truck.) link
Clotho [00:00-00:26] (Someone opening and closing a door...) link
VocalSound [00:00-00:11] (Sneeze) link
ESC50 [00:00-00:05] (Audio category: door_wood_knock, Duration: 5.0) link
FSD50K [00:00-00:18] (Type: ['water', 'gurgling', 'toilet flush']...) link
THMINT-QI [00:00-00:04] 妈亲来解妄语的关系哦 (gender: male, mos_score: 2, Speech_quality: 2/5(Poor)) link
Nsynth [00:00-00:04] (Family: bass, Source: electronic, MIDI Note: 022...) link
OpenSinger [00:00-00:06] 多少凉薄世态可动荡 (Gender: Male, Song: 一如年少模样) link
FMA [00:00-00:30] You can watch the show... (Genre: Rock) link
GTZAN [00:00-00:30] (Genre: reggae, Duration: 30s) link
Mridangam [00:00-00:01] (Stroke: cha, Tonic: b, Duration: 1s) link