Commit d19b277

Greg Cohen committed
Added a host of sign language datasets
1 parent f5c3e62 commit d19b277

File tree

9 files changed: +1287 -0 lines changed


datasets/CSL-Daily-Event.md

Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
---
{
    "name": "CSL-Daily-Event",
    "aliases": [],
    "year": 2024,
    "modalities": [
        "Vision"
    ],
    "sensors": [
        "DVS-Voltmeter"
    ],
    "other_sensors": [],
    "category": "Human-centric Recordings",
    "tags": [
        "Sign Language",
        "Hand Pose Detection"
    ],
    "description": "Sign Language Translation Dataset",
    "dataset_properties": {
        "available_online": false,
        "has_real_data": true,
        "has_simulated_data": true,
        "has_ground_truth": false,
        "has_frames": true,
        "has_biases": false,
        "distribution_methods": [],
        "file_formats": [],
        "availability_comment": "",
        "dataset_links": [],
        "size_gb": 0.0,
        "size_type": "Compressed"
    },
    "paper": {
        "title": "Event Stream based Sign Language Translation: A High-Definition Benchmark Dataset and A New Algorithm",
        "doi": "10.48550/arXiv.2408.10488",
        "authors": [
            "Xiao Wang",
            "Yao Rong",
            "Fuling Wang",
            "Jianing Li",
            "Lin Zhu",
            "Bo Jiang",
            "Yaowei Wang"
        ],
        "abstract": "Sign Language Translation (SLT) is a core task in the field of AI-assisted disability. Unlike traditional SLT based on visible light videos, which is easily affected by factors such as lighting, rapid hand movements, and privacy breaches, this paper proposes the use of high-definition Event streams for SLT, effectively mitigating the aforementioned issues. This is primarily because Event streams have a high dynamic range and dense temporal signals, which can withstand low illumination and motion blur well. Additionally, due to their sparsity in space, they effectively protect the privacy of the target person. More specifically, we propose a new high-resolution Event stream sign language dataset, termed Event-CSL, which effectively fills the data gap in this area of research. It contains 14,827 videos, 14,821 glosses, and 2,544 Chinese words in the text vocabulary. These samples are collected in a variety of indoor and outdoor scenes, encompassing multiple angles, light intensities, and camera movements. We have benchmarked existing mainstream SLT works to enable fair comparison for future efforts. Based on this dataset and several other large-scale datasets, we propose a novel baseline method that fully leverages the Mamba model's ability to integrate temporal information of CNN features, resulting in improved sign language translation outcomes. Both the benchmark dataset and source code will be released on https://github.com/Event-AHU/OpenESL",
        "open_access": false
    },
    "citation_counts": [
        {
            "source": "scholar",
            "count": 2,
            "updated": "2025-09-14T22:54:35.384582"
        }
    ],
    "links": [
        {
            "type": "preprint",
            "url": "https://www.arxiv.org/abs/2408.10488"
        }
    ],
    "full_name": "",
    "additional_metadata": {
        "language": "Chinese",
        "source_dataset": "CSL-Daily-Event"
    },
    "bibtex": {
        "copyright": "arXiv.org perpetual, non-exclusive license",
        "year": 2024,
        "publisher": "arXiv",
        "title": "Event Stream based Sign Language Translation: A High-Definition Benchmark Dataset and A New Algorithm",
        "keywords": "Computer Vision and Pattern Recognition (cs.CV), Artificial Intelligence (cs.AI), Computation and Language (cs.CL), Neural and Evolutionary Computing (cs.NE), FOS: Computer and information sciences, FOS: Computer and information sciences",
        "author": "Wang, Xiao and Rong, Yao and Wang, Fuling and Li, Jianing and Zhu, Lin and Jiang, Bo and Wang, Yaowei",
        "url": "https://arxiv.org/abs/2408.10488",
        "doi": "10.48550/ARXIV.2408.10488",
        "type": "misc",
        "key": "https://doi.org/10.48550/arxiv.2408.10488"
    }
}
---

# Dataset Description
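These dataset cards store their metadata as strict JSON between the `---` fences rather than YAML, so generic frontmatter parsers may not read them directly. Below is a minimal sketch of how a card like the one above could be loaded; the `load_dataset_card` helper and the printed fields are illustrative, not part of this repository:

```python
import json
from pathlib import Path

def load_dataset_card(path):
    """Split a dataset card into its JSON frontmatter and markdown body.

    Assumes the file starts with a `---` fence, followed by a JSON object,
    a closing `---` fence, and then the free-text dataset description.
    """
    text = Path(path).read_text(encoding="utf-8")
    # parts: text before the first fence, the JSON block, the markdown body
    _, frontmatter, body = text.split("---", 2)
    return json.loads(frontmatter), body.strip()

# Hypothetical usage against the file added in this commit:
meta, description = load_dataset_card("datasets/CSL-Daily-Event.md")
print(meta["name"], meta["year"], meta["tags"])
```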

datasets/EHPT-XC.md

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
---
{
    "name": "EHPT-XC",
    "aliases": [],
    "year": 2024,
    "modalities": [
        "Vision"
    ],
    "sensors": [
        "Prophesee Gen4"
    ],
    "other_sensors": [
        "BFS-U3-16S2C-CS"
    ],
    "category": "Human-centric Recordings",
    "tags": [
        "Pose Estimation",
        "Low Light",
        "Beamsplitters"
    ],
    "description": "Human Pose Estimation in Extreme Conditions",
    "dataset_properties": {
        "available_online": false,
        "has_real_data": true,
        "has_simulated_data": false,
        "has_ground_truth": true,
        "has_frames": true,
        "has_biases": false,
        "distribution_methods": [],
        "file_formats": [],
        "availability_comment": "Requires a user agreement form to be downloaded, signed, and returned. Handwritten signatures are required.",
        "dataset_links": [],
        "size_gb": 0.0,
        "size_type": "Compressed"
    },
    "paper": {
        "title": "A benchmark dataset for event-guided human pose estimation and tracking in extreme conditions",
        "doi": "",
        "authors": [
            "Hoonhee Cho",
            "Taewoo Kim",
            "Yuhwan Jeong",
            "Kuk-Jin Yoon"
        ],
        "abstract": "Multi-person pose estimation and tracking have been actively researched by the computer vision community due to their practical applicability. However, existing human pose estimation and tracking datasets have only been successful in typical scenarios, such as those without motion blur or with well-lit conditions. These RGB-based datasets are limited to learning under extreme motion blur situations or poor lighting conditions, making them inherently vulnerable to such scenarios. As a promising solution, bio-inspired event cameras exhibit robustness in extreme scenarios due to their high dynamic range and micro-second level temporal resolution. Therefore, in this paper, we introduce a new hybrid dataset encompassing both RGB and event data for human pose estimation and tracking in two extreme scenarios: low-light and motion blur environments. The proposed Event-guided Human Pose Estimation and Tracking in eXtreme Conditions (EHPT-XC) dataset covers cases of motion blur caused by dynamic objects and low-light conditions individually as well as both simultaneously. With EHPT-XC, we aim to inspire researchers to tackle pose estimation and tracking in extreme conditions by leveraging the advantages of the event camera. Project pages are available at https://github.com/Chohoonhee/EHPT-XC.",
        "open_access": false
    },
    "citation_counts": [],
    "links": [
        {
            "type": "paper",
            "url": "https://proceedings.neurips.cc/paper_files/paper/2024/hash/f304e427cfe6bb762fe1bf18516c8a87-Abstract-Datasets_and_Benchmarks_Track.html"
        },
        {
            "type": "github_page",
            "url": "https://github.com/Chohoonhee/EHPT-XC"
        }
    ],
    "full_name": "Event-guided Human Pose Estimation and Tracking in eXtreme Conditions (EHPT-XC)",
    "additional_metadata": {
        "num_males": "61",
        "num_females": "21",
        "num_subjects": "82"
    },
    "bibtex": {
        "year": 2024,
        "booktitle": "The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track",
        "author": "Cho, Hoonhee and Kim, Taewoo and Jeong, Yuhwan and Yoon, Kuk-Jin",
        "title": "A Benchmark Dataset for Event-Guided Human Pose Estimation and Tracking in Extreme Conditions",
        "type": "inproceedings",
        "key": "cho2024benchmark"
    }
}
---

# Dataset Description
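Since the cards added in this commit share a common frontmatter schema, a small consistency check can catch the kind of drift that appears in raw metadata, such as a `year` stored as `2024.0` instead of `2024`, or a placeholder `size_gb` of `0.0`. This is a hypothetical sketch assuming the JSON-between-fences layout shown above; no such checker exists in the repository itself:

```python
import json
from pathlib import Path

# Top-level fields every card in this commit appears to carry.
REQUIRED_KEYS = {"name", "year", "modalities", "sensors", "category",
                 "tags", "description", "dataset_properties", "paper"}

def check_card(path):
    # The JSON frontmatter sits between the first two `---` fences.
    meta = json.loads(Path(path).read_text(encoding="utf-8").split("---", 2)[1])
    problems = []
    missing = REQUIRED_KEYS - meta.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if not isinstance(meta.get("year"), int):  # rejects floats like 2024.0
        problems.append(f"year is {meta.get('year')!r}, expected an integer")
    if meta.get("dataset_properties", {}).get("size_gb") == 0.0:
        problems.append("size_gb is 0.0 (size unknown?)")
    return problems

for card in sorted(Path("datasets").glob("*.md")):
    for problem in check_card(card):
        print(f"{card.name}: {problem}")
```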

datasets/EV-ASL.md

Lines changed: 165 additions & 0 deletions
@@ -0,0 +1,165 @@
---
{
    "name": "EV-ASL",
    "aliases": [],
    "year": 2021,
    "modalities": [
        "Vision"
    ],
    "sensors": [
        "DVS128"
    ],
    "other_sensors": [],
    "category": "Human-centric Recordings",
    "tags": [
        "Sign Language",
        "Hand Pose Detection"
    ],
    "description": "American Sign Language Dataset",
    "dataset_properties": {
        "available_online": false,
        "has_real_data": true,
        "has_simulated_data": false,
        "has_ground_truth": false,
        "has_frames": true,
        "has_biases": false,
        "distribution_methods": [
            "Baidu"
        ],
        "file_formats": [],
        "availability_comment": "Download link has a single zip file",
        "dataset_links": [
            {
                "name": "Baidu",
                "url": "https://pan.baidu.com/s/1xPYenSSL8w_LcX8pe5i_0g",
                "format": "Binary",
                "available": true
            }
        ],
        "size_gb": 2.11,
        "size_type": "Compressed"
    },
    "paper": {
        "title": "Event-Based American Sign Language Recognition Using Dynamic Vision Sensor",
        "doi": "10.1007/978-3-030-86137-7_1",
        "authors": [
            "Yong Wang",
            "Xian Zhang",
            "Yanxiang Wang",
            "Hongbin Wang",
            "Chanying Huang",
            "Yiran Shen"
        ],
        "abstract": "American Sign Language (ASL) is one of the most effective communication tools for people with hearing difficulties. However, most people do not understand ASL. To bridge this gap, we propose EV-ASL, an automatic ASL interpretation system based on a dynamic vision sensor (DVS). Compared to the traditional RGB-based approach, the DVS consumes significantly fewer resources (energy, computation, bandwidth) and, due to its event-based nature, outputs only the moving objects without the need for background subtraction. Lastly, because of its wide dynamic response range, it enables EV-ASL to work under a variety of lighting conditions. EV-ASL proposes a novel representation of event streams and facilitates a deep convolutional neural network for sign recognition. To evaluate the performance of EV-ASL, we recruited 10 participants and collected 11,200 samples from 56 different ASL words. The evaluation shows that EV-ASL achieves a recognition accuracy of 93.25%.",
        "open_access": false
    },
    "citation_counts": [
        {
            "source": "crossref",
            "count": 1,
            "updated": "2025-09-14T23:04:08.758443"
        },
        {
            "source": "scholar",
            "count": 4,
            "updated": "2025-09-14T23:04:09.465721"
        }
    ],
    "links": [
        {
            "type": "paper",
            "url": "https://link.springer.com/chapter/10.1007/978-3-030-86137-7_1"
        },
        {
            "type": "github_page",
            "url": "https://github.com/zhangxiann/EV_ASL/"
        }
    ],
    "full_name": "",
    "additional_metadata": {
        "num_subjects": "10",
        "num_males": "6",
        "num_females": "4"
    },
    "referenced_papers": [
        {
            "doi": "10.1109/CVPRW.2019.00205",
            "source": "crossref"
        },
        {
            "doi": "10.1109/CVPR.2017.781",
            "source": "crossref"
        },
        {
            "doi": "10.1109/ICCV.2019.00058",
            "source": "crossref"
        },
        {
            "doi": "10.1109/TIP.2020.3023597",
            "source": "crossref"
        },
        {
            "doi": "10.1109/ICCV.2017.332",
            "source": "crossref"
        },
        {
            "doi": "10.1609/aaai.v32i1.11903",
            "source": "crossref"
        },
        {
            "doi": "10.1109/TPAMI.2016.2574707",
            "source": "crossref"
        },
        {
            "doi": "10.1109/JSSC.2007.914337",
            "source": "crossref"
        },
        {
            "doi": "10.3389/fncom.2015.00099",
            "source": "crossref"
        },
        {
            "doi": "10.1007/978-3-319-16178-5_40",
            "source": "crossref"
        },
        {
            "doi": "10.1109/WACV.2019.00199",
            "source": "crossref"
        },
        {
            "doi": "10.1109/CVPR.2019.00652",
            "source": "crossref"
        },
        {
            "doi": "10.1109/TPAMI.2021.3054886",
            "source": "crossref"
        },
        {
            "doi": "10.15607/RSS.2018.XIV.062",
            "source": "crossref"
        }
    ],
    "bibtex": {
        "pages": "3\u201310",
        "year": 2021,
        "author": "Wang, Yong and Zhang, Xian and Wang, Yanxiang and Wang, Hongbin and Huang, Chanying and Shen, Yiran",
        "publisher": "Springer International Publishing",
        "booktitle": "Wireless Algorithms, Systems, and Applications",
        "doi": "10.1007/978-3-030-86137-7_1",
        "url": "http://dx.doi.org/10.1007/978-3-030-86137-7_1",
        "issn": "1611-3349",
        "isbn": "9783030861377",
        "title": "Event-Based American Sign Language Recognition Using Dynamic Vision Sensor",
        "type": "book",
        "key": "Wang_2021"
    }
}
---

# Dataset Description

To evaluate the recognition accuracy of EV-ASL, a dataset was collected consisting of event streams recorded while different users performed ASL words in front of a DVS camera. The dataset covers 56 words (26 one-hand words and 30 two-hand words), comprising frequent verbs, nouns, adjectives, and pronouns commonly used in daily life.

To collect the dataset, 10 participants (4 female, 6 male) were recruited to perform the hand movements corresponding to each of the selected ASL words. Due to the constraints of the Human IRB, all participants have normal hearing. They learned the movements for the ASL words by watching online instructional videos for two hours. During the experiments, the environment and other conditions were not strictly controlled.

In each experiment session, every participant performed the hand movement for each word 20 times, giving a total of 11,200 (= 10 × 56 × 20) samples.
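The stated totals are internally consistent with this collection protocol; a quick arithmetic check, using only the numbers quoted in the description above:

```python
# 10 participants × 56 words (26 one-hand + 30 two-hand) × 20 repetitions
participants, words, repetitions = 10, 26 + 30, 20
assert words == 56
assert participants * words * repetitions == 11_200  # matches the quoted total
```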
