Commit fd4b83b

Author: Greg Cohen
Added a dataset and fixed a spelling mistake in three datasets
1 parent d19b277 commit fd4b83b

4 files changed: 105 additions, 3 deletions
datasets/CSL-Daily-Event.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
     "other_sensors": [],
     "category": "Human-centric Recordings",
     "tags": [
-        "Sign Languge",
+        "Sign Language",
         "Hand Pose Detection"
     ],
     "description": "Sign Language Translation Dataset",

datasets/Event-CSL.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
     "other_sensors": [],
     "category": "Human-centric Recordings",
     "tags": [
-        "Sign Languge",
+        "Sign Language",
         "Hand Pose Detection"
     ],
     "description": "Sign Language Translation Dataset",

datasets/EventSTR.md

Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
+---
+{
+    "name": "EventSTR",
+    "aliases": [],
+    "year": 2025,
+    "modalities": [
+        "Vision"
+    ],
+    "sensors": [
+        "Prophesee Gen4"
+    ],
+    "other_sensors": [],
+    "category": "Object Detection, Classification, and Tracking",
+    "tags": [
+        "Scene Text Recognition"
+    ],
+    "description": "Visual Text Recognition Dataset",
+    "dataset_properties": {
+        "available_online": true,
+        "has_real_data": true,
+        "has_simulated_data": false,
+        "has_ground_truth": true,
+        "has_frames": true,
+        "has_biases": true,
+        "distribution_methods": [
+            "Dropbox",
+            "Baidu"
+        ],
+        "file_formats": [
+            "Binary"
+        ],
+        "availability_comment": "",
+        "dataset_links": [
+            {
+                "name": "Baidu",
+                "url": "https://pan.baidu.com/s/1XN8MfK1cKrqaSOo3e2oD3A?pwd=2l7c",
+                "format": "Binary",
+                "available": true
+            },
+            {
+                "name": "Dropbox",
+                "url": "https://www.dropbox.com/scl/fo/s31llbv7bshz2xj4mf2gm/AFP1AGDcSoY0mk-fcyfL7jw?rlkey=p25w7366lzex7qe3pdgz96ec4&st=afcymd0x&dl=0",
+                "format": "Binary",
+                "available": true
+            }
+        ],
+        "size_gb": 169.02,
+        "size_type": "Compressed"
+    },
+    "paper": {
+        "title": "EventSTR: A Benchmark Dataset and Baselines for Event Stream based Scene Text Recognition",
+        "doi": "10.48550/arXiv.2502.09020",
+        "authors": [
+            "Xiao Wang",
+            "Jingtao Jiang",
+            "Dong Li",
+            "Futian Wang",
+            "Lin Zhu",
+            "Yaowei Wang",
+            "Yongyong Tian",
+            "Jin Tang"
+        ],
+        "abstract": "Mainstream Scene Text Recognition (STR) algorithms are developed based on RGB cameras which are sensitive to challenging factors such as low illumination, motion blur, and cluttered backgrounds. In this paper, we propose to recognize the scene text using bio-inspired event cameras by collecting and annotating a large-scale benchmark dataset, termed EventSTR. It contains 9,928 high-definition (1280 * 720) event samples and involves both Chinese and English characters. We also benchmark multiple STR algorithms as the baselines for future works to compare. In addition, we propose a new event-based scene text recognition framework, termed SimC-ESTR. It first extracts the event features using a visual encoder and projects them into tokens using a Q-former module. More importantly, we propose to augment the vision tokens based on a memory mechanism before feeding into the large language models. A similarity-based error correction mechanism is embedded within the large language model to correct potential minor errors fundamentally based on contextual information. Extensive experiments on the newly proposed EventSTR dataset and two simulation STR datasets fully demonstrate the effectiveness of our proposed model. We believe that the dataset and algorithmic model can innovatively propose an event-based STR task and are expected to accelerate the application of event cameras in various industries. The source code and pre-trained models will be released on https://github.com/Event-AHU/EventSTR",
+        "open_access": false
+    },
+    "citation_counts": [
+        {
+            "source": "scholar",
+            "count": 1,
+            "updated": "2025-09-15T08:01:53.956751"
+        }
+    ],
+    "links": [
+        {
+            "type": "preprint",
+            "url": "https://arxiv.org/abs/2502.09020"
+        },
+        {
+            "type": "github_page",
+            "url": "https://github.com/Event-AHU/EventSTR"
+        }
+    ],
+    "full_name": "",
+    "additional_metadata": {},
+    "bibtex": {
+        "copyright": "arXiv.org perpetual, non-exclusive license",
+        "year": 2025,
+        "publisher": "arXiv",
+        "title": "EventSTR: A Benchmark Dataset and Baselines for Event Stream based Scene Text Recognition",
+        "keywords": "Computer Vision and Pattern Recognition (cs.CV), Artificial Intelligence (cs.AI), FOS: Computer and information sciences, FOS: Computer and information sciences",
+        "author": "Wang, Xiao and Jiang, Jingtao and Li, Dong and Wang, Futian and Zhu, Lin and Wang, Yaowei and Tian, Yongyong and Tang, Jin",
+        "url": "https://arxiv.org/abs/2502.09020",
+        "doi": "10.48550/ARXIV.2502.09020",
+        "type": "misc",
+        "key": "https://doi.org/10.48550/arxiv.2502.09020"
+    }
+}
+---
+
+# Dataset Description
+
+The dataset is an annotated large-scale benchmark dataset, termed EventSTR. It contains 9,928 high-definition (1280 × 720) event samples and involves both Chinese and English characters.

datasets/PHOENIX-2014T-Event.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
     "other_sensors": [],
     "category": "Human-centric Recordings",
     "tags": [
-        "Sign Languge",
+        "Sign Language",
         "Hand Pose Detection"
     ],
     "description": "Sign Language Translation Dataset",
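
The same misspelled tag appeared in three files, so a small check over the JSON front matter could catch repeats. The following is a minimal sketch, assuming each dataset page follows the layout seen in EventSTR.md (a JSON object between two `---` lines, followed by a Markdown body). The helper names `parse_front_matter` and `misspelled_tags`, the `KNOWN_TYPOS` table, and the abbreviated `SAMPLE` page are all hypothetical and are not part of the repository.

```python
import json


def parse_front_matter(text: str) -> tuple[dict, str]:
    """Split a dataset page into (metadata dict, Markdown body).

    Assumes the page starts with a '---' line, then a JSON object,
    then a closing '---' line, as in the EventSTR.md diff.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---' delimiter")
    # Find the first later line that consists only of '---'.
    end = next(
        (i for i in range(1, len(lines)) if lines[i].strip() == "---"),
        None,
    )
    if end is None:
        raise ValueError("missing closing '---' delimiter")
    metadata = json.loads("\n".join(lines[1:end]))
    body = "\n".join(lines[end + 1:]).strip()
    return metadata, body


def misspelled_tags(metadata: dict, known_typos: dict) -> list:
    """Return the tags that appear in a table of known misspellings."""
    return [tag for tag in metadata.get("tags", []) if tag in known_typos]


# Hypothetical, abbreviated page used only to exercise the helpers.
SAMPLE = """---
{
    "name": "Example",
    "tags": [
        "Sign Languge",
        "Hand Pose Detection"
    ]
}
---

# Dataset Description
"""

# Hypothetical lookup table: misspelling -> intended spelling.
KNOWN_TYPOS = {"Sign Languge": "Sign Language"}

meta, body = parse_front_matter(SAMPLE)
bad = misspelled_tags(meta, KNOWN_TYPOS)
```

Running this over every file in `datasets/` (for example with `pathlib.Path("datasets").glob("*.md")`) would list the pages that still carry a known misspelling. Whether the repository already has a validation step of this kind is not stated in the commit.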
