Commit aa08728

Cleaning up tags

Also added a few more datasets

Author: Greg Cohen (committed)
1 parent 7531d89, commit aa08728

21 files changed: +801 / -23 lines

datasets/DET.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -14,7 +14,7 @@
   "tags": [
     "Lane Extraction",
     "Autonomous Driving",
-    "Driving"
+    "Driving Datasets"
   ],
   "description": "Lane Extraction",
   "dataset_properties": {
```

datasets/E-CelebV-HQ.md

Lines changed: 70 additions & 0 deletions
```diff
@@ -0,0 +1,70 @@
+---
+{
+  "name": "E-CelebV-HQ",
+  "aliases": [],
+  "year": 2025,
+  "modalities": [
+    "Vision"
+  ],
+  "sensors": [
+    "V2E"
+  ],
+  "other_sensors": [],
+  "category": "Human-centric Recordings",
+  "tags": [],
+  "description": "Facial keypoint alignment dataset",
+  "dataset_properties": {
+    "available_online": false,
+    "has_real_data": false,
+    "has_simulated_data": true,
+    "has_ground_truth": true,
+    "has_frames": true,
+    "has_biases": false,
+    "distribution_methods": [],
+    "file_formats": [],
+    "availability_comment": "",
+    "dataset_links": [],
+    "size_gb": 0.0,
+    "size_type": "Compressed"
+  },
+  "paper": {
+    "title": "Event-based Facial Keypoint Alignment via Cross-Modal Fusion Attention and Self-Supervised Multi-Event Representation Learning",
+    "doi": "10.48550/arXiv.2509.24968",
+    "authors": [
+      "Donghwa Kang",
+      "Junho Kim",
+      "Dongwoo Kang"
+    ],
+    "abstract": "Event cameras offer unique advantages for facial keypoint alignment under challenging conditions, such as low light and rapid motion, due to their high temporal resolution and robustness to varying illumination. However, existing RGB facial keypoint alignment methods do not perform well on event data, and training solely on event data often leads to suboptimal performance because of its limited spatial information. Moreover, the lack of comprehensive labeled event datasets further hinders progress in this area. To address these issues, we propose a novel framework based on cross-modal fusion attention (CMFA) and self-supervised multi-event representation learning (SSMER) for event-based facial keypoint alignment. Our framework employs CMFA to integrate corresponding RGB data, guiding the model to extract robust facial features from event input images. In parallel, SSMER enables effective feature learning from unlabeled event data, overcoming spatial limitations. Extensive experiments on our real-event E-SIE dataset and a synthetic-event version of the public WFLW-V benchmark show that our approach consistently surpasses state-of-the-art methods across multiple evaluation metrics.",
+    "open_access": false
+  },
+  "citation_counts": [],
+  "links": [
+    {
+      "type": "preprint",
+      "url": "https://arxiv.org/abs/2509.24968"
+    }
+  ],
+  "full_name": "",
+  "additional_metadata": {
+    "num_recordings": "35664",
+    "source_dataset": "CelebV-HQ"
+  },
+  "bibtex": {
+    "copyright": "Creative Commons Attribution 4.0 International",
+    "year": 2025,
+    "publisher": "arXiv",
+    "title": "Event-based Facial Keypoint Alignment via Cross-Modal Fusion Attention and Self-Supervised Multi-Event Representation Learning",
+    "keywords": "Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, FOS: Computer and information sciences",
+    "author": "Kang, Donghwa and Kim, Junho and Kang, Dongwoo",
+    "url": "https://arxiv.org/abs/2509.24968",
+    "doi": "10.48550/ARXIV.2509.24968",
+    "type": "misc",
+    "key": "https://doi.org/10.48550/arxiv.2509.24968"
+  }
+}
+---
+
+# Dataset Description
+
+The synthetic dataset E-CelebV-HQ was constructed to serve as the primary large-scale dataset for training and ablation studies. The v2e event simulator with frame interpolation was employed, with the event threshold parameter set to 0.2. A total of 35,664 event streams were generated from the CelebV-HQ videos and segmented at 25 fps intervals, producing three event representations: frame, voxel, and timesurface. Segments with minimal motion were observed to produce very few events; to mitigate this issue and ensure data quality, the single frame with the highest event count from each video was selected. The corresponding RGB frames were then used for annotation, with 98 facial keypoints placed via SLPT and used as the ground-truth labels. The curated data were subsequently divided into 28,531 samples for training, 713 for validation, and 6,420 for testing. To obtain a high-confidence evaluation subset from the test split, the pseudo labels were manually verified, resulting in a final set of 1,554 retained images.
```
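The densest-segment selection described above is simple to prototype. The sketch below is an illustrative approximation rather than the authors' released pipeline: it assumes the v2e output has already been loaded into NumPy arrays of event timestamps (in microseconds), integer pixel coordinates, and polarities, and the function and parameter names are hypothetical.

```python
import numpy as np

def densest_window(t_us, x, y, p, fps=25):
    """Split an event stream into fixed 1/fps windows (40 ms at 25 fps)
    and return the events of the window containing the most events."""
    win_us = int(1e6 / fps)
    win_id = ((t_us - t_us.min()) // win_us).astype(np.int64)
    counts = np.bincount(win_id)
    mask = win_id == counts.argmax()          # densest window
    return t_us[mask], x[mask], y[mask], p[mask]

def event_count_frame(x, y, p, height, width):
    """Accumulate the selected window into a 2-channel per-polarity
    count image -- a basic 'frame' representation of the events."""
    frame = np.zeros((2, height, width), dtype=np.float32)
    np.add.at(frame[0], (y[p > 0], x[p > 0]), 1.0)
    np.add.at(frame[1], (y[p <= 0], x[p <= 0]), 1.0)
    return frame
```

Keeping only the window with the highest event count is what filters out the near-static segments mentioned above, which would otherwise yield frames too sparse for reliable keypoint supervision.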

datasets/E-SIE.md

Lines changed: 69 additions & 0 deletions
```diff
@@ -0,0 +1,69 @@
+---
+{
+  "name": "E-SIE",
+  "aliases": [],
+  "year": 2025,
+  "modalities": [
+    "Vision"
+  ],
+  "sensors": [
+    "V2E"
+  ],
+  "other_sensors": [],
+  "category": "Human-centric Recordings",
+  "tags": [],
+  "description": "Facial keypoint alignment dataset",
+  "dataset_properties": {
+    "available_online": false,
+    "has_real_data": true,
+    "has_simulated_data": false,
+    "has_ground_truth": true,
+    "has_frames": true,
+    "has_biases": false,
+    "distribution_methods": [],
+    "file_formats": [],
+    "availability_comment": "",
+    "dataset_links": [],
+    "size_gb": 0.0,
+    "size_type": "Compressed"
+  },
+  "paper": {
+    "title": "Event-based Facial Keypoint Alignment via Cross-Modal Fusion Attention and Self-Supervised Multi-Event Representation Learning",
+    "doi": "10.48550/arXiv.2509.24968",
+    "authors": [
+      "Donghwa Kang",
+      "Junho Kim",
+      "Dongwoo Kang"
+    ],
+    "abstract": "Event cameras offer unique advantages for facial keypoint alignment under challenging conditions, such as low light and rapid motion, due to their high temporal resolution and robustness to varying illumination. However, existing RGB facial keypoint alignment methods do not perform well on event data, and training solely on event data often leads to suboptimal performance because of its limited spatial information. Moreover, the lack of comprehensive labeled event datasets further hinders progress in this area. To address these issues, we propose a novel framework based on cross-modal fusion attention (CMFA) and self-supervised multi-event representation learning (SSMER) for event-based facial keypoint alignment. Our framework employs CMFA to integrate corresponding RGB data, guiding the model to extract robust facial features from event input images. In parallel, SSMER enables effective feature learning from unlabeled event data, overcoming spatial limitations. Extensive experiments on our real-event E-SIE dataset and a synthetic-event version of the public WFLW-V benchmark show that our approach consistently surpasses state-of-the-art methods across multiple evaluation metrics.",
+    "open_access": false
+  },
+  "citation_counts": [],
+  "links": [
+    {
+      "type": "preprint",
+      "url": "https://arxiv.org/abs/2509.24968"
+    }
+  ],
+  "full_name": "",
+  "additional_metadata": {
+    "num_subjects": "9"
+  },
+  "bibtex": {
+    "copyright": "Creative Commons Attribution 4.0 International",
+    "year": 2025,
+    "publisher": "arXiv",
+    "title": "Event-based Facial Keypoint Alignment via Cross-Modal Fusion Attention and Self-Supervised Multi-Event Representation Learning",
+    "keywords": "Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, FOS: Computer and information sciences",
+    "author": "Kang, Donghwa and Kim, Junho and Kang, Dongwoo",
+    "url": "https://arxiv.org/abs/2509.24968",
+    "doi": "10.48550/ARXIV.2509.24968",
+    "type": "misc",
+    "key": "https://doi.org/10.48550/arxiv.2509.24968"
+  }
+}
+---
+
+# Dataset Description
+
+E-SIE is a real dataset collected for the research undertaken in the paper and comprises nine volunteers (aged 21 to 43, all of Asian descent) who continuously moved their heads. Illumination, one of the three primary factors, ranges from 30 to 120 lux; each subject participated in both eyeglasses-on and eyeglasses-off conditions, with head speed either normal or fast. Head pose variation includes horizontal and vertical translations. These combined factors yield six distinct scenarios, producing a total of 720 images used exclusively for evaluation.
```

datasets/E-WFLW-V.md

Lines changed: 70 additions & 0 deletions
```diff
@@ -0,0 +1,70 @@
+---
+{
+  "name": "E-WFLW-V",
+  "aliases": [],
+  "year": 2025,
+  "modalities": [
+    "Vision"
+  ],
+  "sensors": [
+    "V2E"
+  ],
+  "other_sensors": [],
+  "category": "Human-centric Recordings",
+  "tags": [],
+  "description": "Facial keypoint alignment dataset",
+  "dataset_properties": {
+    "available_online": false,
+    "has_real_data": false,
+    "has_simulated_data": true,
+    "has_ground_truth": true,
+    "has_frames": true,
+    "has_biases": false,
+    "distribution_methods": [],
+    "file_formats": [],
+    "availability_comment": "",
+    "dataset_links": [],
+    "size_gb": 0.0,
+    "size_type": "Compressed"
+  },
+  "paper": {
+    "title": "Event-based Facial Keypoint Alignment via Cross-Modal Fusion Attention and Self-Supervised Multi-Event Representation Learning",
+    "doi": "10.48550/arXiv.2509.24968",
+    "authors": [
+      "Donghwa Kang",
+      "Junho Kim",
+      "Dongwoo Kang"
+    ],
+    "abstract": "Event cameras offer unique advantages for facial keypoint alignment under challenging conditions, such as low light and rapid motion, due to their high temporal resolution and robustness to varying illumination. However, existing RGB facial keypoint alignment methods do not perform well on event data, and training solely on event data often leads to suboptimal performance because of its limited spatial information. Moreover, the lack of comprehensive labeled event datasets further hinders progress in this area. To address these issues, we propose a novel framework based on cross-modal fusion attention (CMFA) and self-supervised multi-event representation learning (SSMER) for event-based facial keypoint alignment. Our framework employs CMFA to integrate corresponding RGB data, guiding the model to extract robust facial features from event input images. In parallel, SSMER enables effective feature learning from unlabeled event data, overcoming spatial limitations. Extensive experiments on our real-event E-SIE dataset and a synthetic-event version of the public WFLW-V benchmark show that our approach consistently surpasses state-of-the-art methods across multiple evaluation metrics.",
+    "open_access": false
+  },
+  "citation_counts": [],
+  "links": [
+    {
+      "type": "preprint",
+      "url": "https://arxiv.org/abs/2509.24968"
+    }
+  ],
+  "full_name": "",
+  "additional_metadata": {
+    "source_dataset": "E-WFLW-V",
+    "num_recordings": "1000"
+  },
+  "bibtex": {
+    "copyright": "Creative Commons Attribution 4.0 International",
+    "year": 2025,
+    "publisher": "arXiv",
+    "title": "Event-based Facial Keypoint Alignment via Cross-Modal Fusion Attention and Self-Supervised Multi-Event Representation Learning",
+    "keywords": "Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, FOS: Computer and information sciences",
+    "author": "Kang, Donghwa and Kim, Junho and Kang, Dongwoo",
+    "url": "https://arxiv.org/abs/2509.24968",
+    "doi": "10.48550/ARXIV.2509.24968",
+    "type": "misc",
+    "key": "https://doi.org/10.48550/arxiv.2509.24968"
+  }
+}
+---
+
+# Dataset Description
+
+E-WFLW-V was created as a synthetic event-based version of the public WFLW-V benchmark. A total of 1,000 RGB clips from the original dataset were converted into event streams using the same simulator settings as those applied to E-CelebV-HQ. Each event stream was subdivided into segments matching the original RGB frame rate, and the single segment with the highest event count was selected for each clip. This process produced a final test set of 1,000 representative event images, each annotated with 98 ground-truth landmark labels.
```
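Since E-WFLW-V reuses the E-CelebV-HQ simulator settings, the same densest-segment selection sketched earlier applies here; the only representation not illustrated above is the voxel grid. Below is a minimal sketch of a standard signed temporal voxel grid; the bin count and the linear interpolation in time are illustrative assumptions, not parameters confirmed by the paper.

```python
import numpy as np

def voxel_grid(t_us, x, y, p, height, width, bins=5):
    """Accumulate one event segment into a signed temporal voxel grid:
    each event is spread over the two nearest of `bins` time slices with
    linear interpolation in time (a common event representation)."""
    grid = np.zeros((bins, height, width), dtype=np.float32)
    if t_us.size == 0:
        return grid
    # Normalise the segment's timestamps to [0, bins - 1]
    span = max(int(t_us.max() - t_us.min()), 1)
    t_norm = (t_us - t_us.min()) / span * (bins - 1)
    left = np.floor(t_norm).astype(np.int64)
    right = np.clip(left + 1, 0, bins - 1)
    w_right = (t_norm - left).astype(np.float32)
    pol = np.where(p > 0, 1.0, -1.0).astype(np.float32)
    np.add.at(grid, (left, y, x), pol * (1.0 - w_right))
    np.add.at(grid, (right, y, x), pol * w_right)
    return grid
```

A time surface can be built analogously by storing, per pixel and polarity, the timestamp of the most recent event (optionally with exponential decay) instead of a count.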

datasets/EBSSA.md

Lines changed: 2 additions & 1 deletion
```diff
@@ -13,7 +13,8 @@
   "other_sensors": [],
   "category": "Object Detection, Classification, and Tracking",
   "tags": [
-    "Space"
+    "Space Datasets",
+    "Space Situational Awareness"
   ],
   "description": "Object Detection and Tracking for SSA",
   "dataset_properties": {
```

datasets/EKLT-VIO Datasets.md

Lines changed: 4 additions & 3 deletions
```diff
@@ -12,8 +12,9 @@
   "other_sensors": [],
   "category": "Visual Navigation",
   "tags": [
-    "Space",
-    "Visual Odometry"
+    "Space Datasets",
+    "Visual Odometry",
+    "Mars/Moon Datasets"
   ],
   "description": "Visual Odometry",
   "dataset_properties": {
@@ -53,7 +54,7 @@
       "Jeff Delaune",
       "Davide Scaramuzza"
     ],
-    "abstract": "Due to their resilience to motion blur and high robustness in low-light and high dynamic range conditions, event cameras are poised to become enabling sensors for vision-based exploration on future Mars helicopter missions. However, existing event-based visual-inertial odometry (VIO) algorithms either suffer from high tracking errors or are brittle, since they cannot cope with signi\ufb01cant depth uncertainties caused by an unforeseen loss of tracking or other effects. In this work, we introduce EKLT-VIO, which addresses both limitations by combining a state-of-the-art event-based frontend with a \ufb01lter-based backend. This makes it both accurate and robust to uncertainties, outperforming eventand frame-based VIO algorithms on challenging benchmarks by 32\\%. In addition, we demonstrate accurate performance in hoverlike conditions (outperforming existing event-based methods) as well as high robustness in newly collected Mars-like and highdynamic-range sequences, where existing frame-based methods fail. In doing so, we show that event-based VIO is the way forward for vision-based exploration on Mars.",
+    "abstract": "Due to their resilience to motion blur and high robustness in low-light and high dynamic range conditions, event cameras are poised to become enabling sensors for vision-based exploration on future Mars helicopter missions. However, existing event-based visual-inertial odometry (VIO) algorithms either suffer from high tracking errors or are brittle, since they cannot cope with signi\ufb01cant depth uncertainties caused by an unforeseen loss of tracking or other effects. In this work, we introduce EKLT-VIO, which addresses both limitations by combining a state-of-the-art event-based frontend with a \ufb01lter-based backend. This makes it both accurate and robust to uncertainties, outperforming event and frame-based VIO algorithms on challenging benchmarks by 32\\%. In addition, we demonstrate accurate performance in hover-like conditions (outperforming existing event-based methods) as well as high robustness in newly collected Mars-like and high dynamic-range sequences, where existing frame-based methods fail. In doing so, we show that event-based VIO is the way forward for vision-based exploration on Mars.",
     "open_access": false
   },
   "citation_counts": [
```

datasets/Ev-PointOdyssey.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -12,7 +12,7 @@
   "other_sensors": [],
   "category": "Object Detection, Classification, and Tracking",
   "tags": [
-    "TAP"
+    "Track Any Point (TAP)"
   ],
   "description": "Tracking Any Point",
   "dataset_properties": {
```

datasets/EventKubric.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -13,7 +13,7 @@
   "category": "Object Detection, Classification, and Tracking",
   "tags": [
     "Object Tracking",
-    "Track Any Point"
+    "Track Any Point (TAP)"
   ],
   "description": "Tracking Any Point",
   "dataset_properties": {
```

datasets/Gentil2024.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -13,7 +13,7 @@
   "other_sensors": [],
   "category": "Domain Specific Application",
   "tags": [
-    "Space",
+    "Space Datasets",
     "Satellite Docking"
   ],
   "description": "Satellite Docking",
```
