
Commit 5e68143

Leooo-Huang and claude committed
feat: upgrade README to full-field HAR guide for 10k star potential
Expanded from datasets-only catalog to comprehensive HAR resource:

- Added "Which Dataset Should I Use" quick decision guide
- Added Frameworks & Libraries (MMAction2, PySlowFast, ST-GCN, tsai, MDM, etc.)
- Added Pretrained Models (VideoMAE V2, InternVideo2, UniFormerV2)
- Added Tutorials & Courses (D2L, Stanford CS231N, sensor HAR)
- Added Key Papers organized by era (foundational, transformer, wearable, motion gen)
- Added Competitions & Challenges (Ego-Exo4D 2025, ActivityNet, EPIC-Kitchens)
- Added Tools & Utilities (Papers with Code, Decord, OpenPose, MediaPipe)
- Added Related Awesome Lists for cross-discovery
- Passes awesome-lint with zero errors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 36e279d commit 5e68143

1 file changed: README.md (152 additions, 12 deletions)

# Awesome Human Activity Recognition [![Awesome](https://awesome.re/badge.svg)](https://awesome.re)

> A curated, researcher-driven guide to **Human Activity Recognition** — 53 datasets, key frameworks, pretrained models, tutorials, and benchmark tools across vision, wearable, skeleton, and multimodal modalities.

[![License: CC BY 4.0](https://img.shields.io/badge/License-CC_BY_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/Leo-Cyberautonomy/awesome-human-activity-recognition/pulls)
[![Last Updated](https://img.shields.io/badge/Updated-March_2026-blue.svg)](#)

**[中文](i18n/README.zh.md)** | [Deutsch](i18n/README.de.md) | [Español](i18n/README.es.md) | [Français](i18n/README.fr.md) | [日本語](i18n/README.ja.md) | [한국어](i18n/README.ko.md) | [Português](i18n/README.pt.md) | [Русский](i18n/README.ru.md)

## Contents

- [Which Dataset Should I Use](#which-dataset-should-i-use)
- [Datasets](#datasets)
- [Frameworks and Libraries](#frameworks-and-libraries)
- [Pretrained Models](#pretrained-models)
- [Tutorials and Courses](#tutorials-and-courses)
- [Key Papers](#key-papers)
- [Competitions and Challenges](#competitions-and-challenges)
- [Tools and Utilities](#tools-and-utilities)
- [Related Awesome Lists](#related-awesome-lists)

## Which Dataset Should I Use

> Pick your modality and task, then follow the recommendation to the matching section.

**I have video and want to classify actions** — Start with Kinetics-700 for pretraining, then evaluate on UCF-101 or HMDB-51 for comparison with prior work. See [Vision](#vision-rgb--depth).

**I need temporal action detection in untrimmed video** — ActivityNet for proposals, AVA for spatio-temporal detection, MultiTHUMOS for dense multi-label. Also listed under [Vision](#vision-rgb--depth).

**I work with skeleton or motion capture data** — NTU RGB+D 120 is the de facto standard. For text-motion alignment, use BABEL or HumanML3D. See [Skeleton](#skeleton-and-mocap) and [Emerging](#emerging-and-frontier).

**I have IMU or wearable sensor data** — UCI-HAR for baselines, PAMAP2 for multi-sensor, CAPTURE-24 for real-world scale (151 subjects, 3883 hours). See [Wearable](#wearable-sensors).

**I need egocentric or multimodal data** — Ego4D for scale (3.3k hours), EPIC-Kitchens-100 for kitchen actions, Ego-Exo4D (CVPR 2024) for cross-view learning. See [Multimodal](#multimodal-and-egocentric).

**I want text-to-motion generation** — HumanML3D for single-person, InterHuman for two-person, Motion-X++ for whole-body with face and hands. Also listed under [Emerging](#emerging-and-frontier).

## Datasets

### Vision (RGB / Depth)

- [Kinetics-700](https://deepmind.com/research/open-source/kinetics) - Large-scale pretraining benchmark with 650k YouTube clips across 700 action classes.
- [UCF-101](https://www.crcv.ucf.edu/data/UCF101.php) - Classic action recognition benchmark with 13.3k clips across 101 classes.
- …
- [MultiTHUMOS](https://ai.stanford.edu/~syyeung/everymoment.html) - Dense multi-label temporal action detection with 65 classes and 38k annotations.
- [FineSports](https://github.com/PKU-ICST-MIPL/FineSports_CVPR2024) - Multi-person fine-grained sports understanding with 10k NBA videos and 52 action types from CVPR 2024.

### Skeleton and Mocap

- [NTU RGB+D 60](https://rose1.ntu.edu.sg/dataset/actionRecognition/) - Foundation dataset for skeleton-based action recognition with 57k sequences across 60 classes.
- [AMASS](https://amass.is.tue.mpg.de/) - Unified SMPL motion capture parameters from 40+ datasets covering 16k minutes and 344 subjects.
- …
- [PKU-MMD](https://www.icst.pku.edu.cn/struct/Projects/PKUMMD.html) - Multi-modality action detection benchmark with 20k instances across 51 classes.
- [Skeletics-152](https://github.com/skelemoa/quater-gcn) - Large-scale skeleton action recognition from estimated poses with 150k clips across 152 classes.

### Wearable Sensors

- [UCI-HAR](https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones) - Classic smartphone IMU benchmark with 30 subjects and 6 activities, near-saturated (baseline sketch below).
- [PAMAP2](https://archive.ics.uci.edu/ml/datasets/pamap2+physical+activity+monitoring) - Wearable HAR standard with multi-IMU and heart rate from 9 subjects across 18 activities.
- …
- [CAPTURE-24](https://github.com/OxWearables/capture24) - Largest free-living wrist accelerometer dataset with 151 subjects and 3883 hours from Nature Scientific Data 2024.
- [WEAR](https://github.com/mariusbock/wear) - Outdoor sports dataset with smartwatch IMU and egocentric video from 22 subjects across 18 activities, published at IMWUT 2024.
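
A minimal baseline sketch for the UCI-HAR entry above, using the precomputed 561-dimensional feature files that ship with the dataset. The `data_dir` path and the random-forest settings are illustrative placeholders; the folder layout assumed here is the standard UCI archive download.

```python
# Minimal UCI-HAR baseline on the precomputed 561-d feature vectors.
# Assumes the standard "UCI HAR Dataset" folder from the UCI archive.
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

data_dir = Path("UCI HAR Dataset")  # adjust to your download location

def load_split(split: str):
    X = np.loadtxt(data_dir / split / f"X_{split}.txt")
    y = np.loadtxt(data_dir / split / f"y_{split}.txt", dtype=int)
    return X, y

X_train, y_train = load_split("train")
X_test, y_test = load_split("test")

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```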

### Multimodal and Egocentric

- [EPIC-Kitchens-100](https://epic-kitchens.github.io/2021) - Long-term egocentric kitchen actions with audio spanning 700 hours across 90 kitchens.
- [Ego4D](https://ego4d-data.org/docs/data/) - Largest egocentric dataset with multi-task benchmarks spanning 3.3k hours across 74 scenes.
- …
- [How2Sign](https://how2sign.github.io/) - Multimodal American Sign Language dataset with RGB, depth, and pose spanning 80 hours.
- [EgoExo-Fitness](https://github.com/iSEE-Laboratory/EgoExo-Fitness) - Ego and exo fitness action quality assessment with 31 hours and 6k+ actions from ECCV 2024.

### Emerging and Frontier

- [BEHAVE](https://virtualhumans.mpi-inf.mpg.de/behave/) - RGB-D human-object interaction with 3D pose spanning 321 sequences from 20 subjects.
- [Motion-X](https://caizhongang.github.io/projects/Motion-X/) - Full-body and hand joint motion from multisensor mocap with 2M frames from 10 subjects.
- …
- [InterX](https://liangxuy.github.io/inter-x/) - Comprehensive human-human interaction dataset with SMPL-X spanning 11k+ sequences from CVPR 2024.
- [WiMANS](https://arxiv.org/abs/2402.09430) - First WiFi-based multi-user activity sensing benchmark, published at ECCV 2024.

## Frameworks and Libraries

### Video Action Recognition

- [MMAction2](https://github.com/open-mmlab/mmaction2) - OpenMMLab toolbox for video understanding supporting 20+ model architectures including SlowFast, TimeSformer, and VideoMAE (inference sketch below).
- [PySlowFast](https://github.com/facebookresearch/SlowFast) - Facebook Research library for video understanding with SlowFast, X3D, and MViT models, plus AVA action detection.
- [Video-Swin-Transformer](https://github.com/SwinTransformer/Video-Swin-Transformer) - Pure-transformer backbone for video recognition achieving SOTA on Kinetics-400, Kinetics-600, and SSv2.
- [TimeSformer](https://github.com/facebookresearch/TimeSformer) - Facebook Research implementation of divided space-time attention for video classification, from ICML 2021.
- [VideoMAE](https://github.com/MCG-NJU/VideoMAE) - Self-supervised video pretraining with masked autoencoders achieving SOTA on multiple benchmarks.
- [InternVideo2](https://github.com/OpenGVLab/InternVideo2) - Foundation model for video understanding at scale supporting action recognition, retrieval, and captioning.
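
A hedged sketch of single-video inference with MMAction2's `init_recognizer` / `inference_recognizer` helpers. The config and checkpoint paths are placeholders to be filled from the model zoo, and the structure of the returned result differs between MMAction2 0.x and 1.x, so treat the final print as illustrative.

```python
# Single-video action recognition with MMAction2.
# Replace the paths with a config/checkpoint pair from the model zoo.
from mmaction.apis import inference_recognizer, init_recognizer

config = "path/to/config.py"          # placeholder
checkpoint = "path/to/checkpoint.pth"  # placeholder

model = init_recognizer(config, checkpoint, device="cuda:0")
result = inference_recognizer(model, "path/to/clip.mp4")

# In MMAction2 1.x this is an ActionDataSample holding predicted scores;
# in 0.x it is a list of (label, score) pairs. Inspect accordingly.
print(result)
```
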
### Skeleton Action Recognition

- [CTR-GCN](https://github.com/Uason-Chen/CTR-GCN) - Channel-wise topology refinement graph convolution for skeleton-based action recognition from ICCV 2021.
- [ST-GCN](https://github.com/yysijie/st-gcn) - Seminal spatial-temporal graph convolution network that established the GCN approach for skeleton-based HAR (see the sketch after this list).
- [2s-AGCN](https://github.com/lshiwjx/2s-AGCN) - Two-stream adaptive graph convolutional network for skeleton-based action recognition from CVPR 2019.
- [HD-GCN](https://github.com/Jho-Yonsei/HD-GCN) - Hierarchically decomposed graph convolutional network for skeleton action recognition from AAAI 2024.
- [MotionBERT](https://github.com/Walter0807/MotionBERT) - Unified pretraining for human motion analysis covering 3D pose estimation and action recognition.
- [InfoGCN](https://github.com/stnoah1/infogcn) - Information-bottleneck graph convolutional network for skeleton action recognition from CVPR 2022.
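
The GCN methods above share one core operation: mixing joint features along a normalized skeleton adjacency before a channel projection. The sketch below is a generic illustration of that step, not code from any of the listed repos; it assumes skeleton tensors shaped (batch, channels, frames, joints) and uses an identity adjacency as a stand-in for a real skeleton graph.

```python
import torch
import torch.nn as nn

class SkeletonGraphConv(nn.Module):
    """One spatial graph-convolution step over skeleton joints.

    x: (N, C_in, T, V) — batch, channels, frames, joints.
    A: (V, V) skeleton adjacency with self-loops.
    """

    def __init__(self, in_channels, out_channels, A):
        super().__init__()
        deg = A.sum(dim=1)
        # Symmetric normalization D^-1/2 A D^-1/2, as in GCN-style models.
        d_inv_sqrt = torch.diag(deg.clamp(min=1e-6).pow(-0.5))
        self.register_buffer("A_norm", d_inv_sqrt @ A @ d_inv_sqrt)
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        x = torch.einsum("nctv,vw->nctw", x, self.A_norm)  # mix joints
        return self.proj(x)                                # mix channels

# Toy usage: 17-joint skeleton, 32 frames, 3-d coordinates per joint.
A = torch.eye(17)  # stand-in for the real skeleton graph
layer = SkeletonGraphConv(3, 64, A)
out = layer(torch.randn(8, 3, 32, 17))
print(out.shape)  # torch.Size([8, 64, 32, 17])
```
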
### Wearable Sensor HAR

- [tsai](https://github.com/timeseriesAI/tsai) - Deep learning library for time series and sequences built on fastai and PyTorch, widely used for sensor HAR.
- [aeon](https://github.com/aeon-toolkit/aeon) - Unified Python toolkit for time series including classification, clustering, and anomaly detection.
- [NNCLR-HAR](https://github.com/mariusbock/nnclr-har) - Self-supervised contrastive learning framework for wearable sensor HAR from IMWUT 2022.
- [DeepConvLSTM](https://github.com/sussexwearlab/DeepConvLSTM) - Reference implementation of the convolutional LSTM architecture for wearable activity recognition (architecture sketch below).
- [Hang-Time HAR](https://github.com/ahoelzemann/hangtime_har) - Basketball activity recognition from a single wrist-worn inertial sensor using deep learning.
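
Most of these libraries consume fixed-length sliding windows of IMU samples. The sketch below is a DeepConvLSTM-style model in plain PyTorch (temporal convolutions, then an LSTM, then a classifier); the layer sizes and the 128-sample window are illustrative choices, not the reference implementation.

```python
import torch
import torch.nn as nn

class ConvLSTMHAR(nn.Module):
    """DeepConvLSTM-style classifier for windowed IMU data.

    Input: (batch, window_length, n_channels), e.g. 128 samples x 9 axes.
    """

    def __init__(self, n_channels=9, n_classes=6, hidden=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.lstm = nn.LSTM(64, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        x = self.features(x.transpose(1, 2))  # (B, 64, T)
        x, _ = self.lstm(x.transpose(1, 2))   # (B, T, hidden)
        return self.head(x[:, -1])            # classify from the last step

model = ConvLSTMHAR()
logits = model(torch.randn(32, 128, 9))  # 32 windows of 128 x 9-axis IMU
print(logits.shape)  # torch.Size([32, 6])
```
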
### Motion Generation and Estimation

- [MDM](https://github.com/GuyTevet/motion-diffusion-model) - Human motion diffusion model for text-to-motion generation achieving SOTA on HumanML3D.
- [MLD](https://github.com/ChenFengYe/motion-latent-diffusion) - Motion latent diffusion model for efficient text-driven human motion generation from CVPR 2023.
- [T2M-GPT](https://github.com/Mael-zys/T2M-GPT) - Generating human motion from textual descriptions with discrete representations.
- [MotionGPT](https://github.com/OpenMotionLab/MotionGPT) - Unified motion-language generation model treating motion as a foreign language.
- [SMPL-X](https://github.com/vchoutas/smplx) - Expressive body model capturing body, face, and hand poses, the standard for modern motion datasets (loading sketch below).
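
Since most recent motion datasets and generators exchange SMPL-X parameters, here is a small sketch of loading a body model with the `smplx` package and reading joints from a zero-pose forward pass. The `models/` directory is a placeholder; the SMPL-X model files are downloaded separately under their own license.

```python
# Inspect an SMPL-X body model: zero pose/shape -> joints and mesh vertices.
# "models" is a placeholder directory holding the downloaded model files.
import smplx

model = smplx.create("models", model_type="smplx", gender="neutral")
output = model(return_verts=True)  # default (zero) pose and shape parameters

print("joints:", output.joints.shape)      # (1, n_joints, 3)
print("vertices:", output.vertices.shape)  # (1, 10475, 3) for SMPL-X
```
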
## Pretrained Models

- [VideoMAE V2](https://github.com/OpenGVLab/VideoMAEv2) - Billion-parameter video foundation model pretrained on millions of clips, finetunable for action recognition.
- [InternVideo2 Model Zoo](https://huggingface.co/OpenGVLab/InternVideo2-Stage2_1B-224p-f4) - Video-language checkpoints on Hugging Face for action recognition and retrieval; the InternVideo2 family scales up to 6B parameters.
- [UniFormerV2](https://github.com/OpenGVLab/UniFormerV2) - Efficient video transformer with multi-scale tokens achieving 90.0% top-1 on Kinetics-400.
- [MVD](https://github.com/ruiwang2021/mvd) - Masked video distillation pretrained model competitive with VideoMAE on downstream action recognition.
- [MotionBERT Checkpoints](https://huggingface.co/walterzhu/MotionBERT) - Pretrained motion encoder transferable to 3D pose estimation, action recognition, and mesh recovery.
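
Several pretrained video classifiers are also mirrored on the Hugging Face Hub. As one example, the sketch below scores a 16-frame clip with the `transformers` port of the original VideoMAE fine-tuned on Kinetics-400 (random frames stand in for a real sampled clip); VideoMAE V2 and InternVideo2 weights are distributed through their own repos and may need different loading code.

```python
# Score a 16-frame clip with the Hugging Face port of VideoMAE
# fine-tuned on Kinetics-400 (frames here are random placeholders).
import numpy as np
import torch
from transformers import VideoMAEForVideoClassification, VideoMAEImageProcessor

ckpt = "MCG-NJU/videomae-base-finetuned-kinetics"
processor = VideoMAEImageProcessor.from_pretrained(ckpt)
model = VideoMAEForVideoClassification.from_pretrained(ckpt)

frames = list(np.random.randint(0, 255, (16, 224, 224, 3), dtype=np.uint8))
inputs = processor(frames, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```
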
## Tutorials and Courses

- [Dive into Deep Learning - Action Recognition](https://d2l.ai/) - Interactive textbook chapter on video understanding and action recognition with PyTorch code.
- [MMAction2 Tutorials](https://mmaction2.readthedocs.io/en/latest/get_started/overview.html) - Step-by-step guide to training action recognition models on custom datasets.
- [Sensor HAR Tutorial by Marius Bock](https://github.com/mariusbock/dl-for-har) - Comprehensive deep learning tutorial for inertial sensor HAR with PyTorch.
- [Stanford CS231N - Video Understanding](https://cs231n.stanford.edu/) - Lecture materials covering temporal modeling, two-stream networks, and 3D convolutions for action recognition.
- [Coursera - Motion Planning](https://www.coursera.org/learn/robotics-motion-planning) - University of Pennsylvania course covering motion representations relevant to HAR.
- [Motion Diffusion Tutorial](https://colab.research.google.com/drive/1MvBaAhOrEk8MP_jwNdQKLnvMxXPOG6zU) - Colab notebook for training text-conditioned human motion diffusion models on HumanML3D.

## Key Papers

### Foundational

- [Two-Stream Convolutional Networks](https://arxiv.org/abs/1406.2199) - Simonyan and Zisserman, NeurIPS 2014, establishing the spatial-temporal two-stream paradigm.
- [C3D: Learning Spatiotemporal Features](https://arxiv.org/abs/1412.0767) - Tran et al., ICCV 2015, pioneering 3D convolutions for video feature learning.
- [I3D: Quo Vadis Action Recognition](https://arxiv.org/abs/1705.07750) - Carreira and Zisserman, CVPR 2017, inflating 2D ImageNet architectures to 3D video.
- [ST-GCN: Spatial Temporal Graph Convolutional Networks](https://arxiv.org/abs/1801.07455) - Yan et al., AAAI 2018, defining the GCN approach for skeleton action recognition.
- [SlowFast Networks](https://arxiv.org/abs/1812.03982) - Feichtenhofer et al., ICCV 2019, dual-pathway architecture for video recognition.

### Transformer Era (2020 onwards)

- [ViViT: A Video Vision Transformer](https://arxiv.org/abs/2103.15691) - Arnab et al., ICCV 2021, pure-transformer models for video classification.
- [TimeSformer](https://arxiv.org/abs/2102.05095) - Bertasius et al., ICML 2021, divided space-time attention for scalable video transformers.
- [VideoMAE](https://arxiv.org/abs/2203.12602) - Tong et al., NeurIPS 2022, masked autoencoder pretraining achieving SOTA with minimal labeled data.
- [InternVideo2](https://arxiv.org/abs/2403.15377) - Wang et al., ECCV 2024, scaling video foundation models to 6B parameters across 60+ benchmarks.

### Wearable and Sensor HAR

- [DeepConvLSTM](https://arxiv.org/abs/1611.06759) - Ordonez and Roggen, Sensors 2016, establishing deep learning for wearable activity recognition.
- [Attend and Discriminate](https://arxiv.org/abs/2007.07426) - Abedin et al., IMWUT 2021, attention mechanisms for multi-sensor HAR.
- [Self-supervised HAR](https://arxiv.org/abs/2011.11542) - Tang et al., IJCAI 2021, contrastive learning for sensor-based activity recognition.

### Motion Generation

- [MDM: Human Motion Diffusion Model](https://arxiv.org/abs/2209.14916) - Tevet et al., ICLR 2023, diffusion-based text-to-motion generation.
- [MotionGPT](https://arxiv.org/abs/2306.14795) - Jiang et al., NeurIPS 2023, unifying motion and language through LLM architectures.
- [Motion-X](https://arxiv.org/abs/2307.00818) - Lin et al., NeurIPS 2023, first large-scale whole-body motion dataset with expressive annotations.

### Surveys

- [Deep Learning for HAR: A Survey](https://dl.acm.org/doi/10.1145/3472290) - Li et al., ACM Computing Surveys 2022, comprehensive review of deep learning approaches for HAR.
- [Skeleton-based Action Recognition Survey](https://arxiv.org/abs/2012.12231) - Liu et al., IEEE TPAMI 2022, in-depth review of GCN and transformer methods for skeleton HAR.
- [Multimodal HAR with Emphasis on Classification](https://www.sciencedirect.com/science/article/pii/S0950705124000029) - Yadav et al., Knowledge-Based Systems 2024, latest survey covering fusion strategies.

## Competitions and Challenges

- [Ego-Exo4D Challenge 2025](https://eval.ai/web/challenges/challenge-page/2249/overview) - CVPR 2025 multi-track benchmark covering ego-pose, action recognition, and language understanding.
- [ActivityNet Challenge](http://activity-net.org/challenges/2024/) - Annual challenge for temporal action detection, proposals, and dense captioning.
- [EPIC-Kitchens Challenge](https://epic-kitchens.github.io/2024) - Egocentric action recognition, detection, and anticipation competition.
- [SHL Recognition Challenge](http://www.shl-dataset.org/activity-recognition-challenge/) - Annual challenge for transportation mode recognition from smartphone sensors.
- [Babel Challenge](https://teach.is.tue.mpg.de/) - Motion-language understanding and temporal action segmentation on mocap data.
- [UAV-Human Challenge](https://github.com/SUTDCV/UAV-Human) - Human behavior understanding from UAV perspectives with multi-modal data.

## Tools and Utilities

- [Papers with Code - HAR Leaderboards](https://paperswithcode.com/task/activity-recognition) - Live SOTA tracking across all major HAR benchmarks.
- [MMAction2 Model Zoo](https://mmaction2.readthedocs.io/en/latest/model_zoo/modelzoo.html) - Pretrained checkpoints and configs for 100+ action recognition models.
- [Decord](https://github.com/dmlc/decord) - Efficient GPU-accelerated video reader for deep learning training pipelines (see the preprocessing sketch below).
- [vid2player](https://github.com/jhgan00/vid2player) - Character animation from video input, useful for activity recognition visualization.
- [OpenPose](https://github.com/CMU-Perceptual-Computing-Lab/openpose) - Real-time multi-person keypoint detection for skeleton extraction from video.
- [MediaPipe](https://developers.google.com/mediapipe) - Google's on-device ML framework for pose estimation, hand tracking, and gesture recognition.
- [YOLO-Pose](https://github.com/ultralytics/ultralytics) - Ultralytics YOLOv8 Pose for real-time multi-person skeleton estimation.
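
A sketch of a common preprocessing pipeline built from two of the tools above: Decord samples frames uniformly from a clip, and MediaPipe's legacy Pose solution extracts per-frame keypoints for skeleton-based models. The video path is a placeholder and error handling is omitted.

```python
# Sample frames with Decord, then extract 2D pose keypoints with MediaPipe
# (legacy "solutions" API). "clip.mp4" is a placeholder path.
import numpy as np
from decord import VideoReader, cpu
import mediapipe as mp

vr = VideoReader("clip.mp4", ctx=cpu(0))
idx = np.linspace(0, len(vr) - 1, num=16, dtype=int)  # 16 uniform frames
frames = vr.get_batch(idx).asnumpy()                   # (16, H, W, 3), RGB

keypoints = []
with mp.solutions.pose.Pose(static_image_mode=True) as pose:
    for frame in frames:
        res = pose.process(frame)  # MediaPipe expects RGB uint8 frames
        if res.pose_landmarks:
            keypoints.append([(lm.x, lm.y, lm.visibility)
                              for lm in res.pose_landmarks.landmark])
        else:
            keypoints.append(None)  # no person detected in this frame

print(f"{sum(k is not None for k in keypoints)}/{len(keypoints)} frames with pose")
```
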
## Related Awesome Lists

- [Awesome Action Recognition](https://github.com/jinwchoi/awesome-action-recognition) - Action recognition papers and datasets.
- [Awesome Skeleton-based Action Recognition](https://github.com/firework8/Awesome-Skeleton-based-Action-Recognition) - GCN and transformer methods for skeleton HAR.
- [Awesome Self-Supervised Learning](https://github.com/jason718/awesome-self-supervised-learning) - Self-supervised learning methods applicable to video and sensor modalities.
- [Awesome Video Understanding](https://github.com/HuaizhengZhang/Awesome-System-for-Machine-Learning) - Video understanding systems and architectures.
- [Awesome IMU Sensing](https://github.com/rh20624/Awesome-IMU-Sensing) - IMU-based sensing for activity recognition and navigation.
- [Awesome Pose Estimation](https://github.com/cbsudux/awesome-human-pose-estimation) - Human pose estimation methods and benchmarks.

## Footnotes

See also: [Multi-dimensional taxonomy](docs/taxonomy.md) | [Surveys](docs/surveys.md) | [Benchmarks](docs/benchmarking.md) | [Catalog builder](tools/) | [Roadmap](docs/roadmap.md) | [How to contribute](CONTRIBUTING.md)

### Citation
