|
 # Pose Evaluation
 
-The lack of automatic pose evaluation metrics is a major obstacle in the development of
-sign language generation models.
+This repository provides tools and metrics for the automatic evaluation of pose sequences,
+specifically designed for sign language applications.
 
-
-
-## Goals
-
-The primary objective of this repository is to house a suite of
-automatic evaluation metrics specifically tailored for sign language poses.
-This includes metrics proposed by Ham2Pose[^1]
-as well as custom-developed metrics unique to our approach.
+This includes metrics proposed by Ham2Pose[^1] as well as custom-developed metrics unique to our approach.
 We recognize the distinct challenges in evaluating single signs versus continuous signing,
 and our methods reflect this differentiation.
 
----
+<img src="assets/pose-eval-title-picture.png" alt="Distribution of scores" width="500" style="display: block; margin: auto;"/>
 
-<!-- ## Usage
+## Usage
 
+Install the package:
 ```bash
-# (TODO) pip install the package
-# (TODO) how to construct a metric
-# Metric signatures, preprocessors
-``` -->
-
-## Quantitative Evaluation
-
-### Isolated Sign Evaluation
-
-Given an isolated sign corpus such as ASL Citizen[^2], we repeat the evaluation of Ham2Pose[^1] on our metrics, ranking distance metrics by retrieval performance.
-
-Evaluation is conducted on a combined dataset of ASL Citizen, Sem-Lex[^3], and PopSign ASL[^4].
-
-For each sign class, we use all available samples as targets and sample four times as many distractors, yielding a 1:4 target-to-distractor ratio.
-
-For instance, for the sign _HOUSE_ with 40 samples (11 from ASL Citizen, 29 from Sem-Lex), we add 160 distractors and compute pairwise metrics from each target to all 199 other examples (We consistently discard scores for pose files where either the target or distractor could not be embedded with SignCLIP.).
-
-Retrieval quality is measured using Mean Average Precision (`mAP↑`) and Precision@10 (`P@10↑`). The complete evaluation covers 5,362 unique sign classes and 82,099 pose sequences.
-
-After several pilot runs, we finalized a subset of 169 sign classes with at most 20 samples each, ensuring consistent metric coverage. We evaluated 1200 distance-based variants and SignCLIP models with different checkpoints provided by the authors on this subset.
-
-The overall results show that DTW-based metrics outperform padding-based baselines. Embedding-based methods, particularly SignCLIP models fine-tuned on in-domain ASL data, achieve the strongest retrieval scores.
+pip install git+https://github.com/sign-language-processing/pose-evaluation.git
+```
 
-<!-- Atwell style evaluations didn't get done. Nor did AUTSL -->
+Create a metric:
+```python
+from pose_evaluation.metrics.distance_metric import DistanceMetric
+from pose_evaluation.metrics.dtw_metric import DTWDTAIImplementationDistanceMeasure
+from pose_evaluation.metrics.pose_processors import (
+    TrimMeaninglessFramesPoseProcessor,
+    GetHandsOnlyHolisticPoseProcessor,
+    FillMaskedOrInvalidValuesPoseProcessor,
+    ReducePosesToCommonComponentsProcessor,
+)
+
+DTWp = DistanceMetric(
+    name="DTWp",
+    # Select distance measure
+    distance_measure=DTWDTAIImplementationDistanceMeasure(),
+    # Provide pose processing pipeline
+    pose_preprocessors=[
+        TrimMeaninglessFramesPoseProcessor(),
+        GetHandsOnlyHolisticPoseProcessor(), # select only the hands
+        FillMaskedOrInvalidValuesPoseProcessor(masked_fill_value=10.0), # fill masked values with 10.0
+        ReducePosesToCommonComponentsProcessor(), # reduce pairs of poses to common components
+    ],
+)
+```
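+
+Metric variants are composed by swapping preprocessors. As a minimal, untested sketch (reusing only the classes imported above; the name `DTWfull` is our own illustrative choice), a full-body variant simply omits the hands-only step:
+```python
+# Hypothetical variant: same distance measure, but keep the full body
+DTWfull = DistanceMetric(
+    name="DTWfull",
+    distance_measure=DTWDTAIImplementationDistanceMeasure(),
+    pose_preprocessors=[
+        TrimMeaninglessFramesPoseProcessor(),
+        FillMaskedOrInvalidValuesPoseProcessor(masked_fill_value=10.0),
+        ReducePosesToCommonComponentsProcessor(),
+    ],
+)
+```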
 
-## Evaluation Metrics
+Evaluate two pose sequences:
+```python
+from pose_format import Pose
 
-For the study, we evaluated over 1200 Pose distance metrics, recording mAP and other retrieval performance characteristics.
+with open("hypothesis.pose", "rb") as f:
+    hypothesis = Pose.read(f)
+
+with open("reference.pose", "rb") as f:
+    reference = Pose.read(f)
 
-We find that the top metric
+DTWp.score(hypothesis, reference)
+```
 
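+Because DTWp measures a distance, lower scores indicate closer matches, so scores can be used to rank candidates against a hypothesis. A minimal, untested sketch, assuming `score` returns a scalar and a hypothetical `references/` folder of `.pose` files:
+```python
+from pathlib import Path
+
+from pose_format import Pose
+
+with open("hypothesis.pose", "rb") as f:
+    hypothesis = Pose.read(f)
+
+# Load candidate reference signs from a folder of .pose files (hypothetical layout)
+references = {}
+for path in Path("references").glob("*.pose"):
+    with open(path, "rb") as f:
+        references[path.stem] = Pose.read(f)
+
+# Rank candidates by distance to the hypothesis (lower = more similar)
+ranked = sorted(references, key=lambda name: DTWp.score(hypothesis, references[name]))
+print(ranked[:5])
+```
+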
 ### Contributing
 
-Please make sure to run `black pose_evaluation` before submitting a pull request.
+Please make sure to run `make format` before submitting a pull request.
|
57 | 56 | ## Cite |
58 | 57 |
|
59 | 58 | If you use our toolkit in your research or projects, please consider citing the work. |
60 | 59 |
|
61 | 60 | ```bib |
-@misc{pose-evaluation2025,
-  title={Meaningful Pose-Based Sign Language Evaluation},
-  author={Zifan Jiang, Colin Leong, Amit Moryossef, Anne Göhring, Annette Rios, Oliver Cory, Maksym Ivashechkin, Neha Tarigopula, Biao Zhang, Rico Sennrich, Sarah Ebling},
-  howpublished={\url{https://github.com/sign-language-processing/pose-evaluation}},
-  year={2025}
+@inproceedings{Jiang2025PoseEvaluation,
+  title={Meaningful Pose-Based Sign Language Evaluation},
+  author={Zifan Jiang and Colin Leong and Amit Moryossef and Oliver Cory and Maksym Ivashechkin and Neha Tarigopula and Biao Zhang and Anne Göhring and Annette Rios and Rico Sennrich and Sarah Ebling},
+  booktitle={Conference on Machine Translation},
+  year={2025},
+  url={https://github.com/sign-language-processing/pose-evaluation}
 }
 ```
 