
Commit bd03820

docs(README): add usage example

1 parent cbde2dd · commit bd03820

File tree

7 files changed: +48 −77 lines changed


.github/workflows/lint.yaml

Lines changed: 1 addition & 1 deletion

```diff
@@ -24,7 +24,7 @@ jobs:
         run: pip install ruff

       - name: Run Ruff Checks (Linting)
-        run: ruff check pose_evaluation
+        run: ruff check .

       - name: Run Ruff Format Check
         # The --check flag makes 'ruff format' exit with a non-zero code if files are not formatted,
```

.gitignore

Lines changed: 2 additions & 1 deletion

```diff
@@ -22,4 +22,5 @@ coverage.lcov
 **/debug*/*
 *.tar.zst
 *.zip
-coverage.xml
+coverage.xml
+uv.lock
```

.pre-commit-config.yaml

Lines changed: 0 additions & 20 deletions
This file was deleted.

Makefile

Lines changed: 1 addition & 0 deletions

```diff
@@ -1,5 +1,6 @@
 .PHONY: format

 format:
+	black pose_evaluation
 	python -m ruff format .
 	python -m ruff check --fix .
```

README.md

Lines changed: 44 additions & 44 deletions
````diff
@@ -1,69 +1,69 @@
 # Pose Evaluation

-The lack of automatic pose evaluation metrics is a major obstacle in the development of
-sign language generation models.
+This repository provides tools and metrics for the automatic evaluation of pose sequences,
+specifically designed for sign language applications.

-![Distribution of scores](assets/pose-eval-title-picture.png)
-
-## Goals
-
-The primary objective of this repository is to house a suite of
-automatic evaluation metrics specifically tailored for sign language poses.
-This includes metrics proposed by Ham2Pose[^1]
-as well as custom-developed metrics unique to our approach.
+This includes metrics proposed by Ham2Pose[^1] as well as custom-developed metrics unique to our approach.
 We recognize the distinct challenges in evaluating single signs versus continuous signing,
 and our methods reflect this differentiation.

----
+<img src="assets/pose-eval-title-picture.png" alt="Distribution of scores" width="500" style="display: block; margin: auto;"/>

-<!-- ## Usage
+## Usage

+Install the package:
 ```bash
-# (TODO) pip install the package
-# (TODO) how to construct a metric
-# Metric signatures, preprocessors
-``` -->
-
-## Quantitative Evaluation
-
-### Isolated Sign Evaluation
-
-Given an isolated sign corpus such as ASL Citizen[^2], we repeat the evaluation of Ham2Pose[^1] on our metrics, ranking distance metrics by retrieval performance.
-
-Evaluation is conducted on a combined dataset of ASL Citizen, Sem-Lex[^3], and PopSign ASL[^4].
-
-For each sign class, we use all available samples as targets and sample four times as many distractors, yielding a 1:4 target-to-distractor ratio.
-
-For instance, for the sign _HOUSE_ with 40 samples (11 from ASL Citizen, 29 from Sem-Lex), we add 160 distractors and compute pairwise metrics from each target to all 199 other examples (We consistently discard scores for pose files where either the target or distractor could not be embedded with SignCLIP.).
-
-Retrieval quality is measured using Mean Average Precision (`mAP↑`) and Precision@10 (`P@10↑`). The complete evaluation covers 5,362 unique sign classes and 82,099 pose sequences.
-
-After several pilot runs, we finalized a subset of 169 sign classes with at most 20 samples each, ensuring consistent metric coverage. We evaluated 1200 distance-based variants and SignCLIP models with different checkpoints provided by the authors on this subset.
-
-The overall results show that DTW-based metrics outperform padding-based baselines. Embedding-based methods, particularly SignCLIP models fine-tuned on in-domain ASL data, achieve the strongest retrieval scores.
+pip install git+https://github.com/sign-language-processing/pose-evaluation.git
+```

-<!-- Atwell style evaluations didn't get done. Nor did AUTSL -->
+Create a metric:
+```python
+from pose_evaluation.metrics.distance_metric import DistanceMetric
+from pose_evaluation.metrics.dtw_metric import DTWDTAIImplementationDistanceMeasure
+from pose_evaluation.metrics.pose_processors import *
+
+DTWp = DistanceMetric(
+    name="DTWp",
+    # Select distance measure
+    distance_measure=DTWDTAIImplementationDistanceMeasure(),
+    # Provide pose processing pipeline
+    pose_preprocessors=[
+        TrimMeaninglessFramesPoseProcessor(),
+        GetHandsOnlyHolisticPoseProcessor(),  # select only the hands
+        FillMaskedOrInvalidValuesPoseProcessor(masked_fill_value=10.0),  # fill masked values with 10.0
+        ReducePosesToCommonComponentsProcessor(),  # reduce pairs of poses to common components
+    ],
+)
+```

-## Evaluation Metrics
+Evaluate two pose sequences:
+```python
+from pose_format import Pose

-For the study, we evaluated over 1200 Pose distance metrics, recording mAP and other retrieval performance characteristics.
+with open("hypothesis.pose", "rb") as f:
+    hypothesis = Pose.read(f)
+
+with open("reference.pose", "rb") as f:
+    reference = Pose.read(f)

-We find that the top metric
+DTWp.score(hypothesis, reference)
+```

 ### Contributing

-Please make sure to run `black pose_evaluation` before submitting a pull request.
+Please make sure to run `make format` before submitting a pull request.

 ## Cite

 If you use our toolkit in your research or projects, please consider citing the work.

 ```bib
-@misc{pose-evaluation2025,
-  title={Meaningful Pose-Based Sign Language Evaluation},
-  author={Zifan Jiang, Colin Leong, Amit Moryossef, Anne Göhring, Annette Rios, Oliver Cory, Maksym Ivashechkin, Neha Tarigopula, Biao Zhang, Rico Sennrich, Sarah Ebling},
-  howpublished={\url{https://github.com/sign-language-processing/pose-evaluation}},
-  year={2025}
+@inproceedings{Jiang2025PoseEvaluation,
+  title={Meaningful Pose-Based Sign Language Evaluation},
+  author={Zifan Jiang, Colin Leong, Amit Moryossef, Oliver Cory, Maksym Ivashechkin, Neha Tarigopula, Biao Zhang, Anne Göhring, Annette Rios, Rico Sennrich, Sarah Ebling},
+  booktitle={Conference on Machine Translation},
+  year={2025},
+  url={https://github.com/sign-language-processing/pose-evaluation}
 }
 ```
````
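The DTW-based metric introduced in the README usage example rests on dynamic time warping, which aligns two sequences of different lengths before accumulating distances. The library delegates this to a dtaidistance-backed measure; the minimal sketch below is an illustrative standalone implementation of the underlying algorithm, not the repository's code, and the 1-D inputs stand in for per-keypoint pose trajectories.

```python
# Illustrative dynamic time warping (DTW) sketch, NOT the library's
# DTWDTAIImplementationDistanceMeasure; shown only to convey the idea.
import numpy as np

def dtw_distance(a, b) -> float:
    """Classic O(n*m) DTW between two 1-D sequences, absolute-difference cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible alignment moves.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

# A time-stretched copy still aligns perfectly, unlike padding-based baselines:
print(dtw_distance([0.0, 1.0, 2.0], [0.0, 1.0, 2.0]))            # 0.0
print(dtw_distance([0.0, 1.0, 2.0], [0.0, 0.0, 1.0, 1.0, 2.0]))  # 0.0
```

This invariance to local time stretching is why the commit's README highlights DTW-based metrics over padding-based ones for comparing pose sequences of unequal length.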

assets/distribution/all.png

-25.4 KB
Binary file not shown.
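The README section removed by this commit scored metrics by retrieval quality, using Mean Average Precision (`mAP↑`) and Precision@10 (`P@10↑`) over rankings sorted by ascending distance. As a reference for those two measures, here is a hedged, self-contained sketch of the protocol; it is an illustration, not the repository's evaluation code.

```python
# Illustrative retrieval scoring: AP and P@k over a ranked relevance list,
# where True marks a retrieved item of the same sign class as the target.

def average_precision(ranked_relevant: list) -> float:
    """Mean of precision values taken at each relevant rank position."""
    hits, precisions = 0, []
    for rank, relevant in enumerate(ranked_relevant, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def precision_at_k(ranked_relevant: list, k: int) -> float:
    """Fraction of the top-k retrieved items that are relevant."""
    return sum(ranked_relevant[:k]) / k

# Targets retrieved at ranks 1 and 3 among four candidates:
ranking = [True, False, True, False]
print(average_precision(ranking))   # (1/1 + 2/3) / 2 ≈ 0.833
print(precision_at_k(ranking, 2))   # 0.5
```

mAP is then the mean of these AP values across all target queries, which is what the removed README text reports over its 5,362 sign classes.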

pyproject.toml

Lines changed: 0 additions & 11 deletions
```diff
@@ -67,8 +67,6 @@ select = [
     "A",   # flake8-builtins
     "T",   # flake8-bandit
     "Q",   # flake8-quotes
-    # "ANN", # flake8-annotations
-    # "ERA", # eradicate (commented out code)
     "RUF", # Ruff specific rules
 ]

@@ -109,15 +107,6 @@ known-first-party = ["pose_evaluation"]
 # lines-after-imports = 2
 # force-single-line = false

-[tool.ruff.format]
-# This section configures Ruff's integrated formatter.
-# It should produce output highly compatible with Black.
-# No specific settings are usually needed here unless you have particular preferences
-# like quote-style (e.g., `quote-style = "single"` or `"double"`).
-# docstring-code-format = true # If you want Ruff to format code examples in docstrings
-# docstring-code-line-length = "dynamic" # or an integer like 88
-
-
 [tool.setuptools]
 packages = ["pose_evaluation", "pose_evaluation.metrics"]
```
