
Commit 0d0c4a6

Deploying to gh-pages from @ 38c5ca1 🚀
1 parent 4f18c7e commit 0d0c4a6

File tree

4 files changed: +83 -2 lines changed


index.html

+29
@@ -399,6 +399,20 @@ <h4 id="text-to-notation">Text-to-Notation</h4>
<hr />
<h4 id="notation-to-pose">Notation-to-Pose</h4>
<p><span class="citation" data-cites="shalev2022ham2pose">Arkushin, Moryossef, and Fried (<a href="#ref-shalev2022ham2pose" role="doc-biblioref">2023</a>)</span> proposed Ham2Pose, a model to animate HamNoSys into a sequence of poses. They first encode the HamNoSys into a meaningful “context” representation using a transformer encoder, and use it to predict the length of the pose sequence to be generated. Then, starting from a still frame, they use an iterative non-autoregressive decoder to gradually refine the sign over <span class="math inline"><em>T</em></span> steps. In each time step <span class="math inline"><em>t</em></span> from <span class="math inline"><em>T</em></span> to <span class="math inline">1</span>, the model predicts the required change from step <span class="math inline"><em>t</em></span> to step <span class="math inline"><em>t</em> − 1</span>. After <span class="math inline"><em>T</em></span> steps, the pose generator outputs the final pose sequence. Their model outperformed previous methods like <span class="citation" data-cites="saunders2020progressive">Saunders, Camgöz, and Bowden (<a href="#ref-saunders2020progressive" role="doc-biblioref">2020</a><a href="#ref-saunders2020progressive" role="doc-biblioref">b</a>)</span>, animating HamNoSys into more realistic sign language sequences.</p>
<h4 id="evaluation-metrics">Evaluation Metrics</h4>
<p>Automatic evaluation methods for sign language processing typically depend only on the system output and are independent of the input.</p>
<h5 id="text-output">Text Output</h5>
<p>For tasks that output spoken language text, standard machine translation metrics such as BLEU, chrF, or COMET are commonly used. <!-- <span style="background-color: red; color: white; padding: 0 2px !important;">**TODO**</span>: examples --></p>
<h5 id="gloss-output">Gloss Output</h5>
<p>Gloss outputs can be automatically scored as well, though not without issues. In particular, <span class="citation" data-cites="muller-etal-2023-considerations">Müller et al. (<a href="#ref-muller-etal-2023-considerations" role="doc-biblioref">2023</a>)</span> analysed this and provided a series of recommendations (see the section on “Glosses”, above).</p>
<h5 id="pose-output">Pose Output</h5>
<p>For translation from spoken languages to signed languages, automatic evaluation metrics are an open line of research, though some metrics involving back-translation have been developed (see Text-to-Pose and Notation-to-Pose, above). <!-- <span style="background-color: red; color: white; padding: 0 2px !important;">**TODO**</span>: "Progressive Transformers for End-to-End Sign Language Production" is the one cited in Towards Fast and High-Quality Sign Language Production as a "widely-used setting" for backtranslation. --> <!-- <span style="background-color: red; color: white; padding: 0 2px !important;">**TODO**</span>: Towards Fast and High-Quality Sign Language Production uses back-translation. Discuss results and issues. --></p>
<!-- These three papers are cited in @shalev2022ham2pose as previous work using APE -->
<p>Naively, works in this domain have used metrics such as Mean Squared Error (MSE) or Average Position Error (APE) for pose outputs [@ahuja2019Language2PoseNaturalLanguage; @ghosh2021SynthesisCompositionalAnimations; @petrovich2022TEMOSGeneratingDiverse]. However, these metrics have significant limitations for Sign Language Production.</p>
<p>For example, MSE and APE do not account for variations in sequence length. In practice, the same sign will not always take exactly the same amount of time to produce, even by the same signer. To address time variation, <span class="citation" data-cites="huang2021towards">Huang et al. (<a href="#ref-huang2021towards" role="doc-biblioref">2021</a>)</span> introduced a metric for pose sequence outputs based on measuring the distance between generated and reference pose sequences at the joint level using dynamic time warping, termed DTW-MJE (Dynamic Time Warping - Mean Joint Error). However, this metric did not clearly address how to handle missing keypoints. <span class="citation" data-cites="shalev2022ham2pose">Arkushin, Moryossef, and Fried (<a href="#ref-shalev2022ham2pose" role="doc-biblioref">2023</a>)</span> experimented with multiple evaluation methods, and proposed adding a distance function that accounts for these missing keypoints. They applied this function with normalization of keypoints, naming their metric nDTW-MJE. <!-- They don't explicitly explain that the lowercase n is for "normalized keypoints" but that's my guess. -Colin --></p>
<h5 id="multi-channel-block-output">Multi-Channel Block Output</h5>
<p>As an alternative to gloss sequences, <span class="citation" data-cites="kim-etal-2024-signbleu-automatic">Kim et al. (<a href="#ref-kim-etal-2024-signbleu-automatic" role="doc-biblioref">2024</a>)</span> proposed a multi-channel output representation for sign languages and introduced SignBLEU, a BLEU-like scoring method for these outputs. Instead of a single linear sequence of glosses, the representation segments sign language output into multiple linear channels, each containing discrete “blocks”. The channels cover both manual and non-manual signals: for example, one channel for each hand and others for non-manual signals such as eyebrow movements. The blocks are then converted to n-grams: temporal grams capture sequences within a channel, and channel grams capture co-occurrences across channels. The SignBLEU score is then calculated for these n-grams of varying orders. They evaluated SignBLEU on the DGS Corpus v3.0 <span class="citation" data-cites="dataset:Konrad_2020_dgscorpus_3 dataset:prillwitz2008dgs">(Konrad et al. <a href="#ref-dataset:Konrad_2020_dgscorpus_3" role="doc-biblioref">2020</a>; Prillwitz et al. <a href="#ref-dataset:prillwitz2008dgs" role="doc-biblioref">2008</a>)</span>, NIASL2021 <span class="citation" data-cites="dataset:huerta-enochian-etal-2022-kosign">(Huerta-Enochian et al. <a href="#ref-dataset:huerta-enochian-etal-2022-kosign" role="doc-biblioref">2022</a>)</span>, and NCSLGR <span class="citation" data-cites="dataset:Neidle_2020_NCSLGR_ISLRN Vogler2012ASLLRP_data_access_interface">(Neidle and Sclaroff <a href="#ref-dataset:Neidle_2020_NCSLGR_ISLRN" role="doc-biblioref">2012</a>; Vogler and Neidle <a href="#ref-Vogler2012ASLLRP_data_access_interface" role="doc-biblioref">2012</a>)</span> datasets, comparing it with single-channel (gloss) metrics such as BLEU, TER, chrF, and METEOR, as well as human evaluations by native signers. The authors found that SignBLEU consistently correlated better with human evaluation than these alternatives. However, one limitation of this approach is the lack of suitable datasets. The authors reviewed a number of sign language corpora, noting the relative scarcity of multi-channel annotations. The <a href="https://github.com/eq4all-projects/SignBLEU">source code for SignBLEU</a> is available. As with SacreBLEU <span class="citation" data-cites="post-2018-call-sacrebleu">(Post <a href="#ref-post-2018-call-sacrebleu" role="doc-biblioref">2018</a>)</span>, the code can generate “version signature” strings summarizing key parameters, to enhance reproducibility.</p>
<!-- (and SignBLEU can be installed and run! https://colab.research.google.com/drive/1mRCSBQSvjkoSOz5MFiOko1CgtamuCVYO?usp=sharing) -->
<h3 id="sign-language-retrieval">Sign Language Retrieval</h3>
<p>Sign Language Retrieval is the task of finding a particular data item, given some input. In contrast to translation, generation or production tasks, there can exist a correct corresponding piece of data already, and the task is to find it out of many, if it exists. Metrics used include retrieval at Rank K (R@K, higher is better) and median rank (MedR, lower is better).</p>
<!-- <span style="background-color: red; color: white; padding: 0 2px !important;">**TODO**</span>: text-to-sign-video (T2V) section, sign-video-to-text (V2T) retrieval? -->
@@ -1346,6 +1360,9 @@ <h2 id="references">References</h2>
<div id="ref-huang2021towards">
<p>Huang, Wencan, Wenwen Pan, Zhou Zhao, and Qi Tian. 2021. “Towards Fast and High-Quality Sign Language Production.” In <em>Proceedings of the 29th ACM International Conference on Multimedia</em>, 3172–81.</p>
</div>
<div id="ref-dataset:huerta-enochian-etal-2022-kosign">
<p>Huerta-Enochian, Mathew, Du Hui Lee, Hye Jin Myung, Kang Suk Byun, and Jun Woo Lee. 2022. “KoSign Sign Language Translation Project: Introducing the NIASL2021 Dataset.” In <em>Proceedings of the 7th International Workshop on Sign Language Translation and Avatar Technology: The Junction of the Visual and the Textual: Challenges and Perspectives</em>, 59–66. Marseille, France: European Language Resources Association. <a href="https://aclanthology.org/2022.sltat-1.9">https://aclanthology.org/2022.sltat-1.9</a>.</p>
</div>
<div id="ref-humphries2016avoiding">
<p>Humphries, Tom, Poorna Kushalnagar, Gaurav Mathur, Donna Jo Napoli, Carol Padden, Christian Rathmann, and Scott Smith. 2016. “Avoiding Linguistic Neglect of Deaf Children.” <em>Social Service Review</em> 90 (4): 589–619.</p>
</div>
@@ -1403,6 +1420,9 @@ <h2 id="references">References</h2>
<div id="ref-kezar2023improving">
<p>Kezar, Lee, Jesse Thomason, and Zed Sehyr. 2023. “Improving Sign Recognition with Phonology.” In <em>Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics</em>, 2732–7. Dubrovnik, Croatia: Association for Computational Linguistics. <a href="https://aclanthology.org/2023.eacl-main.200">https://aclanthology.org/2023.eacl-main.200</a>.</p>
</div>
<div id="ref-kim-etal-2024-signbleu-automatic">
<p>Kim, Jung-Ho, Mathew John Huerta-Enochian, Changyong Ko, and Du Hui Lee. 2024. “SignBLEU: Automatic Evaluation of Multi-Channel Sign Language Translation.” In <em>Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)</em>, 14796–14811. Torino, Italia: ELRA and ICCL. <a href="https://aclanthology.org/2024.lrec-main.1289">https://aclanthology.org/2024.lrec-main.1289</a>.</p>
</div>
<div id="ref-kimmelman2014information">
<p>Kimmelman, Vadim. 2014. “Information Structure in Russian Sign Language and Sign Language of the Netherlands.” <em>Sign Language &amp; Linguistics</em> 18 (1): 142–50.</p>
</div>
@@ -1421,6 +1441,9 @@ <h2 id="references">References</h2>
<div id="ref-koller2015ContinuousSLR">
<p>Koller, Oscar, Jens Forster, and Hermann Ney. 2015. “Continuous Sign Language Recognition: Towards Large Vocabulary Statistical Recognition Systems Handling Multiple Signers.” <em>Computer Vision and Image Understanding</em> 141: 108–25. <a href="https://doi.org/10.1016/j.cviu.2015.09.013">https://doi.org/10.1016/j.cviu.2015.09.013</a>.</p>
</div>
<div id="ref-dataset:Konrad_2020_dgscorpus_3">
<p>Konrad, Reiner, Thomas Hanke, Gabriele Langer, Dolly Blanck, Julian Bleicken, Ilona Hofmann, Olga Jeziorski, et al. 2020. “MEINE DGS – Annotiert. Öffentliches Korpus der Deutschen Gebärdensprache, 3. Release / MY DGS – Annotated. Public Corpus of German Sign Language, 3rd Release.” Language resource. Universität Hamburg. <a href="https://doi.org/10.25592/dgs.corpus-3.0">https://doi.org/10.25592/dgs.corpus-3.0</a>.</p>
</div>
<div id="ref-konrad2018public">
<p>Konrad, Reiner, Thomas Hanke, Gabriele Langer, Susanne König, Lutz König, Rie Nishio, and Anja Regen. 2018. “Public DGS Corpus: Annotation Conventions.” Project Note AP03-2018-01, DGS-Korpus project, IDGS, Hamburg University.</p>
</div>
@@ -1538,6 +1561,9 @@ <h2 id="references">References</h2>
<div id="ref-napier-leeson-2016">
<p>Napier, Jemina, and Lorraine Leeson. 2016. <em>Sign Language in Action</em>. London: Palgrave Macmillan.</p>
</div>
<div id="ref-dataset:Neidle_2020_NCSLGR_ISLRN">
<p>Neidle, Carol, and Stan Sclaroff. 2012. “National Center for Sign Language and Gesture Resources (NCSLGR) Corpus. ISLRN 833-505-711-564-4.” Language resource. Boston University. <a href="https://www.islrn.org/resources/833-505-711-564-4/">https://www.islrn.org/resources/833-505-711-564-4/</a>.</p>
</div>
<div id="ref-neidle2001signstream">
<p>Neidle, Carol, Stan Sclaroff, and Vassilis Athitsos. 2001. “SignStream: A Tool for Linguistic and Computer Vision Research on Visual-Gestural Language Data.” <em>Behavior Research Methods, Instruments, &amp; Computers</em> 33 (3): 311–20.</p>
</div>
@@ -1763,6 +1789,9 @@ <h2 id="references">References</h2>
<div id="ref-vogler2005analysis">
<p>Vogler, Christian, and Siome Goldenstein. 2005. “Analysis of Facial Expressions in American Sign Language.” In <em>Proc. of the 3rd Int. Conf. on Universal Access in Human-Computer Interaction</em>. Springer.</p>
</div>
<div id="ref-Vogler2012ASLLRP_data_access_interface">
<p>Vogler, Christian, and Carol Neidle. 2012. “A New Web Interface to Facilitate Access to Corpora: Development of the ASLLRP Data Access Interface.” <a href="https://api.semanticscholar.org/CorpusID:58305327">https://api.semanticscholar.org/CorpusID:58305327</a>.</p>
</div>
<div id="ref-dataset:von2007towards">
<p>Von Agris, Ulrich, and Karl-Friedrich Kraiss. 2007. “Towards a Video Corpus for Signer-Independent Continuous Sign Language Recognition.” <em>Gesture in Human-Computer Interaction and Simulation, Lisbon, Portugal, May</em> 11.</p>
</div>

index.md

+49-1
Original file line numberDiff line numberDiff line change
@@ -841,6 +841,7 @@ They apply several low-resource machine translation techniques used to improve s
841841
Their findings validate the use of an intermediate text representation for signed language translation, and pave the way for including sign language translation in natural language processing research.
842842

843843
#### Text-to-Notation
844+
844845
@jiang2022machine also explore the reverse translation direction, i.e., text to SignWriting translation.
845846
They conduct experiments under a same condition of their multilingual SignWriting to text (4 language pairs) experiment, and again propose a neural factored machine translation approach to decode the graphemes and their position separately.
846847
They borrow BLEU from spoken language translation to evaluate the predicted graphemes and mean absolute error to evaluate the positional numbers.
@@ -850,14 +851,61 @@ They borrow BLEU from spoken language translation to evaluate the predicted grap

---

#### Notation-to-Pose

@shalev2022ham2pose proposed Ham2Pose, a model to animate HamNoSys into a sequence of poses.
They first encode the HamNoSys into a meaningful "context" representation using a transformer encoder,
and use it to predict the length of the pose sequence to be generated.
Then, starting from a still frame, they use an iterative non-autoregressive decoder to gradually refine the sign over $T$ steps.
In each time step $t$ from $T$ to $1$, the model predicts the required change from step $t$ to step $t-1$. After $T$ steps, the pose generator outputs the final pose sequence.
Their model outperformed previous methods like @saunders2020progressive, animating HamNoSys into more realistic sign language sequences.

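To make the decoding procedure above concrete, the following minimal sketch (our paraphrase of the paper's description, not the authors' code) shows the encode, predict-length, and refine loop; `context_encoder`, `length_predictor`, `refiner`, and `NUM_JOINTS` are hypothetical placeholders.

```python
# Schematic of Ham2Pose-style iterative refinement (illustrative only, not the official code).
import numpy as np

NUM_JOINTS = 137  # e.g., OpenPose-style full-body keypoints; the exact number is an assumption

def generate(context_encoder, length_predictor, refiner, hamnosys, T=10):
    """Encode HamNoSys, predict the sequence length, then refine the poses over T steps."""
    context = context_encoder(hamnosys)            # "context" representation of the HamNoSys input
    seq_len = length_predictor(context)            # predicted number of pose frames
    pose = np.zeros((seq_len, NUM_JOINTS, 3))      # a still frame repeated seq_len times
    for t in range(T, 0, -1):                      # non-autoregressive refinement, t = T .. 1
        pose = pose + refiner(pose, context, t)    # predicted change from step t to step t-1
    return pose

# Toy stand-ins so the sketch runs; a real model learns these components.
poses = generate(context_encoder=lambda s: np.ones(16),
                 length_predictor=lambda c: 25,
                 refiner=lambda p, c, t: np.zeros_like(p),
                 hamnosys="<hamnosys string>")
print(poses.shape)  # (25, 137, 3)
```
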
#### Evaluation Metrics

Automatic evaluation methods for sign language processing typically depend only on the system output and are independent of the input.

##### Text Output

For tasks that output spoken language text, standard machine translation metrics such as BLEU, chrF, or COMET are commonly used.
<!-- <span style="background-color: red; color: white; padding: 0 2px !important;">**TODO**</span>: examples -->

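As a hypothetical illustration (not from the original text), such text outputs can be scored with the `sacrebleu` package; the sentences below are invented, and COMET is omitted here because it requires downloading a pretrained model.

```python
# Illustrative scoring of spoken-language text output with sacrebleu (pip install sacrebleu).
# The hypothesis and reference sentences are made up for this example.
import sacrebleu

hypotheses = ["the weather is nice today", "she goes to school"]
references = [["the weather is beautiful today", "she is going to school"]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}, chrF = {chrf.score:.2f}")
```
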
##### Gloss Output

Gloss outputs can be automatically scored as well, though not without issues.
In particular, @muller-etal-2023-considerations analysed this and provided a series of recommendations (see the section on "Glosses", above).

##### Pose Output

For translation from spoken languages to signed languages, automatic evaluation metrics are an open line of research, though some metrics involving back-translation have been developed (see Text-to-Pose and Notation-to-Pose, above).
<!-- <span style="background-color: red; color: white; padding: 0 2px !important;">**TODO**</span>: "Progressive Transformers for End-to-End Sign Language Production" is the one cited in Towards Fast and High-Quality Sign Language Production as a "widely-used setting" for backtranslation. -->
<!-- <span style="background-color: red; color: white; padding: 0 2px !important;">**TODO**</span>: Towards Fast and High-Quality Sign Language Production uses back-translation. Discuss results and issues. -->

<!-- These three papers are cited in @shalev2022ham2pose as previous work using APE -->
Naively, works in this domain have used metrics such as Mean Squared Error (MSE) or Average Position Error (APE) for pose outputs [@ahuja2019Language2PoseNaturalLanguage; @ghosh2021SynthesisCompositionalAnimations; @petrovich2022TEMOSGeneratingDiverse].
However, these metrics have significant limitations for Sign Language Production.

For example, MSE and APE do not account for variations in sequence length.
In practice, the same sign will not always take exactly the same amount of time to produce, even by the same signer.
To address time variation, @huang2021towards introduced a metric for pose sequence outputs based on measuring the distance between generated and reference pose sequences at the joint level using dynamic time warping, termed DTW-MJE (Dynamic Time Warping - Mean Joint Error).
However, this metric did not clearly address how to handle missing keypoints.
@shalev2022ham2pose experimented with multiple evaluation methods, and proposed adding a distance function that accounts for these missing keypoints.
They applied this function with normalization of keypoints, naming their metric nDTW-MJE.
<!-- They don't explicitly explain that the lowercase n is for "normalized keypoints" but that's my guess. -Colin -->

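The following minimal sketch (ours, not the reference implementation of either paper) illustrates a DTW-based mean joint error that skips missing keypoints via a confidence mask; the handling of frames with no shared keypoints and the normalization by alignment length are assumptions.

```python
# Minimal sketch of a DTW-based mean joint error with masking for missing keypoints.
# Illustrative approximation only; not the official DTW-MJE / nDTW-MJE code.
import numpy as np

def frame_cost(a, b, mask_a, mask_b):
    """Mean Euclidean distance over joints visible in both frames."""
    both = mask_a & mask_b
    if not both.any():              # no shared keypoints: treat as zero cost (an assumption)
        return 0.0
    return float(np.linalg.norm(a[both] - b[both], axis=-1).mean())

def dtw_mje(hyp, ref, hyp_mask, ref_mask):
    """hyp: (T1, J, D) pose sequence, ref: (T2, J, D); masks: (T, J) booleans."""
    T1, T2 = len(hyp), len(ref)
    D = np.full((T1 + 1, T2 + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            c = frame_cost(hyp[i - 1], ref[j - 1], hyp_mask[i - 1], ref_mask[j - 1])
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[T1, T2] / max(T1, T2)  # normalize by the longer sequence (an assumption)

# Toy usage: random pose sequences of different lengths, with ~10% of keypoints "missing".
rng = np.random.default_rng(0)
hyp, ref = rng.normal(size=(50, 137, 2)), rng.normal(size=(60, 137, 2))
hyp_mask, ref_mask = rng.random((50, 137)) > 0.1, rng.random((60, 137)) > 0.1
print(dtw_mje(hyp, ref, hyp_mask, ref_mask))
```
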
##### Multi-Channel Block Output

As an alternative to gloss sequences, @kim-etal-2024-signbleu-automatic proposed a multi-channel output representation for sign languages and introduced SignBLEU, a BLEU-like scoring method for these outputs.
Instead of a single linear sequence of glosses, the representation segments sign language output into multiple linear channels, each containing discrete "blocks".
The channels cover both manual and non-manual signals: for example, one channel for each hand and others for non-manual signals such as eyebrow movements.
The blocks are then converted to n-grams: temporal grams capture sequences within a channel, and channel grams capture co-occurrences across channels.
The SignBLEU score is then calculated for these n-grams of varying orders.
They evaluated SignBLEU on the DGS Corpus v3.0 [@dataset:Konrad_2020_dgscorpus_3; @dataset:prillwitz2008dgs], NIASL2021 [@dataset:huerta-enochian-etal-2022-kosign], and NCSLGR [@dataset:Neidle_2020_NCSLGR_ISLRN; @Vogler2012ASLLRP_data_access_interface] datasets, comparing it with single-channel (gloss) metrics such as BLEU, TER, chrF, and METEOR, as well as human evaluations by native signers.
The authors found that SignBLEU consistently correlated better with human evaluation than these alternatives.
However, one limitation of this approach is the lack of suitable datasets.
The authors reviewed a number of sign language corpora, noting the relative scarcity of multi-channel annotations.
The [source code for SignBLEU](https://github.com/eq4all-projects/SignBLEU) is available.
As with SacreBLEU [@post-2018-call-sacrebleu], the code can generate "version signature" strings summarizing key parameters, to enhance reproducibility.
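
As a rough sketch of the idea only (the official implementation is in the repository linked above), the snippet below builds toy temporal grams and channel grams and computes a BLEU-style clipped precision; the channel names and block labels are invented, and the real metric additionally combines multiple gram orders with further weighting.

```python
# Illustrative sketch of the two n-gram types behind a SignBLEU-style score (not the official code).
from collections import Counter

def temporal_grams(channels, n=2):
    """n consecutive blocks within each channel (bigrams by default)."""
    grams = Counter()
    for name, blocks in channels.items():
        for i in range(len(blocks) - n + 1):
            grams[(name,) + tuple(blocks[i:i + n])] += 1
    return grams

def channel_grams(frames):
    """Pairs of blocks active in different channels at the same time step."""
    grams = Counter()
    for frame in frames:  # frame: {channel: active block or None}
        active = sorted((c, b) for c, b in frame.items() if b is not None)
        for i in range(len(active)):
            for j in range(i + 1, len(active)):
                grams[(active[i], active[j])] += 1
    return grams

def clipped_precision(hyp_grams, ref_grams):
    """BLEU-style modified n-gram precision."""
    overlap = sum(min(count, ref_grams[g]) for g, count in hyp_grams.items())
    total = sum(hyp_grams.values())
    return overlap / total if total else 0.0

# Toy hypothesis and reference: two manual channels and one non-manual channel.
hyp_seq = {"right_hand": ["HOUSE", "GO"], "left_hand": ["HOUSE"], "brows": ["raised"]}
ref_seq = {"right_hand": ["HOUSE", "GO"], "left_hand": ["HOUSE"], "brows": ["raised", "neutral"]}
hyp_frames = [{"right_hand": "HOUSE", "left_hand": "HOUSE", "brows": "raised"},
              {"right_hand": "GO", "left_hand": None, "brows": "raised"}]
ref_frames = hyp_frames  # identical timing in this toy example

print(clipped_precision(temporal_grams(hyp_seq), temporal_grams(ref_seq)))      # 1.0
print(clipped_precision(channel_grams(hyp_frames), channel_grams(ref_frames)))  # 1.0
```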

<!-- (and SignBLEU can be installed and run! https://colab.research.google.com/drive/1mRCSBQSvjkoSOz5MFiOko1CgtamuCVYO?usp=sharing) -->

```{=ignore}
#### Pose-to-Notation

sitemap.xml

+1-1
@@ -8,7 +8,7 @@

<url>
<loc>https://sign-language-processing.github.io/</loc>
- <lastmod>2024-06-20T19:58:31+00:00</lastmod>
+ <lastmod>2024-06-21T08:01:16+00:00</lastmod>
</url>
