Conversation

@cleong110 (Contributor)

No description provided.

@AmitMY (Contributor) left a comment


left initial comments

@cleong110 cleong110 mentioned this pull request Dec 4, 2024
@cleong110 (Contributor, Author) commented Dec 4, 2024

Having had a chance to read https://github.com/UKPLab/sentence-transformers/blob/master/sentence_transformers/util.py#L31, I am halfway tempted to either copy large chunks wholesale, or add it as a dependency and use theirs.

I'm going to at least cross-check using this.

@AmitMY (Contributor) commented Dec 4, 2024

> Having had a chance to read https://github.com/UKPLab/sentence-transformers/blob/master/sentence_transformers/util.py#L31, I am halfway tempted to either copy large chunks wholesale, or add it as a dependency and use theirs.
>
> I'm going to at least cross-check using this.

If you choose to use their stuff, please add it as a dependency. Do not just copy-paste code :) It would mean we have less code here, less code to maintain, and we are using a robust implementation.

@cleong110 (Contributor, Author)

Added sentence-transformers as a dependency, and reworked to use it. Also updated tests, fixed various pylint things. Going to try running pytest again on another machine, see if the outputs look right.

@cleong110 cleong110 marked this pull request as ready for review December 4, 2024 19:26
@cleong110 (Contributor, Author)

OK, the pytest suite passes on the other machine, apart from test_score_different_length in test_distance_metric.py.

Pylint mostly likes it, though it doesn't appreciate me using the protected functions from sentence-transformers.

@cleong110 (Contributor, Author) commented Dec 4, 2024

Code style suggestions from OpenAI: https://chatgpt.com/share/6750ae24-9b80-800e-82a7-ac888a21a615

I like the following:

  • Consistent use of List instead of list
  • Better docstrings
  • Dispatch table
  • Use the logging module
  • Better error messages

@cleong110 (Contributor, Author)

All right, ready for the next round, I think. Today I:

  • added some checks for lists of ndarray and lists of Tensor
  • switched to a dispatch dictionary that automatically picks the correct function and calls it, rather than a long chain of if/elif
  • added a few more tests to deal with List inputs



Review thread on the lines:

    if __name__ == "__main__":
        main()
Contributor

did not look at this file yet.

Contributor Author

@AmitMY I would appreciate some thoughts on that one and where to go with it. At the moment it basically just does ASL Citizen, I wanted to extend it to others such as SemLex

Contributor

So, what this file does is get distances, specifically SignCLIP for asl_citizen?
That is a start, but it would be best if we could do the following:

Given a directory of poses in various classes, for example poses/class/X, iterate over all of the metrics and run them to calculate the k-nearest neighbors for each sample, for classification.
Then, once we have around 8 metrics, we can run them all and see which one gives the best classification score (and that would be considered the best metric for form-based comparison of single signs) - see https://github.com/sign-language-processing/signwriting-evaluation/blob/main/signwriting_evaluation/evaluation/closest_matches.py#L93-L107

Another example would be to have a directory of poses, poses-dgs/, where each pose has a .txt file associated with it. Let's assume 1000 sentences in German Sign Language and German.
Then, we can perform an all-to-all similarity between the poses and an all-to-all similarity between the texts (using xCOMET, for example) and run a correlation study. Whichever metric correlates best with xCOMET is the best metric for semantic sentence comparison.

What I am trying to say is: we develop a generic evaluation that, for one dataset type, runs all metrics and correlates them with something,

and then we can perform this on many datasets and inform the reader about the best metrics.

Then, when someone comes and says "I developed a new metric", they run it on everything, like GLUE basically, and we can see the upsides and downsides.

Contributor

Oh yes, I quite like the second idea, which corresponds to a recent reading:

https://phillipi.github.io/prh/

This way, even if the pose and text do not share the same latent space (unlike in SignCLIP), we can still use a similarity kernel to calculate the correlation. The alignment to text offers a generic way of semantic comparison, while the one I am working on with the human study is tailored for machine translation.

So in the end, we can evaluate a metric either given (1) a bunch of grouped poses or (2) a bunch of poses paired with text.

(Let's take it step by step, do not need to have them all in this PR!)

@AmitMY (Contributor) left a comment

Next, I will look into evaluate_signclip - it doesn't have to be in the same PR; that's why I reviewed in the order I did.

@AmitMY (Contributor) left a comment

I am mostly happy with this PR. It can still be modified in the future, once we have more metrics etc. Thanks a lot for all the hard work!

@AmitMY AmitMY merged commit d5b7dc2 into sign-language-processing:main Dec 10, 2024
0 of 2 checks passed