Signclip metric #5
Conversation
…ly but I can load in the .npy files
AmitMY left a comment
left initial comments
Having had a chance to read https://github.com/UKPLab/sentence-transformers/blob/master/sentence_transformers/util.py#L31, I am halfway tempted to either copy large chunks wholesale or add it as a dependency and use theirs. I'm going to at least cross-check using this.
If you choose to use their stuff, please add it as a dependency. Do not just copy-paste code :) It would mean we have less code here, less code to maintain, and we are using a robust implementation.
Added sentence-transformers as a dependency and reworked the code to use it. Also updated tests and fixed various pylint complaints. Going to try running pytest again on another machine to see if the outputs look right.
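For illustration, a minimal sketch of the kind of usage this enables (the file names are hypothetical, not the PR's actual code):

```python
# Minimal sketch, not the PR's actual code: score pose embeddings saved
# as .npy files with sentence-transformers' cosine-similarity utility.
import numpy as np
from sentence_transformers import util

hypotheses = np.load("hypothesis_embeddings.npy")  # shape (n, dim), hypothetical file
references = np.load("reference_embeddings.npy")   # shape (m, dim), hypothetical file

# util.cos_sim accepts numpy arrays or tensors and returns an (n, m)
# torch tensor of pairwise cosine similarities.
similarity_matrix = util.cos_sim(hypotheses, references)
print(similarity_matrix.shape)
```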
OK, the pytests run right on the other machine, other than test_score_different_length from test_distance_metric.py. Pylint mostly likes it, though it doesn't appreciate me using the protected functions from sentence-transformers.
Code style suggestions from OpenAI: https://chatgpt.com/share/6750ae24-9b80-800e-82a7-ac888a21a615 I like the following:
All right, ready for the next round, I think. Today I:
```python
if __name__ == "__main__":
    main()
```
did not look at this file yet.
@AmitMY I would appreciate some thoughts on that one and where to go with it. At the moment it basically just does ASL Citizen; I wanted to extend it to others such as SemLex.
So, what this file does is get distances (specifically SignCLIP distances) for asl_citizen?
That is a start, but it would be best if we could do the following:
Given a directory of poses organized by class, for example poses/class/X, iterate over all of the metrics and run them to calculate the k-nearest neighbors for each sample, for classification (see the sketch below).
Then, once we have something like 8 metrics, we can run them all and see which one gives the best classification score (and that would be considered the best metric for form-based comparison of single signs); see https://github.com/sign-language-processing/signwriting-evaluation/blob/main/signwriting_evaluation/evaluation/closest_matches.py#L93-L107
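For concreteness, a minimal sketch of that evaluation loop. The poses/&lt;class&gt;/&lt;file&gt; layout and the metric.score(pose_a, pose_b) interface are assumptions for illustration, not existing code:

```python
# Minimal sketch of the proposed benchmark loop, assuming each metric exposes
# a score(pose_a, pose_b) method that returns a distance (an assumption).
from collections import Counter

def knn_accuracy(poses, labels, metric, k=5):
    """Leave-one-out k-NN classification accuracy under one distance metric."""
    correct = 0
    for i, query in enumerate(poses):
        # distance from the query to every other sample, paired with its label
        scored = [(metric.score(query, pose), labels[j])
                  for j, pose in enumerate(poses) if j != i]
        nearest = sorted(scored, key=lambda pair: pair[0])[:k]
        predicted = Counter(label for _, label in nearest).most_common(1)[0][0]
        correct += int(predicted == labels[i])
    return correct / len(poses)

# Usage idea: load (pose, class_name) pairs from poses/<class>/*.pose, then
# report knn_accuracy(poses, labels, metric) for every metric and compare.
```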
Another example would be a directory of poses, poses-dgs/, where each pose has a .txt file associated with it. Let's assume 1000 sentences in German Sign Language and German.
Then we can perform an all-to-all similarity between the poses and an all-to-all similarity between the texts (using xCOMET, for example) and run a correlation study. Whichever metric correlates best with xCOMET is the best metric for semantic sentence comparison.
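A rough sketch of that correlation step, assuming the two all-to-all similarity matrices have already been computed (the function name is hypothetical):

```python
# Minimal sketch of the proposed correlation study, assuming pose_sim and
# text_sim are all-to-all similarity matrices over the same sentences
# (text_sim e.g. from xCOMET scores). Not existing code in this repo.
import numpy as np
from scipy.stats import spearmanr

def metric_correlation(pose_sim: np.ndarray, text_sim: np.ndarray) -> float:
    """Spearman correlation between the upper-triangular (off-diagonal)
    entries of two all-to-all similarity matrices."""
    rows, cols = np.triu_indices_from(pose_sim, k=1)
    return spearmanr(pose_sim[rows, cols], text_sim[rows, cols]).correlation
```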
What I am trying to say is: we develop a generic evaluation that, for one dataset type, runs all the metrics and correlates them with something,
and then we can perform this on many datasets and inform the reader about the best metrics.
Then, when someone comes and says "I developed a new metric", they can run it on everything, like GLUE basically, and we can see the upsides and downsides.
Oh yes, I quite like the second idea, which corresponds to a recent reading:
https://phillipi.github.io/prh/
This way, even if the poses and the text do not share the same latent space (unlike in SignCLIP), we can still use a similarity kernel to calculate the correlation. Alignment to text offers a generic way of doing semantic comparison, while the metric I am working on with the human study is tailored to machine translation.
So in the end, we can evaluate a metric given either (1) a bunch of grouped poses or (2) a bunch of poses paired with text.
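One way to compare kernels without a shared latent space is the mutual k-nearest-neighbor overlap used in the PRH line of work; a rough sketch (my paraphrase of the idea, not code from this PR):

```python
# Minimal sketch of a kernel-alignment check in the spirit of the PRH page
# linked above: compare the neighborhoods induced by each space's own
# similarity kernel via mutual k-NN overlap. Assumes row i of each matrix
# is the similarity of sample i to all samples (so index i itself is the
# most similar entry and gets skipped).
import numpy as np

def mutual_knn_alignment(pose_sim: np.ndarray, text_sim: np.ndarray, k: int = 10) -> float:
    """Average overlap between each sample's k nearest neighbors in the
    pose-similarity space and in the text-similarity space."""
    n = pose_sim.shape[0]
    overlaps = []
    for i in range(n):
        pose_nn = set(np.argsort(-pose_sim[i])[1:k + 1])  # descending, skip self
        text_nn = set(np.argsort(-text_sim[i])[1:k + 1])
        overlaps.append(len(pose_nn & text_nn) / k)
    return float(np.mean(overlaps))
```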
(Let's take it step by step; we do not need to have them all in this PR!)
… dtype arg, rename set_device, etc
AmitMY left a comment
Next, I will look into evaluate_signclip. It doesn't have to be in the same PR; that's why I reviewed in the order I did.
…ance_matrix_shape_checker, removing redundant args documentation,
AmitMY left a comment
I am mostly happy with this PR. It can still be modified in the future, once we have more metrics, etc. Thanks a lot for all the hard work!