Datasets to dataframes, run scoring, analyze scores #30


Draft: wants to merge 80 commits into main
Conversation

@cleong110 (Contributor) commented Apr 16, 2025

Code to:

  1. Collect multiple datasets into a common DataFrame/CSV format, with GLOSS, POSE_FILE_PATH, SPLIT, and unique VIDEO_ID columns
  2. Parser scripts for ASL Citizen, Sem-Lex, and PopSign ASL to the common CSV format
  3. Load splits from all three datasets, e.g. all the train/val sets or just the test sets
  4. Construct metrics automatically by generating combinations of distance measure, keypoint selection, sequence alignment, etc., resulting in dozens of metrics
  5. Run "in-Gloss+4x Outgloss" scoring
  6. Save the results to a specified folder as CSVs
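To make the common format concrete, here is a minimal pandas sketch of the schema described in item 1. The row values are made up for illustration; only the four column names come from this PR.

```python
import pandas as pd

# Hypothetical rows illustrating the common schema: one row per video,
# with GLOSS, POSE_FILE_PATH, SPLIT, and a unique VIDEO_ID.
rows = [
    {
        "GLOSS": "HELLO",
        "POSE_FILE_PATH": "poses/asl_citizen/0001.pose",
        "SPLIT": "train",
        "VIDEO_ID": "asl_citizen_0001",
    },
    {
        "GLOSS": "THANKS",
        "POSE_FILE_PATH": "poses/semlex/0042.pose",
        "SPLIT": "test",
        "VIDEO_ID": "semlex_0042",
    },
]
df = pd.DataFrame(rows, columns=["GLOSS", "POSE_FILE_PATH", "SPLIT", "VIDEO_ID"])

# VIDEO_ID must be unique across all datasets so rows can be joined safely.
assert df["VIDEO_ID"].is_unique
csv_text = df.to_csv(index=False)
```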

Example usage:

Clone and set up:

# clone the repo, and checkout this branch
# cd into the repo
conda create -n pose_eval_src pip
conda activate pose_eval_src
# which pip should show the pip inside the env
which pip

# install in editable mode (the path must follow -e, so put -U first)
pip install -U -e .

Then generate the CSV files:

python pose_evaluation/evaluation/dataset_parsing/popsign_to_df.py ~/data/PopSignASL/ --out ~/projects/pose-evaluation/dataset_dfs/popsign_asl.csv
python pose_evaluation/evaluation/dataset_parsing/sem_lex_to_dataframe.py ~/data/Sem-Lex/ --out dataset_dfs/semlex.csv
python pose_evaluation/evaluation/dataset_parsing/asl_citizen_to_dataframe.py ~/data/ASL_Citizen/ --pose-files-path ~/data/ASL_Citizen/poses/ --metadata-path ~/data/ASL_Citizen/splits/ --out dataset_dfs/asl-citizen.csv 

Note that the PopSign ASL parser can optionally "translate" some (but not all) of the glosses when given a path to the ASL Knowledge Graph via --asl-knowledge-graph-path; see #28:

python pose_evaluation/evaluation/dataset_parsing/popsign_to_df.py ~/data/PopSignASL/ --out dataset_dfs/popsign_asl.csv --asl-knowledge-graph-path ~/data/ASLKG/edges_v2_noweights.tsv

Then load them and run metrics

python pose_evaluation/evaluation/load_splits_and_run_metrics.py dataset_dfs/*.csv

# usage instructions
python pose_evaluation/evaluation/load_splits_and_run_metrics.py --help
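As a rough illustration of what loading the generated CSVs and selecting a split looks like, here is a minimal pandas sketch. `concat_and_filter` is a hypothetical helper, not the actual `load_splits_and_run_metrics.py` implementation.

```python
import pandas as pd

# Hypothetical helper: combine per-dataset DataFrames (as read from the
# generated CSVs) and keep only rows from the requested split.
def concat_and_filter(frames: list, split: str) -> pd.DataFrame:
    combined = pd.concat(frames, ignore_index=True)
    return combined[combined["SPLIT"] == split].reset_index(drop=True)

# Stand-ins for pd.read_csv("dataset_dfs/semlex.csv"), etc.
semlex = pd.DataFrame(
    {"GLOSS": ["HELLO"], "SPLIT": ["test"], "VIDEO_ID": ["semlex_0001"]}
)
citizen = pd.DataFrame(
    {"GLOSS": ["THANKS"], "SPLIT": ["train"], "VIDEO_ID": ["asl_citizen_0001"]}
)
test_df = concat_and_filter([semlex, citizen], "test")
```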

analysis (TODO)

A script that will load all the score CSV files and run analysis.
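A first pass at such an analysis might just concatenate the score files and summarize per metric. A hedged sketch follows; the `METRIC` and `SCORE` column names are assumptions, not the PR's actual score schema.

```python
import pandas as pd

# Hypothetical score rows; in practice these would come from
# pd.read_csv over the score CSVs in the output folder.
scores = pd.DataFrame(
    {
        "METRIC": ["dtw_hands", "dtw_hands", "padded_l2"],
        "SCORE": [0.2, 0.4, 1.0],
    }
)

# Per-metric summary: mean score and number of comparisons.
summary = scores.groupby("METRIC")["SCORE"].agg(["mean", "count"])
```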

@cleong110 (Contributor, Author) commented:

Something else I'm realizing as I construct metrics: it would be nice if some of them automatically populated their own preprocessors based on the DistanceMeasure. For example, DTW metrics do not need a sequence-alignment processor such as ZeroPadShorter.
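The idea could be sketched roughly as below. The class names (`DistanceMeasure`, `DTWMeasure`, `ZeroPadShorter`) and the `requires_equal_lengths` flag are illustrative stand-ins, not the repo's actual API.

```python
# Hypothetical sketch: a metric inspects its distance measure and only
# adds a sequence-alignment preprocessor when one is actually needed.
class DistanceMeasure:
    # By default, assume the measure needs equal-length sequences.
    requires_equal_lengths = True

class DTWMeasure(DistanceMeasure):
    # DTW aligns sequences itself, so no padding is needed.
    requires_equal_lengths = False

class ZeroPadShorter:
    """Pads the shorter sequence with zeros (alignment preprocessor)."""

def default_preprocessors(measure: DistanceMeasure) -> list:
    preprocessors = []
    if measure.requires_equal_lengths:
        preprocessors.append(ZeroPadShorter())
    return preprocessors
```

With this shape, a DTW-based metric would get an empty preprocessor list while a plain elementwise measure would get the padding step automatically.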

Similarly, dtaidistance needs a strategy for dealing with NaN (masked) values; otherwise we get NaN trajectory distances, which become NaN distances when aggregated. So when one instantiates a metric, one NEEDS a masked-value preprocessor, or a strategy for dealing with them, e.g. returning a default distance when the trajectory distance is NaN.
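The fallback strategy mentioned above could look something like this. `safe_trajectory_distance` and the default value of 10.0 are hypothetical, just to show how NaN results stop poisoning the aggregate.

```python
import math

# Hypothetical sketch: wrap a raw trajectory distance so that NaN results
# (from masked keypoints) fall back to a default distance instead of
# propagating NaN into the aggregated metric score.
def safe_trajectory_distance(raw_distance: float, default: float = 10.0) -> float:
    return default if math.isnan(raw_distance) else raw_distance

distances = [1.5, float("nan"), 2.5]
cleaned = [safe_trajectory_distance(d) for d in distances]
mean_distance = sum(cleaned) / len(cleaned)  # no longer NaN
```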
