-
Notifications
You must be signed in to change notification settings - Fork 1
Add extended pipeline #11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@yasamanparhizkar This should be ready for merge when I clean up the scripts we don't need. |
afkanpour
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @Adibvafa and @yasamanparhizkar)
We don't need the scripts in There's another |
|
Previously, afkanpour (Arash) wrote…
This file can be deleted. |
|
Previously, afkanpour (Arash) wrote…
File can be deleted. |
|
Previously, afkanpour (Arash) wrote…
File can be deleted. |
|
@yasamanparhizkar done, removed the utils directory. |
|
@afkanpour Please review the README file and let me know if anything is ambigous! |
afkanpour
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few _original files in this PR. I assume those are the original files before your changes. Why should we keep them around? Are we keeping them because you suspect you might have made a mistake in your changes? or do you think the originals might be needed at some point in the future? If the latter, please note that this is our pipeline now. So for the sake of cleaner code and less clutter we should not keep the originals.
Reviewed 1 of 8 files at r7, 9 of 11 files at r8, all commit messages.
Reviewable status: 101 of 110 files reviewed, 12 unresolved discussions (waiting on @Adibvafa and @yasamanparhizkar)
openpmcvl/granular/pipeline/align.sh line 0 at r8 (raw file):
For this and other shell scripts add a one line description followed by a sample command on top of the script.
openpmcvl/granular/pipeline/align_original.sh line 0 at r8 (raw file):
Should this be in the PR? How's it different from align.sh? If it's needed, please add a description and a sample command.
openpmcvl/granular/pipeline/align_original.sh line 12 at r8 (raw file):
# Activate the environment source /h/afallah/light/bin/activate
Please do not use personal paths.
Code quote:
/h/afallah/light/bin/activateopenpmcvl/granular/pipeline/align_original.sh line 15 at r8 (raw file):
# Set the working directory cd /h/afallah/pmc-data-extraction
ditto
Code quote:
/h/afallah/pmc-data-extractionopenpmcvl/granular/pipeline/align_original.sh line 27 at r8 (raw file):
# Define the root directory root_dir="/projects/multimodal/datasets/pmc_oa"
There shouldn't be pointers to our directories if we want to open-source this. If there are similar instances elsewhere, please change them all.
Code quote:
/projects/multimodal/datasets/pmc_oaopenpmcvl/granular/pipeline/align_original.sh line 31 at r8 (raw file):
# Define the input and output files input_file="${root_dir}/pmc_oa.jsonl" output_file="${root_dir}/pmc_oa_labeled/pmc_oa_aligned_${BEGIN_IDX}_${END_IDX}.jsonl"
Is this directory created in align.py if it doesn't already exist or does the python script throw an error?
If the latter, we should make sure the directory is created before running align.py.
Code quote:
${root_dir}/pmc_oa_labeled/pmc_oa_aligned_${BEGIN_IDX}_${END_IDX}.jsonlopenpmcvl/granular/README.md line 0 at r8 (raw file):
For running each step please provide a sample command.
openpmcvl/granular/pipeline/preprocess_original.sh line 0 at r8 (raw file):
Same comment as above re _original files.
openpmcvl/granular/pipeline/classify_original.sh line 0 at r8 (raw file):
What does "_original" refer to? The original script before your changes? If so, why do we need to keep the originals?
|
@afkanpour oh sorry the original files werent supposed to be committed, slipped my hand, removed them. |
Adibvafa
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 92 of 110 files reviewed, 12 unresolved discussions (waiting on @afkanpour and @yasamanparhizkar)
openpmcvl/granular/README.md line at r8 (raw file):
Previously, afkanpour (Arash) wrote…
For running each step please provide a sample command.
Done.
openpmcvl/granular/pipeline/align.sh line at r8 (raw file):
Previously, afkanpour (Arash) wrote…
For this and other shell scripts add a one line description followed by a sample command on top of the script.
Done.
openpmcvl/classification/classifier.sh line at r2 (raw file):
Previously, afkanpour (Arash) wrote…
Do we use this script?
Done.
openpmcvl/granular/utils/train_test_split.py line 123 at r2 (raw file):
Previously, afkanpour (Arash) wrote…
Everything is commented out here. Please fix.
Done.
Adibvafa
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 92 of 110 files reviewed, 12 unresolved discussions (waiting on @afkanpour and @yasamanparhizkar)
Adibvafa
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 92 files at r1, 75 of 91 files at r2, 3 of 3 files at r3, 8 of 10 files at r5, 8 of 8 files at r7, 2 of 11 files at r8, 5 of 5 files at r9, 1 of 6 files at r10, 5 of 5 files at r11, all commit messages.
Reviewable status: all files reviewed, 12 unresolved discussions (waiting on @afkanpour and @yasamanparhizkar)
openpmcvl/granular/pipeline/align_original.sh line 12 at r8 (raw file):
Previously, afkanpour (Arash) wrote…
Please do not use personal paths.
Done.
openpmcvl/granular/pipeline/align_original.sh line 15 at r8 (raw file):
Previously, afkanpour (Arash) wrote…
ditto
Done.
openpmcvl/granular/pipeline/align_original.sh line 27 at r8 (raw file):
Previously, afkanpour (Arash) wrote…
There shouldn't be pointers to our directories if we want to open-source this. If there are similar instances elsewhere, please change them all.
Done.
openpmcvl/granular/pipeline/align_original.sh line 31 at r8 (raw file):
Previously, afkanpour (Arash) wrote…
Is this directory created in align.py if it doesn't already exist or does the python script throw an error?
If the latter, we should make sure the directory is created before running align.py.
Done.
openpmcvl/granular/pipeline/align_original.sh line at r8 (raw file):
Previously, afkanpour (Arash) wrote…
Should this be in the PR? How's it different from align.sh? If it's needed, please add a description and a sample command.
Done.
openpmcvl/granular/pipeline/classify_original.sh line at r8 (raw file):
Previously, afkanpour (Arash) wrote…
What does "_original" refer to? The original script before your changes? If so, why do we need to keep the originals?
Done.
openpmcvl/granular/pipeline/preprocess_original.sh line at r8 (raw file):
Previously, afkanpour (Arash) wrote…
Same comment as above re _original files.
Done.
openpmcvl/granular/utils/clean_by_format.py line 61 at r2 (raw file):
Previously, yasamanparhizkar (Yasaman P.) wrote…
This file can be deleted.
Done.
openpmcvl/granular/utils/dir_or_file.py line 4 at r2 (raw file):
Previously, yasamanparhizkar (Yasaman P.) wrote…
File can be deleted.
Done.
openpmcvl/granular/utils/train_test_split.py line 123 at r2 (raw file):
Previously, Adibvafa (Adibvafa Fallahpour) wrote…
Done.
Done.
Adibvafa
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 12 unresolved discussions (waiting on @afkanpour and @yasamanparhizkar)
|
@afkanpour should be good to merge. |
afkanpour
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 14 of 14 files at r12, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @yasamanparhizkar)
PR Type
Feature
Short Description
Add pipeline to create finegrained dataset
Tests Added
None
This change is