feat: implement a new ``validate`` command #220

ayimany wants to merge 49 commits into ``main`` from ``impl-validate-command``.

.. _commands-validate:


validate
========

Validate the structure of a ``.hap`` file.

Any errors found in the ``.hap`` file are logged.

Optionally, the variant IDs used by the haplotypes in the ``.hap`` file can also be checked against those of a ``.pgen`` file.

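As a rough illustration of the kind of structural checks involved, the hypothetical shell function below flags blank lines and any line that is neither a ``#`` metadata/comment line nor an ``H``, ``R``, or ``V`` data line. ``check_hap_structure`` is not part of haptools and its rules are simplified assumptions; haptools itself performs many more checks, such as validating the types of extra columns.

.. code-block:: bash

    # Hypothetical helper, NOT part of haptools: a simplified structural check.
    # Assumes "#" starts metadata/comment lines and H, R or V starts data lines.
    check_hap_structure() {
        awk '
            NF == 0 {
                # blank or whitespace-only lines are reported as errors
                printf "line %d: blank line\n", NR
                errors++
                next
            }
            /^#/ { next }  # metadata and comment lines are not checked here
            $1 != "H" && $1 != "R" && $1 != "V" {
                printf "line %d: unknown line type \"%s\"\n", NR, $1
                errors++
            }
            END {
                printf "%d errors\n", errors + 0
                exit (errors > 0)  # non-zero exit status when any error was found
            }
        ' "$1"
    }

For instance, ``check_hap_structure tests/data/hapfiles/basic.hap`` prints one line per problem it finds, then the error count, and returns a non-zero exit status if there were any errors.
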
Usage
~~~~~
.. code-block:: bash

    haptools validate \
    --sort/--no-sort \
    --genotypes PATH \
    --verbosity [CRITICAL|ERROR|WARNING|INFO|DEBUG|NOTSET] \
    HAPFILE

Examples
~~~~~~~~
.. code-block:: bash

    haptools validate tests/data/hapfiles/basic.hap

This outputs a message reporting the number of errors and warnings found.

.. code-block::

    [ INFO] Completed HapFile validation with 0 errors and 0 warnings.

If there are any warnings or errors, each one is logged as well.

.. code-block:: bash

    haptools validate tests/data/hapfiles/valhap_with_no_version.hap

.. code-block::

    [ WARNING] No version declaration found. Assuming to use the latest version.
    [ INFO] Completed HapFile validation with 0 errors and 1 warnings.
    [ WARNING] Found several warnings and / or errors in the hapfile

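The first warning above is triggered by a missing version line. A hypothetical pre-check (``has_version_line`` is an illustration, not a haptools function) can look for a metadata line whose first field after ``#`` is ``version``, like the ``# version 0.2.0`` line in the example file used for ``--no-sort``.

.. code-block:: bash

    # Hypothetical helper, NOT part of haptools.
    # Succeeds (exit status 0) if the file has a "# version ..." metadata line.
    has_version_line() {
        # "#", optional whitespace, the word "version", then whitespace
        grep -Eq '^#[[:space:]]*version[[:space:]]' "$1"
    }

    # Example: mimic the warning shown above for a file without a version line
    # if ! has_version_line my.hap; then echo "no version declaration found" >&2; fi

This only mirrors the warning itself; according to its log message, haptools then assumes the latest version of the format.
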
Use ``--no-sort`` to skip sorting the file.
With this option, any out-of-order lines, such as metadata lines that appear after the header, are removed.

.. code-block:: bash

    haptools validate --no-sort tests/data/hapfiles/valhap_with_out_of_header_metas.hap

This turns:

.. code-block::

    # orderH ancestry beta
    # version 0.2.0
    #H ancestry s Local ancestry
    #H beta .2f Effect size in linear model
    #R beta .2f Effect size in linear model
    H 21 26928472 26941960 chr21.q.3365*1 ASW 0.73
    R 21 26938353 26938400 21_26938353_STR 0.45
    H 21 26938989 26941960 chr21.q.3365*10 CEU 0.30
    H 21 26938353 26938989 chr21.q.3365*11 MXL 0.49
    # This should cause an error if the file is sorted
    #V test_field s A field to test with
    V chr21.q.3365*1 26928472 26928472 21_26928472_C_A C
    V chr21.q.3365*1 26938353 26938353 21_26938353_T_C T
    V chr21.q.3365*1 26940815 26940815 21_26940815_T_C C
    V chr21.q.3365*1 26941960 26941960 21_26941960_A_G G
    V chr21.q.3365*10 26938989 26938989 21_26938989_G_A A
    V chr21.q.3365*10 26940815 26940815 21_26940815_T_C T
    V chr21.q.3365*10 26941960 26941960 21_26941960_A_G A
    V chr21.q.3365*11 26938353 26938353 21_26938353_T_C T
    V chr21.q.3365*11 26938989 26938989 21_26938989_G_A A

Into:

.. code-block::

    # orderH ancestry beta
    # version 0.2.0
    #H ancestry s Local ancestry
    #H beta .2f Effect size in linear model
    #R beta .2f Effect size in linear model
    H 21 26928472 26941960 chr21.q.3365*1 ASW 0.73
    R 21 26938353 26938400 21_26938353_STR 0.45
    H 21 26938989 26941960 chr21.q.3365*10 CEU 0.30
    H 21 26938353 26938989 chr21.q.3365*11 MXL 0.49
    V chr21.q.3365*1 26928472 26928472 21_26928472_C_A C
    V chr21.q.3365*1 26938353 26938353 21_26938353_T_C T
    V chr21.q.3365*1 26940815 26940815 21_26940815_T_C C
    V chr21.q.3365*1 26941960 26941960 21_26941960_A_G G
    V chr21.q.3365*10 26938989 26938989 21_26938989_G_A A
    V chr21.q.3365*10 26940815 26940815 21_26940815_T_C T
    V chr21.q.3365*10 26941960 26941960 21_26941960_A_G A
    V chr21.q.3365*11 26938353 26938353 21_26938353_T_C T
    V chr21.q.3365*11 26938989 26938989 21_26938989_G_A A

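The transformation shown above can be approximated with a short awk filter. This is a hypothetical sketch (``drop_out_of_header_meta`` is not a haptools function) that assumes the header ends at the first line not starting with ``#``; it reproduces this example but is not necessarily how haptools implements ``--no-sort``.

.. code-block:: bash

    # Hypothetical sketch, NOT part of haptools.
    # Keeps "#" lines only while they precede the first data (H/R/V) line.
    drop_out_of_header_meta() {
        awk '
            /^#/ && seen_data { next }  # metadata or comment after the header: drop it
            !/^#/ { seen_data = 1 }     # the first non-"#" line ends the header
            { print }
        ' "$1"
    }

Assuming the contents of ``valhap_with_out_of_header_metas.hap`` match the first listing, ``drop_out_of_header_meta tests/data/hapfiles/valhap_with_out_of_header_metas.hap`` prints the second listing: the comment line and the ``#V test_field`` declaration are dropped.
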
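If the example file above were validated with sorting enabled, there would be several errors in the ``.hap`` file.
When a file is sorted, all metadata lines are parsed first, so the ``#V`` declaration of ``test_field`` would apply to every ``V`` line, and each ``V`` line would be incomplete because it lacks that column.
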
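The reasoning in the previous paragraph can be sketched as a two-pass check: the first pass counts every ``#V`` declaration regardless of where it appears (as if the metadata had been moved to the top), and the second flags ``V`` lines with too few columns. ``check_v_columns_sorted`` is hypothetical, and it assumes each ``V`` line has six mandatory columns (line type, haplotype ID, start, end, variant ID, allele) plus one per ``#V`` declaration.

.. code-block:: bash

    # Hypothetical sketch, NOT part of haptools.
    # Assumes 6 mandatory V columns plus one column per "#V" declaration.
    check_v_columns_sorted() {
        awk '
            FNR == 1 { pass++ }  # the same file is read twice
            pass == 1 {
                if ($1 == "#V") extra++  # pass 1: count all #V declarations
                next
            }
            $1 == "V" && NF < 6 + extra {  # pass 2: check every V line
                printf "line %d: expected %d columns, found %d\n", FNR, 6 + extra, NF
                errors++
            }
            END { exit (errors > 0) }
        ' "$1" "$1"
    }

Assuming the file matches the first listing above, each of its nine ``V`` lines would be reported as having 6 columns instead of the expected 7.
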
As mentioned above, the ``--genotypes`` option accepts a ``.pgen`` file against which the variant IDs in the ``.hap`` file are checked.

The following command checks whether every variant ID in the ``.hap`` file appears in the ``.pvar`` file associated with the ``.pgen`` file.

.. code-block:: bash

    haptools validate --genotypes tests/data/hapfiles/valhap_test_data.pgen tests/data/hapfiles/valhap_test_data.hap

.. warning::

    You must generate a ``.pvar`` file from your ``.pgen`` file.
    This avoids reading large amounts of information that is not
    relevant to the validation process.

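
For intuition, the ID check performed with ``--genotypes`` can be approximated without haptools. The hypothetical ``missing_variant_ids`` function below reads the ``ID`` column of a ``.pvar`` file, assuming it has a ``#CHROM`` header line naming its columns, and prints the variant ID (fifth column) of every ``V`` line in the ``.hap`` file that is absent from it. It ignores ``R`` lines and everything else haptools validates.

.. code-block:: bash

    # Hypothetical sketch, NOT part of haptools.
    # Usage: missing_variant_ids FILE.hap FILE.pvar
    # Assumes the .pvar has a "#CHROM" header line with an "ID" column.
    missing_variant_ids() {
        local hap=$1 pvar=$2
        awk '
            FILENAME == ARGV[1] {  # first file: the .pvar
                if ($1 == "#CHROM") {
                    for (i = 1; i <= NF; i++) if ($i == "ID") col = i
                } else if (col && $0 !~ /^#/) {
                    ids[$col] = 1  # remember every variant ID listed in the .pvar
                }
                next
            }
            $1 == "V" && !($5 in ids) {  # second file: V lines of the .hap
                print $5
                missing++
            }
            END { exit (missing > 0) }
        ' "$pvar" "$hap"
    }

Assuming a ``.pvar`` file sits next to the ``.pgen``, ``missing_variant_ids tests/data/hapfiles/valhap_test_data.hap tests/data/hapfiles/valhap_test_data.pvar`` lists any missing IDs and returns a non-zero exit status if there are some.
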

Detailed Usage
~~~~~~~~~~~~~~

.. click:: haptools.__main__:main
    :prog: haptools
    :show-nested:
    :commands: validate