Polish of the usage and TODOs to clarify and simplify options#219
Conversation
gregorgorjanc
left a comment
There was a problem hiding this comment.
@XingerTang @AprilYUZhang @RosCraddock critical review of these changes would be much appreciated.
This should now be a model on how we should polish AlphaImpute2 documentation too. Maybe we can leverage some text from here (say the zero_one_two_etc note, maybe also others)
There was a problem hiding this comment.
One more, I tested most options by creating input files based on the input file format section and running these commands
AlphaPeel -ped_file ped.txt -geno_file geno.txt -out_file test - method multi-locus
AlphaPeel -ped_file ped.txt -geno_file geno.txt -pheno_penetrance_file pheno_penetrance.txt -phased_geno_prob -geno_prob -geno -hap -seg_prob -out_file test -update_alt_allele_prob -est_rec_prob -est_geno_error_prob
AlphaPeel -ped_file ped.txt -geno_file geno.txt -seq_file seq.txt -pheno_penetrance_file pheno_penetrance.txt -phased_geno_prob -geno_prob -geno -hap -seg_prob -out_file test -est_alt_allele_prob -update_alt_allele_prob -alt_allele_prob -est_geno_error_prob -est_seq_error_prob
AlphaPeel -ped_file ped.txt -geno_file geno_single.txt -pheno_file pheno.txt -pheno_penetrance_file pheno_penetrance.txt -phased_geno_prob -geno_prob -geno -hap -seg_prob -pheno_prob -out_file test -est_pheno_error_prob
Update: here are all the input files used above:
| simple_output.dosage.txt | ||
| simple_output.rec_prob.txt | ||
| TODO: Let's round up these thresholds to 0.3333333333333333 to 0.33, so 2 digits | ||
| This should be enough, I reckon, but happy to discuss! |
There was a problem hiding this comment.
Yes, that would be easier to read.
There was a problem hiding this comment.
@XingerTang can you please take this one on? Decide if you want to open an issue or just work on it as part of your workflow. Once you do, please open PR to this PR and add edits to it regarding this rounding up.
|
@XingerTang I have met with @RosCraddock and we addressed some of the TODOs in the file, she has already done some of these changes in tinyhouse, but we should ensure they are also available in AlphaPeel and she will open PR to this PR to address the doc changes. @XingerTang please go over the TODO's left in the changed files in this PR and reflect on the proposed changes. I think many of these are straightforward so let's do them ASAP;) Decide if you want to open an issue for each of those or just work on it as part of your workflow. Once you do, please open PR to this PR and add edits to it regarding the documentation updates. Happy to meet about some of the more gnarly TODOs. Once we are done on all the ends we will have the code and docs sorted and polished and functionality clearer, phiu. |
XingerTang
left a comment
There was a problem hiding this comment.
@gregorgorjanc, I may need a meeting with @RosCraddock and @AprilYUZhang to understand what to change and how to implement the change.
There was a problem hiding this comment.
The original help messages match the output of AlphaPeel -h exactly. Do we need to update the AlphaPeel help messages as well? Some of the changes, such as the subsections of "Individuals" and "Markers", may not be replicated in the help message output.
There was a problem hiding this comment.
Yes, update the code doc strings - I only focused on the rst file! Good call!
| -rec_length REC_LENGTH | ||
| Recombination length of the chromosome in Morgans. | ||
| Default: 1.00. | ||
| TODO: rename mutation_rate to mut_prob |
| Default: 1. | ||
|
|
||
| Estimation of model parameters: | ||
| TODO: rename to est_start_alt_allele_prob |
| Alternative allele probability file | ||
| =================================== | ||
| TODO: how does seg_file look like? | ||
| TODO: how does seg_map_file look like? |
|
|
||
| The phenotype penetrance file allows for user flexibility to include phenotype information. This file contains the conditional probability table for true phenotype given the true genotype. Each column represents the different phenotype states (order from 0 to total number of states), and each row represents the genotype states in order the aa, aA, Aa, AA. Below, we provide an example for a monogenic recessive, binary trait. | ||
| The ``.geno_prob.txt`` file contains *genotype probabilities* for each individual. | ||
| TODO: remove the empty lines in the output files? (4 will be 3 in text!) |
There was a problem hiding this comment.
The empty lines are still an open issue (#117).
There was a problem hiding this comment.
Noted. This should be a simple fix;)
|
|
||
| :: | ||
| 1. The grand *paternal* allele from the father and | ||
| the grand *paternal* allele from the mother (``Pr(F_p,- & M_p,-)``), |
There was a problem hiding this comment.
Maybe we can use LaTeX here.
There was a problem hiding this comment.
I thought the same, but didn't know how;) Does the notation make sense to you?
There was a problem hiding this comment.
I suggest we do the following way:
| the grand *paternal* allele from the mother (``Pr(F_p,- & M_p,-)``), | |
| the grand *paternal* allele from the mother (:math:`Pr(F_{p,-} & M_{p,-})`), |
|
|
||
| Dosage file | ||
| =========== | ||
| TODO: Do we need a diagram for this maybe even show the 4 cases so we bring home the message!? |
There was a problem hiding this comment.
Yes, it will be useful.
142fd5b to
364a3bc
Compare
7df35cb to
36af2da
Compare
I was always annoyed by the usage section since it was confusing us and was not consistent. I have now completely overhauled it and strived to describe one thing in one place and cross-link where needed.
I also went and sorted all options according to their input, output, peeling methods, and peeling parameters.
Then I went through all input file formats (in the order listed in input section).
Then I went through all output file formats (in the order listed in output section).
Then I went through all peeling parameters file formats (in the order listed in peeling parameters section).
This was helpful because it points out things that are inputted, outputted, how peeling methods are run, and how we can tweak peelling parameters.
It also shows areas that we should polish (some option renames) or check.
There is a number of simple TODOs for us to sort out! I will today work with @RosCraddock on the allele prob estimation parts and pheno penetrance part.