-
Couldn't load subscription status.
- Fork 14
Fixes Issue #94 - #108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fixes Issue #94 - #108
Conversation
|
@jananiravi I have made the changes. Kindly review/guide me for further steps if required in this Issue. |
- Refactored and merged R/tree.R CreateFA2Tree and ConvertFA2Tree. - Parametrized every hardcoded value and path in ConvertFA2Tree - Refactored convertAlignment2Trees to support running both modes. - Backward compatible - Updated all documentation for R/tree.R
|
FIxed the commit message to reflect the correct issue number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple thoughts and suggestions as I passed through. I'm not 100% certain on how the tree generation is meant to flow from the older script you're refactoring, so please ask me for clarification or feel free to just toss suggestions if they don't align with the actual logic going on here!
R/tree.R
Outdated
| # estimate a distance matrix using a Jules-Cantor Model | ||
| dna_dist <- dist.ml(prot10, model = "JC69") | ||
| # estimate a distance matrix using a Jules-Cantor Model | ||
| dna_dist <- dist.ml(prot_subset, model = dna_model) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| dna_dist <- dist.ml(prot_subset, model = dna_model) | |
| prot_dist <- dist.ml(prot_subset, model = mt) |
I know the original outdated code references DNA here, but I believe this should all be protein based on the inputs. If the change to prot_dist is correct, the code below will also need modification to reflect the variable name changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jananiravi : Can you confirm if the above can be considered correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@epbrenner are these the unused tree generators? Using dna could be a stub? Maybe change to protein for now (will work with current molevolvr) or make it a function argument to take dna or protein fasta sequences -- future-proofing the function given starting with DNA seqs will be part of molevolvr sooner rather than later. Does that answer your question, @Klangina ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure @jananiravi, I will create an argument to switch between DNA and protein sequences and keep the default as protein. Will update the pr shortly.
R/tree.R
Outdated
| plot(prot_NJ, main = "Neighbor Joining") | ||
| ## Neighbor Joining, UPGMA, and Maximum Parsimony | ||
| # quick and dirty UPGMA and NJ trees | ||
| prot_UPGMA <- upgma(dna_dist) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| prot_UPGMA <- upgma(dna_dist) | |
| prot_UPGMA <- upgma(prot_dist) |
R/tree.R
Outdated
| ## Neighbor Joining, UPGMA, and Maximum Parsimony | ||
| # quick and dirty UPGMA and NJ trees | ||
| prot_UPGMA <- upgma(dna_dist) | ||
| prot_NJ <- NJ(dna_dist) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| prot_NJ <- NJ(dna_dist) | |
| prot_NJ <- NJ(prot_dist) |
R/tree.R
Outdated
| # ml estimation w/ distance matrix | ||
| fit <- pml(prot_NJ, prot_subset) | ||
| print(fit) | ||
| fitJC <- optim.pml(fit, model = ml_model, rearrangement = rearrangement) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| fitJC <- optim.pml(fit, model = ml_model, rearrangement = rearrangement) | |
| fit_optimized <- optim.pml(fit, model = fit, rearrangement = rearrangement) |
Or something similar. JC here stands for Jukes-Cantor, which is a DNA substitution model, so removing that from the variable. I believe the model in model = ml_model will actually be the prior tree generated by pml(), which would be fit here, and then optim.pml() optimizes the previous model. Do double-check me on this, though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, switching model based on DNA/protein input could be part of the function, too.
R/tree.R
Outdated
| fit <- pml(prot_NJ, prot_subset) | ||
| print(fit) | ||
| fitJC <- optim.pml(fit, model = ml_model, rearrangement = rearrangement) | ||
| logLik(fitJC) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| logLik(fitJC) | |
| logLik(fit_optimized) |
If above suggestion is correct.
R/tree.R
Outdated
| bs = bootstrap_reps, optNni = TRUE, multicore = TRUE, | ||
| control = pml.control(trace = 0) | ||
| ) | ||
| plotBS(midpoint(fitJC$tree), bs, p = bootstrap_p, type = plot_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| plotBS(midpoint(fitJC$tree), bs, p = bootstrap_p, type = plot_type) | |
| plotBS(midpoint(fit_optimized$tree), bs, p = bootstrap_p, type = plot_type) |
R/tree.R
Outdated
| prot_subset_NJ <- NJ(prot_subset_dm) | ||
| fit2 <- pml(prot_subset_NJ, data = prot_subset) | ||
| print(fit2) | ||
| fitJC2 <- optim.pml(fit2, model = ml_model, rearrangement = rearrangement) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| fitJC2 <- optim.pml(fit2, model = ml_model, rearrangement = rearrangement) | |
| fit_optimized2 <- optim.pml(fit2, model = fit2, rearrangement = rearrangement) |
As before, if model should be the output of pml() then update. Discard if I'm mistaken though!
R/tree.R
Outdated
| fit2 <- pml(prot_subset_NJ, data = prot_subset) | ||
| print(fit2) | ||
| fitJC2 <- optim.pml(fit2, model = ml_model, rearrangement = rearrangement) | ||
| logLik(fitJC2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| logLik(fitJC2) | |
| logLik(fit_optimized2) |
R/tree.R
Outdated
| print(fit2) | ||
| fitJC2 <- optim.pml(fit2, model = ml_model, rearrangement = rearrangement) | ||
| logLik(fitJC2) | ||
| bs_subset <- bootstrap.pml(fitJC2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| bs_subset <- bootstrap.pml(fitJC2, | |
| bs_subset <- bootstrap.pml(fit_optimized2, |
R/tree.R
Outdated
| bs = bootstrap_reps, optNni = TRUE, multicore = TRUE, | ||
| control = pml.control(trace = 0) | ||
| ) | ||
| bs2 <- plotBS(midpoint(fitJC2$tree), bs, p = bootstrap_p, type = plot_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| bs2 <- plotBS(midpoint(fitJC2$tree), bs, p = bootstrap_p, type = plot_type) | |
| bs2 <- plotBS(midpoint(fit_optimized2$tree), bs, p = bootstrap_p, type = plot_type) |
- renamed dna_dist to seq_dist - renamed fitJC and fitJC2 to fit_optimised and fit_optimised2 to make variable names more general. - Updated code and documentation accordingly.
|
@epbrenner @jananiravi : Made changes as per recommendations before. Kindly review. |
… following two issues: createFA2Tree: no visible global function definition for ‘midpoint’ createFA2Tree: no visible binding for global variable ‘fit_optimized2’
Description
What kind of change(s) are included?
Checklist
Please ensure that all boxes are checked before indicating that this pull request is ready for review.