@joshua-slaughter
Member

Hi all, I have put together a configuration for estimating pairwise interactions at the genome-wide scale. Making this possible only required minimal changes to inputs_from_config.jl. A review would be very much appreciated!

@olivierlabayle
Member

Thank you Josh, I'll have a look asap!

Member

@olivierlabayle olivierlabayle left a comment

Thanks Josh, see my comments in the review, happy to discuss.

@joshua-slaughter joshua-slaughter self-assigned this Mar 18, 2025
@olivierlabayle
Member

@joshua-slaughter Could you provide a high-level overview of the new functionality here and how a user is meant to interact with it? This description could also be used in the next release of the TarGene docs.

@joshua-slaughter
Member Author

@olivierlabayle had to dust this one off haha

Notable changes

  • In the GWAS configuration, the keyword extra_treatments can be specified to add treatment variables for higher-order interactions. I recently extended this beyond environmental variables to also include genetic variants, which enables genome-wide epistasis analyses. However, this could probably be cleaner (e.g. right now it is closer to the flat config than to groups).
  • Added a pipeline output that tracks genotype counts, as this has been of particular interest to the group (questions about case-control analyses).
  • Tests have been added to evaluate that the new functionalities behave as expected.
  • Upped the CI.yml

Will work on docs for this soon but would like to get your opinion on the current implementation if possible (at least for the interaction setup)

Member

@olivierlabayle olivierlabayle left a comment

Hi Josh, thanks for the PR, please find my comments inline. Main things are:

  1. I think the summary stats (counts) should either be moved to TMLECLI as an option or kept here but in a different function. It would also be good to discuss exactly what we want to count. In any case, I suggest using DataFrames.jl groupby and combine to simplify the code and make it faster.

  2. The GWIS looks mostly good, but I think it is getting quite complicated and needs a bit more testing of edge cases that are not currently covered.
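
For point 1, here is a minimal sketch of what the groupby/combine approach could look like. The column names (`rs123`, `outcome`) and the toy data are hypothetical and not the pipeline's actual schema; `groupby`, `combine` and `nrow` are standard DataFrames.jl functions.

```julia
using DataFrames

# Hypothetical toy data: one genotype column and one binary outcome.
# Column names are illustrative only, not TarGene's real schema.
df = DataFrame(
    rs123 = ["AA", "AG", "AG", "GG", "AA", "AG"],
    outcome = [1, 0, 1, 0, 0, 1],
)

# Count individuals for each (genotype, outcome) combination in a single pass.
counts = combine(groupby(df, [:rs123, :outcome]), nrow => :count)
```

The result has one row per observed genotype/outcome combination, so combinations with zero individuals do not appear and would need to be handled separately if they matter for the case-control questions.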

Thanks!

@joshua-slaughter
Member Author

@olivierlabayle addressed all comments. Will open summary stats PR in TMLECLI.

Development

Successfully merging this pull request may close these issues.

The functionality of make_outputs is not up to date and fails with later releases of TMLE.jl and TMLECLI.jl

3 participants