use SummarizedExperiment instead of loose data structures #95
base: main
P.S. Maybe it is also a good idea to have a separate branch
Hi Thomas, thanks for looking into this and making all these comments and changes! Regarding the SummarizedExperiment file:
Sounds good. Out of curiosity, why would you want to maintain both? Yes, I pointed many out, but also made some changes :D So wherever there is an "EDIT:", I didn't make any changes, but in other places I did. Sorry for being inconsistent here.
P.S. (again, and in addition to the points above): Looking through your functions, it seems to me that many of them are not specific to metabolite analysis but could also be used for other data modalities. I was wondering whether you would like to reflect in your naming and documentation that the package is more universal than metabolite analysis :)
Yes, I completely understand your point. Yet many of the facilities I talked with while designing the package wanted DFs as input. Moreover, for the biological clustering functions we need differential analysis results, but can still pass other information. To my knowledge, this will not work with SE. For the differential analysis results and some of the mapping ambiguities, I am also not sure how we would link them with the input matrix and the sample and feature metadata. But maybe we can discuss those specific cases. Regarding your P.S.: yes, I designed them like this on purpose. They work with any data modality, and I will extend the vignettes to other omics and multi-omics in the future. On the dev branch we already have proteomics and transcriptomics, and we will write vignettes for those too. I had not considered changing the package name, though, only extending the parameter descriptions.
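On linking differential analysis results to the input matrix and metadata: a minimal sketch of one possible approach, assuming the Bioconductor package SummarizedExperiment is installed. Per-feature results are aligned to the object's features by ID and stored as `rowData` columns, while complete result tables are kept in `metadata()`. The toy data, the column names and the `B_vs_A` comparison are made up for illustration and are not the package's data or API.

```r
library(SummarizedExperiment)

# Toy data: 3 features x 4 samples (SummarizedExperiment keeps features in
# rows and samples in columns). All values and names are made up.
mat <- matrix(c(1, 2, 3, 4,
                5, 6, 7, 8,
                9, 10, 11, 12),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("f1", "f2", "f3"), paste0("s", 1:4)))
se <- SummarizedExperiment(
    assays = list(data = mat),
    colData = data.frame(condition = c("A", "A", "B", "B"),
                         row.names = colnames(mat)),
    rowData = data.frame(feature_class = c("amino acid", "lipid", "lipid"),
                         row.names = rownames(mat))
)

# Hypothetical result of one differential comparison (B vs A): in a
# different feature order and without a result for f2.
de_b_vs_a <- data.frame(feature_id = c("f3", "f1"),
                        log2fc = c(0.5, -1.2),
                        p_adj = c(0.01, 0.20))

# Align per-feature results to the features of the object by ID and store
# them as rowData columns; features without a result get NA.
idx <- match(rownames(se), de_b_vs_a$feature_id)
rowData(se)$log2fc_B_vs_A <- de_b_vs_a$log2fc[idx]
rowData(se)$p_adj_B_vs_A <- de_b_vs_a$p_adj[idx]

# Complete result tables (e.g. several comparisons) can additionally be
# kept in the metadata list, which has no alignment constraints.
metadata(se)$differential <- list(B_vs_A = de_b_vs_a)

rowData(se)
```

Whether this covers the mapping ambiguities mentioned above is an open question; it only shows that per-feature results and sample/feature metadata can live in the same object.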
Hi @tnaake, many thanks for your PR! It takes big steps in directions we wanted to pursue anyway: the code quality improvements, the BioC compliance, and the use of suitable objects for a more streamlined implementation and API are all essential for the consolidation of this package. We'll review and merge the PR in the next few days. One little note, though: please base contributions on the
Should I then create a new PR against the
We will come back to you about this at the beginning of next week, as we are currently in the process of merging some bigger changes into main and also need some time to review the work you have done so far (maybe we can already use part of it as is). In addition, as mentioned, we planned to keep the standard input too, by packing all the inputs into an SE object as the first step of each function and then proceeding (to allow both SE and DF input). Among other things we have noted for adding SE, this would also involve offering the example data in both formats and showing both options in the @examples documentation. So it may make sense to take these needs into account when making further changes.
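A minimal sketch of the "pack all inputs into an SE first" approach, assuming the Bioconductor package SummarizedExperiment is installed and that the data frame input has samples in rows and features in columns, with sample/feature IDs as row names of the metadata tables (an assumption, not necessarily the package's current input format). `as_se()` and `feature_means()` are hypothetical names used only for illustration.

```r
library(SummarizedExperiment)

# Hypothetical helper: return a SummarizedExperiment whether the user passes
# one directly or the "loose" inputs (data frame + optional metadata).
as_se <- function(data, sample_metadata = NULL, feature_metadata = NULL) {
    if (is(data, "SummarizedExperiment")) {
        return(data)
    }
    stopifnot(is.data.frame(data) || is.matrix(data))
    # SummarizedExperiment stores features in rows and samples in columns
    mat <- t(as.matrix(data))
    # empty metadata if none is given, otherwise reorder rows to the assay
    sample_metadata <- if (is.null(sample_metadata)) {
        data.frame(row.names = colnames(mat))
    } else {
        sample_metadata[colnames(mat), , drop = FALSE]
    }
    feature_metadata <- if (is.null(feature_metadata)) {
        data.frame(row.names = rownames(mat))
    } else {
        feature_metadata[rownames(mat), , drop = FALSE]
    }
    SummarizedExperiment(assays = list(data = mat),
                         colData = sample_metadata,
                         rowData = feature_metadata)
}

# Hypothetical analysis function: convert first, then work on the SE only.
feature_means <- function(data, sample_metadata = NULL,
                          feature_metadata = NULL) {
    se <- as_se(data, sample_metadata, feature_metadata)
    rowMeans(assay(se, "data"))
}

# Toy example data offered in both formats
values <- data.frame(m1 = c(1, 2), m2 = c(3, 5), row.names = c("s1", "s2"))
samples <- data.frame(condition = c("A", "B"), row.names = c("s1", "s2"))
se <- as_se(values, sample_metadata = samples)

feature_means(values)  # data frame input
feature_means(se)      # SummarizedExperiment input, same result
```

With this pattern only the first line of each exported function deals with the two input types; everything after it handles a single data structure.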
Hi Thomas, as promised, I am coming back to you about the SE PR. We think it would be great if we could properly plan the addition of SE input, as you have already done a big chunk of this. For this we have a few requirements:
Since we are currently also making changes for code readability, removing unnecessary dependencies and preparing for the Bioconductor checks, we are continuously working on the development branch. This will also include changes in parameter names. Hence, it is important to decide on initial goals and proceed step-wise to ensure no conflicts are generated. As we are meeting next week, we can probably discuss this then, as it will be easier than in writing. @deeenes will take care of communicating the details of the contribution guidelines for continuous delivery.
Hello @ChristinaSchmidt1
This PR adds (to some extent, see below) `SummarizedExperiment` functionality to your package (issue #70). The implementation is still partly missing for the `VizVolcano` functions: I didn't fully understand what is going on there, and it would be good to discuss this in more detail. Also, most of the examples and the vignettes need to be adjusted for the new data structure. I have also reviewed most of the code and added several comments (marked "EDIT:") to the code base where I think the code could be improved. Furthermore, I have adjusted the code style to the BioC style to some extent.

Here are some further thoughts on the package (also in light of the above):
- Use `match.arg` instead of the tests written by you (it is much simpler and a standard). Also, write helper functions as needed for repetitive code snippets.
- Test the functions via `testthat` or similar. In general, add tests for all functions you use.
- Document the argument types as `character(n)`, `numeric(n)`, `logical(n)`; for some of the arguments it is difficult to understand what their structure should be.
- Instead of `Log2(Distance)`, just use `log2_distance`.
- Instead of `%>%`, just use `|>` to remove dependencies. Also, what is the added benefit of using operators such as `%<>%`? I would rather focus on easy-to-read code than complex code and would not assume that every user who looks at the package knows what these operators mean.
- `InputData` is quite generic and does not really say what the input is.

I am happy to continue with the integration of `SummarizedExperiment`, but thought it would be good to quickly check with you whether this goes in the right direction and get the green light before investing too much time :)
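A minimal base-R sketch of how several of the suggestions in the list above could look together in one function. `log2_distance_table()`, the helper `check_numeric()` and their arguments are hypothetical and not part of the package; the native pipe `|>` needs R >= 4.1, and the `testthat` part only runs if that package is installed.

```r
#' Log2-transform per-feature distances (hypothetical example function)
#'
#' @param distance numeric(n), named vector with one distance per feature.
#' @param scaling character(1), either "none" or "center".
#' @param pseudocount numeric(1), added before the log2 transformation.
#' @return data.frame with the columns `feature` and `log2_distance`.
log2_distance_table <- function(distance, scaling = c("none", "center"),
                                pseudocount = 1) {
    # match.arg() replaces hand-written checks of the allowed values and
    # returns "none" (the first choice) if `scaling` is not supplied
    scaling <- match.arg(scaling)
    check_numeric(distance, "distance")
    check_numeric(pseudocount, "pseudocount", len = 1L)
    stopifnot(!is.null(names(distance)))

    # native pipe (R >= 4.1) instead of magrittr's %>%
    log2_distance <- (distance + pseudocount) |> log2()
    if (scaling == "center") {
        # plain reassignment instead of magrittr's %<>%
        log2_distance <- log2_distance - mean(log2_distance)
    }
    # snake_case column name instead of "Log2(Distance)"
    data.frame(feature = names(distance),
               log2_distance = unname(log2_distance))
}

# Reusable helper for a check that would otherwise be repeated
check_numeric <- function(x, name, len = NULL) {
    if (!is.numeric(x) || (!is.null(len) && length(x) != len)) {
        stop("'", name, "' must be numeric",
             if (!is.null(len)) paste0(" of length ", len), call. = FALSE)
    }
    invisible(TRUE)
}

# Unit tests, e.g. in tests/testthat/test-log2_distance_table.R
if (requireNamespace("testthat", quietly = TRUE)) {
    testthat::test_that("log2_distance_table transforms and validates", {
        d <- c(a = 1, b = 3)
        testthat::expect_equal(log2_distance_table(d)$log2_distance, c(1, 2))
        testthat::expect_equal(
            log2_distance_table(d, scaling = "center")$log2_distance,
            c(-0.5, 0.5))
        testthat::expect_error(log2_distance_table(d, scaling = "median"))
        testthat::expect_error(log2_distance_table(c(a = "x")))
    })
}
```

Besides being shorter, `match.arg()` gives users a standard error message that lists the allowed values, and the shared helper keeps the remaining error messages consistent across functions.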