use SummarizedExperiment instead of loose data structures #95

tnaake · 2025-03-26T14:42:58Z

This PR adds to (some extent, see below) SummarizedExperiment functionality to your package (issue #70). The implementation is still missing partly for VizVolcano functions - I didn't fully understand what is going on here and it would be good to discuss this more in detail. Also, most of the examples and the vignettes need to be adjusted for the new data structure. I have also reviewed most of the code and added several comments (via "EDIT:") to the code base where I think the code could be improved. Furthermore, I have adjusted the code style to some extent to BioC style.

Here are some further thoughts on the package (also in light of above):

the code should be simplified as much as possible, I find it difficult to follow what is going on for some steps. You could also think of adding more comments to the code
the code could be more robust, e.g. for many arguments with several options you could name them in the function argument and use match.arg instead of the tests written by you (it is much simpler and a standard). Also, write helper functions as needed for repetitive code snippets. Test the functions via testthat or alike. In general, add test functions for all functions you used
functions should not be defined in other functions, move them outside
improve documentation, add information on the type of the argument, e.g. character(n), numeric(n), logical(n). It seems difficult to understand what the structure should be for some of the arguments
do not create and use column names that require quoting, e.g. instead of Log2(Distance) just use log2_distance
currently the naming of objects is not consistent. You could use (ONLY) camel case for object names and (ONLY) snake case names for function names or differently.
Instead of %>% use just |> to remove dependencies. Also what is the added benefit of using operators such as %<>%>. I would rather focus on easy-to-read code than complex code and would not assume that every user that looks at the package knows what these operators mean
for some arguments / fcts the naming could be improved, e.g. InputData is quite generic and does not really imply what is the input
I have harmonized for most functions the function output (list), this will be a list that has entries "data" and "plot", which are again lists. I have used "data" instead of the "DF" naming since many fcts do not return a "data.frame"

I am happy to continue with the integration of SummarizedExperiment but thought it would be good to quickly check with you if this goes in the right direction and get green lights before investing too much time :)

tnaake · 2025-03-26T14:44:33Z

P.S. maybe it is also a good idea, to have a separate branch SummarizedExperiment before merging it into the main branch.

ChristinaSchmidt1 · 2025-03-26T16:11:06Z

Hi Thomas,

Thanks for looking into this and making all these comments and changes!

In regard to the SummarizedExperiment file:

I would want to maintain both options, the classical passing of DFs as well as SummarizedExperiment files and either detect automatically what was passed or create a paramter se= TRUE/FLASE to leave the choice to the user.
At the moment we have many changes on the development branch that we will need to merged into main, so we may need to do this first.
In terms of function structure and names: We have planned to reduce dependencies, restructure code and documentation, and I will add the comments of yours to the list (if I understood correctly, you did not make any changes in this regard, but pointed out your observations?)

tnaake · 2025-03-26T16:15:44Z

Hi @ChristinaSchmidt1,

sounds good. Out of curiosity, why would you want to maintain both? SE objects are quite easy to construct, are well structured, are encouraged by BioC since it's a BioC S4 class, and will simplify many steps in the code (e.g. merging)?

Yes, I pointed many out, but also made some changes :D So, whenever there is a "EDIT:" I didn't do any changes, but for other instances I did. Sorry, for being inconsistent here.

tnaake · 2025-03-26T16:18:50Z

P.S. (again and in addition to the points above): Also, looking through your functions, it seems to me that many functions are not specific to metabolite analysis, but could also be used for other data modalities. I was wondering if you would like to reflect this in your naming and documentation that the package is more universal than for metabolite analysis :)

ChristinaSchmidt1 · 2025-03-26T16:35:56Z

Yes, I completely understand your point. Yet many of the facilities I have talked with, whilst designing the package, wanted DFs as input. Moreover, for the biological clustering functions we need differential analysis results, but can still pass other information. To my knowledge this will not work with se. For differential analysis results and some of the mapping ambiguities I am also not sure how we would link this with the input matrix, sample and feature metadata information. But maybe we can discuss about those specific cases.

For your P.S. - yes I have designed them like this on purpose. They work with any data modality and I will extend the vignettes to other omics and multi-omics in the future. On the dev branch we already have proteomics and transcriptomics and we will write some vignettes for those too. I have not considered to change the package name tough, but to extend the parameter descriptions.

deeenes · 2025-03-26T16:38:17Z

Hi @tnaake, Many thanks for your PR! It takes big steps in directions that we anyway wanted to pursue, namely, the code quality improvements, the BioC compliance, and the use of suitable objects for a more streamlined implementation and API, are all essential for the consolidation of this package. We'll review and merge the PR in the next days. One little note though: please base contributions on the development branch, merge the upstream daily, and open the PR in a state when it has no conflict with the development branch (yes, we should also write the contrib guidelines).

tnaake · 2025-03-27T08:31:35Z

Should I then create a new PR against the development branch and close this one?

ChristinaSchmidt1 · 2025-03-27T08:49:29Z

We will come back to you about this beginning of next week as we are currently in the process of merging some bigger changes into main and we also need some time to review the current work you have done (and maybe we can already use part of it as is).

In addition, as mentioned, we planned to keep the standard input too by packing all the inputs into an se file as the first step of each function and then proceed (to alow both se input or dfs). This would also involve offering the example data in both formats and having both options under the @example documentation amongst other things we have noted for adding se. So it may make sense to take these needs into account when making further changes.

ChristinaSchmidt1 · 2025-04-03T13:20:41Z

Hi Thomas,

As promised I am coming back to you about the se PR. We think it would be great if we could properly plan the addition of se input as you have already done a big chunk of this. For this we have a few requirements:

We want to keep the current input option as is, but as the first step within the function merge the input into an se and run all downstream analysis on se
Additionally, the user can of course provide an se file as input instead of the dfs and we will automatically detect this
We want to provide the example data as both, dfs or se file by adding an additional paramter. @examples should include both examples
The vignettes will not be changed to se file input at this stage of the project (later we can do this)
There are some functions were we would need to discuss how we can adapt se for those, as they are not classical matrix, feature- and sample metadata

Since, we are currently also conduction changes for code readibility, to remove unnessesary dependencies and prepare for Bioconductor checks, we are continuosly working on the development branch. This will also include changes in parameter names. Hence, it would be important to decide on initial goals and proceed step-wise, to ensure no conflicts are generated. As we are meeting next week, we can probably discuss this as it will be easier than in writing.

@deeenes will take care of communicating the details on contrib guidelines for continuous delivery.

tnaake added 30 commits March 13, 2025 16:16

adjust to Bioc coding style

0e55051

Update NAMESPACE

88dade0

adapt to BiocStyle

359c820

update to BioC style

5e6b200

adjust to Bioc style

f117b16

adjust to Bioc style

c8ec8c5

adjust to Bioc style

21ba99c

adjust to Bioc style

d84c9e3

adjust to Bioc style

dd897e8

adjust to Bioc style

8be7c7e

adjust to Bioc style

813d0be

adjust to Bioc style

19c2313

adjust to Bioc style

75f16ae

adjust to Bioc style

e3658a3

adjust to Bioc style

40664ef

adjust to Bioc style

f8e3bbd

adjust to Bioc style

84db9eb

adjust to Bioc style

30880bf

adjust to Bioc style

5d8ef6e

adjust to Bioc style

69472ce

Update DifferentialMetaboliteAnalysis.R

05c73e9

Update GetPriorKnoweldge.R

93f928c

Update Processing.R

c3e776a

Update RefactorPriorKnoweldge.R

20bbf69

Update Standard Metabolomics.Rmd

0c109ff

implement SE for Processing.R

83ad638

update manual

8e1cdde

Update NAMESPACE

fec217b

update Rd documentation

13001b7

add SummarizedExperiment to fct

f05c020

tnaake added 10 commits March 25, 2025 13:30

add SummarizedExperiment to fct

f7508ad

add SummarizedExperiment to fct

67a53da

add SummarizedExperiment to fct

d56f49b

add SummarizedExperiment to fct

11526d4

add SummarizedExperiment to fct

1b71793

add SummarizedExperiment to fct

da3e7d8

add SummarizedExperiment to fct

17f6ebf

add SummarizedExperiment to fct

404d52e

add SummarizedExperiment to fct

5b63034

adjust to SummarizedExperiment object

ca16981

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

use SummarizedExperiment instead of loose data structures #95

use SummarizedExperiment instead of loose data structures #95

Uh oh!

tnaake commented Mar 26, 2025

Uh oh!

tnaake commented Mar 26, 2025

Uh oh!

ChristinaSchmidt1 commented Mar 26, 2025

Uh oh!

tnaake commented Mar 26, 2025

Uh oh!

tnaake commented Mar 26, 2025

Uh oh!

ChristinaSchmidt1 commented Mar 26, 2025

Uh oh!

deeenes commented Mar 26, 2025

Uh oh!

tnaake commented Mar 27, 2025

Uh oh!

ChristinaSchmidt1 commented Mar 27, 2025 •

edited

Loading

Uh oh!

ChristinaSchmidt1 commented Apr 3, 2025

Uh oh!

Uh oh!

use SummarizedExperiment instead of loose data structures #95

Are you sure you want to change the base?

use SummarizedExperiment instead of loose data structures #95

Uh oh!

Conversation

tnaake commented Mar 26, 2025

Uh oh!

tnaake commented Mar 26, 2025

Uh oh!

ChristinaSchmidt1 commented Mar 26, 2025

Uh oh!

tnaake commented Mar 26, 2025

Uh oh!

tnaake commented Mar 26, 2025

Uh oh!

ChristinaSchmidt1 commented Mar 26, 2025

Uh oh!

deeenes commented Mar 26, 2025

Uh oh!

tnaake commented Mar 27, 2025

Uh oh!

ChristinaSchmidt1 commented Mar 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ChristinaSchmidt1 commented Apr 3, 2025

Uh oh!

Uh oh!

ChristinaSchmidt1 commented Mar 27, 2025 •

edited

Loading