SelectIntegrationFeatures - ordering of features deemed variable in more than tie.val many datasets #8289

@eroell

Hi! Thanks for this nice package and your activity here!

I have a question regarding SelectIntegrationFeatures:

The reference reads

Choose the features to use when integrating multiple datasets. This function ranks features by the number of datasets they are deemed variable in, breaking ties by the median variable feature rank across datasets. It returns the top scoring features by this ranking.

which matches the description on pages e3 and e4 of your paper.
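
To make my reading of the documented ranking concrete, here is a minimal sketch in plain Python. It is only an illustration, not Seurat's code: the function name `rank_as_documented` and the toy data are made up, each dataset is assumed to supply its variable features as a list ordered from most to least variable, and I assume the median rank is taken only over the datasets in which a feature is variable.

```python
from collections import Counter
from statistics import median


def rank_as_documented(var_feature_lists, nfeatures):
    """Rank by number of datasets a feature is variable in, ties by median rank."""
    # How many datasets list each feature as variable.
    counts = Counter(f for vf in var_feature_lists for f in vf)

    def median_rank(feature):
        # 1-based rank in each dataset where the feature is variable (assumption).
        return median(vf.index(feature) + 1 for vf in var_feature_lists if feature in vf)

    # Primary key: more datasets first. Secondary key: lower median rank first.
    ordered = sorted(counts, key=lambda f: (-counts[f], median_rank(f)))
    return ordered[:nfeatures]


# Hypothetical toy input: three datasets, features ordered by variability.
datasets = [
    ["A", "B", "C", "D"],
    ["B", "A", "C", "E"],
    ["C", "A", "F", "G"],
]
top = rank_as_documented(datasets, 3)
print(top)  # ['A', 'C', 'B']
```

In this toy input A and C are variable in all three datasets and are separated by median rank (2 versus 3), while B is variable in only two datasets and therefore comes third despite having the lowest median rank.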

When I read the code for SelectIntegrationFeatures, I got the impression that it does the following:

  1. Compute the variable features for each dataset
  2. Sort genes by number-of-datasets-variable
  3. Choose the threshold value (tie.val) of number-of-datasets-variable
  4. Select all "safe" genes (features), i.e. those with number-of-datasets-variable > tie.val
  5. Order all of these "safe" genes by median rank
  6. Compute the median rank for genes that have number-of-datasets-variable == tie.val
  7. Use the features with the best median rank from step 6 to fill up the "safe" genes to nfeatures

This does indeed, as the documentation says, return the top-scoring features by this ranking.
However, if I have laid this out correctly, the "safe" genes are not ordered by number-of-datasets-variable first with median rank breaking ties, but by median rank alone.
I am not sure whether users care about this ordering as long as the top nfeatures genes are selected, but to me it was unexpected.
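
A small sketch of the procedure as I understand it, again in plain Python rather than R. The function name `select_as_described`, the toy data, and the assumption that median ranks are taken only over datasets where a gene is variable are mine; this mirrors my reading of the steps and is not the package's actual code.

```python
from collections import Counter
from statistics import median


def select_as_described(var_feature_lists, nfeatures):
    """Follow the seven steps as I read them from the code (illustrative only)."""
    # Steps 1-2: count, per gene, the datasets it is variable in.
    counts = Counter(f for vf in var_feature_lists for f in vf)
    by_count = sorted(counts.values(), reverse=True)
    # Step 3: tie.val is the count of the gene at position nfeatures.
    tie_val = by_count[min(nfeatures, len(by_count)) - 1]

    def median_rank(gene):
        # 1-based rank in each dataset where the gene is variable (assumption).
        return median(vf.index(gene) + 1 for vf in var_feature_lists if gene in vf)

    # Steps 4-5: "safe" genes, ordered by median rank only.
    safe = sorted((g for g in counts if counts[g] > tie_val), key=median_rank)
    # Step 6: genes exactly at tie.val, ordered by median rank.
    tied = sorted((g for g in counts if counts[g] == tie_val), key=median_rank)
    # Step 7: fill up to nfeatures with the best-ranked tied genes.
    return safe + tied[: nfeatures - len(safe)]


# Hypothetical toy input: X is variable in 3 datasets, Y in 2, the rest in 1.
datasets = [
    ["Y", "X", "Z"],
    ["Y", "X", "W"],
    ["X", "V", "U"],
]
selected = select_as_described(datasets, 3)
print(selected)  # ['Y', 'X', 'V']
```

Here X is variable in all three toy datasets and Y in only two, yet Y is listed first because its median rank (1) is lower than that of X (2). The selected set is the one I would expect; only the order within the "safe" genes differs.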

Since I am not particularly competent in R, I would like to ask:
Is this observation correct?

Many thanks!
