Hi! Thanks for this nice package and your activity here!
I have a question regarding SelectIntegrationFeatures:
The reference reads:
Choose the features to use when integrating multiple datasets. This function ranks features by the number of datasets they are deemed variable in, breaking ties by the median variable feature rank across datasets. It returns the top scoring features by this ranking.
This matches the description on pages e3 and e4 of your paper.
When I checked the code for SelectIntegrationFeatures, I got the impression that it does the following:
1. Compute variable features per dataset.
2. Sort genes by number-of-datasets-variable.
3. Choose the threshold number (`tie.val`) of number-of-datasets-variable.
4. Select all "safe" genes (`features`) which have number-of-datasets-variable > `tie.val`.
5. Order all of these "safe" genes by median rank.
6. Compute the median rank for genes that have number-of-datasets-variable == `tie.val`.
7. Use the features with the top median rank from step 6 to fill up the "safe" genes up to `nfeatures`.
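
To make my reading of these steps concrete, here is a minimal Python sketch of them. It is not the Seurat R code: the function names, the toy gene lists, and the way `tie_val` is picked (the count of the gene sitting at position `nfeatures` in the count-sorted list) are my assumptions for illustration.

```python
from collections import Counter
from statistics import median


def median_rank(gene, ranked_lists):
    # Median 1-based position of `gene` over the datasets in which it is variable.
    return median(lst.index(gene) + 1 for lst in ranked_lists if gene in lst)


def select_features_sketch(ranked_lists, nfeatures):
    # Step 1 is taken as given: ranked_lists[i] holds the variable features
    # of dataset i, most variable first.
    # Step 2: count datasets per gene and sort by that count, descending.
    counts = Counter(g for lst in ranked_lists for g in lst)
    by_count = sorted(counts, key=lambda g: -counts[g])
    # Step 3 (assumption): tie_val is the count at the cut-off position.
    tie_val = counts[by_count[min(nfeatures, len(by_count)) - 1]]
    # Steps 4-5: "safe" genes lie above the threshold and are ordered
    # by median rank only, not by count.
    safe = sorted((g for g in by_count if counts[g] > tie_val),
                  key=lambda g: median_rank(g, ranked_lists))
    # Steps 6-7: fill the remaining slots with tied genes, best median rank first.
    tied = sorted((g for g in by_count if counts[g] == tie_val),
                  key=lambda g: median_rank(g, ranked_lists))
    return safe + tied[: nfeatures - len(safe)]


# Hypothetical toy data: three datasets with ranked variable genes.
datasets = [["A", "C", "B", "D"], ["A", "B", "E"], ["C", "B"]]
print(select_features_sketch(datasets, nfeatures=4))  # ['A', 'C', 'B', 'E']
```

In this toy case gene B is variable in all three datasets, yet it ends up behind A and C, which are variable in only two.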
This does indeed, as the documentation says, return the top scoring features by this ranking.
However, if I have laid this out correctly, the "safe" genes are not ordered by number-of-datasets-variable first, with median rank breaking ties; they are ordered by median rank alone.
I am not sure whether users would care about this ordering as long as the top nfeatures genes are selected, but to me it was unexpected.
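
A small Python illustration of the difference I mean, using made-up per-gene summaries (the gene names and numbers are hypothetical, and this is not Seurat code):

```python
# Hypothetical summaries: gene -> (number of datasets variable in, median rank).
summary = {"A": (2, 1.0), "B": (3, 2.0), "C": (2, 1.5)}

# Ordering as I read the documentation: count first (descending),
# median rank only to break ties.
documented = sorted(summary, key=lambda g: (-summary[g][0], summary[g][1]))

# Ordering as I read the code: median rank only.
by_median_only = sorted(summary, key=lambda g: summary[g][1])

print(documented)      # ['B', 'A', 'C']
print(by_median_only)  # ['A', 'C', 'B']
print(set(documented) == set(by_median_only))  # True
```

The selected set is identical in both cases; only the order of the returned genes differs.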
Since I am not particularly competent in R, I would like to ask:
Is this observation correct?
Many thanks!