The interface would look at least like the following: `AnnData -> dict[AnnData]`.
Let's limit the variables we consider to at most 2 (e.g., cell_type: a/b and split: train/val/test).
Cases:
- User wants to split into train, test, and val such that each set preserves the cell type distribution. E.g., if a quarter of the whole dataset is cell type a, then cell type a should also make up 1/4 of train, test, and val respectively. (Here no entry is duplicated, but in sampling this might not be the case.)
- User wants to sample each set so that the classes have equal proportions. (Here there can be duplicate entries.)
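
To make the two cases concrete, here is a minimal sketch that works on plain label lists and returns row indices. The function names `stratified_split` and `balanced_sample`, and their parameters, are hypothetical and only for illustration; this is not a proposed final API.

```python
import random
from collections import defaultdict


def _indices_by_label(labels):
    """Group row indices by their label (e.g. cell type)."""
    groups = defaultdict(list)
    for i, label in enumerate(labels):
        groups[label].append(i)
    return groups


def stratified_split(labels, fractions, seed=0):
    """Case 1: split so each set keeps roughly the overall label distribution.

    Every index lands in exactly one set, so there are no duplicates.
    `fractions` maps split name -> fraction and is assumed to sum to 1.
    """
    rng = random.Random(seed)
    names = list(fractions)
    splits = {name: [] for name in names}
    for idx in _indices_by_label(labels).values():
        rng.shuffle(idx)
        start, cumulative = 0, 0.0
        for k, name in enumerate(names):
            cumulative += fractions[name]
            # The last split takes the remainder so no index is dropped.
            end = len(idx) if k == len(names) - 1 else round(cumulative * len(idx))
            splits[name].extend(idx[start:end])
            start = end
    return splits


def balanced_sample(labels, n_per_class, seed=0):
    """Case 2: sample with replacement so every class contributes equally.

    Small classes are oversampled, so duplicate indices are possible.
    """
    rng = random.Random(seed)
    sampled = []
    for idx in _indices_by_label(labels).values():
        sampled.extend(rng.choices(idx, k=n_per_class))
    return sampled


# Toy data: a quarter of the cells are type "a".
labels = ["a"] * 25 + ["b"] * 75
splits = stratified_split(labels, {"train": 0.6, "val": 0.2, "test": 0.2})
balanced = balanced_sample(labels, n_per_class=30)
```

With an actual AnnData object, the labels could come from an `obs` column (assumed here to be something like `adata.obs["cell_type"]`), and each index list could then be used to subset it, e.g. `{name: adata[idx] for name, idx in splits.items()}`, which would give the `dict` of AnnData objects from the interface above.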
@FrancescaDr do you have any more cases you'd like to discuss? Also, which one would be more important for you?