Splitting and Sampling Strategies

The interface would look at least like the following: `AnnData -> dict[str, AnnData]`, i.e. one `AnnData` per split name.
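As a rough sketch of that shape, assuming the dictionary is keyed by split name and that a strategy first produces observation indices per split. `Splitter` and `subset_by_indices` are hypothetical names, not existing API:

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Callable, Dict, List

if TYPE_CHECKING:  # only needed by type checkers, not at runtime
    from anndata import AnnData

# Hypothetical alias for the proposed interface: AnnData -> dict[str, AnnData].
Splitter = Callable[["AnnData"], Dict[str, "AnnData"]]


def subset_by_indices(adata: AnnData, indices: Dict[str, List[int]]) -> Dict[str, AnnData]:
    """Turn per-split observation indices into per-split subsets of `adata`."""
    # AnnData can be indexed along obs with a list of integer positions.
    return {name: adata[idx] for name, idx in indices.items()}
```

Keeping the index computation separate from the subsetting means the strategies themselves only need the labels, not the full object.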

Let's limit the variables we consider to at most two (e.g., `cell_type`: a/b and `split`: train/val/test).

Cases:
- User wants to split into the sets `train, test, val` such that each set preserves the cell type distribution of the whole dataset. E.g. if a quarter of the whole dataset is cell type a, then a should also make up 1/4 of each of train, test and val. (Here no entry is duplicated, but in sampling this might not be the case.)
- User wants to sample from each set so that the classes appear in equal proportion. (Here there can be duplicate entries.)
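The two cases above could be sketched on plain label lists as follows. This is a minimal illustration using only the standard library, not a proposed implementation: `stratified_split` and `balanced_sample` are hypothetical names, and drawing with replacement in the second function is an assumption about what "duplicate entries" should mean.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def stratified_split(
    labels: Sequence[Hashable], fractions: Dict[str, float], seed: int = 0
) -> Dict[str, List[int]]:
    """Case 1: partition indices so each split roughly keeps the label proportions."""
    rng = random.Random(seed)
    names = list(fractions)
    total = sum(fractions.values())

    # Group observation indices by label.
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for i, label in enumerate(labels):
        groups[label].append(i)

    splits: Dict[str, List[int]] = {name: [] for name in names}
    for idx in groups.values():
        idx = idx[:]  # copy before shuffling
        rng.shuffle(idx)
        start, cum = 0, 0.0
        for k, name in enumerate(names):
            cum += fractions[name] / total
            # The last split takes the remainder so every index is used exactly once.
            end = len(idx) if k == len(names) - 1 else round(cum * len(idx))
            splits[name].extend(idx[start:end])
            start = end
    return splits


def balanced_sample(
    labels: Sequence[Hashable], indices: Sequence[int], n_per_class: int, seed: int = 0
) -> List[int]:
    """Case 2: draw n_per_class indices per label, with replacement (assumption)."""
    rng = random.Random(seed)
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for i in indices:
        groups[labels[i]].append(i)

    sampled: List[int] = []
    for idx in groups.values():
        # Sampling with replacement, so small classes can yield duplicates.
        sampled.extend(rng.choices(idx, k=n_per_class))
    rng.shuffle(sampled)
    return sampled
```

With an `AnnData` object, the labels would come from a column such as `adata.obs["cell_type"]` (the column name is assumed here), and the returned index lists could be used to subset it, e.g. `adata[splits["train"]]`.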

@FrancescaDr do you have any more cases you'd like to discuss? Also, which one would be more important for you?

Splitting and Sampling Strategies #59
