add preprocessing tutorial with multiple examples #398

Open. Wants to merge 17 commits into base: main

lordy5 (Collaborator) commented Feb 11, 2025

Formatting

  • My tutorial has only one top-level (#) header

Reproducibility

  • My tutorial works on Google Colab
  • My tutorial sets scvi.settings.seed = 0 at the beginning of the notebook
  • My tutorial has been run and includes outputs (e.g. plots, tables)

Other

  • Counts and normalized data should co-exist in the datasets, see the API overview for an example
  • For scRNA-seq data, normalization should be counts per median library size and then log1p transformed -- if not, a reason should be given
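As a rough illustration of the last two checklist items, here is a minimal NumPy-only sketch of "counts per median library size, then log1p", with the raw counts kept next to the normalized values. The function name `normalize_median_log1p` and the `dataset` dictionary are hypothetical stand-ins, not part of the tutorial. In a scanpy workflow the same idea is usually expressed by copying the counts into `adata.layers["counts"]` and then calling `sc.pp.normalize_total` (with its default `target_sum=None`) followed by `sc.pp.log1p`.

```python
import numpy as np


def normalize_median_log1p(counts: np.ndarray) -> np.ndarray:
    """Scale each cell to the median library size, then apply log1p."""
    counts = np.asarray(counts, dtype=float)
    library_size = counts.sum(axis=1, keepdims=True)  # total counts per cell
    target = np.median(library_size)  # median library size across cells
    # Guard against empty cells to avoid division by zero.
    safe_size = np.where(library_size == 0, 1.0, library_size)
    return np.log1p(counts / safe_size * target)


rng = np.random.default_rng(0)  # fixed seed, in the spirit of the reproducibility item
raw_counts = rng.poisson(lam=2.0, size=(5, 4))  # toy cells x genes matrix

# Keep raw counts and normalized values side by side, as a stand-in for
# adata.layers["counts"] and adata.X in an AnnData object.
dataset = {
    "counts": raw_counts,
    "normalized": normalize_median_log1p(raw_counts),
}
```

Undoing the log with `np.expm1` and summing per cell should give the median library size for every non-empty cell, which is a quick sanity check on the normalization.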

lordy5 added the documentation label Feb 11, 2025

lordy5 changed the title from "add preprocessing tutorial starting from cell ranger output" to "add preprocessing tutorial with multiple examples" Feb 28, 2025
lordy5 marked this pull request as ready for review February 28, 2025 04:50
lordy5 (Collaborator, Author) commented Mar 1, 2025

I still need to link to the preprocessing tutorial from the other relevant tutorials and remove the preprocessing sections from them, but first I want to see whether anything needs to be changed in or added to the preprocessing tutorial.

ori-kron-wis (Contributor) commented

Please rerun it with the most recent scvi-tools version (currently v1.3).
Also, please mark open questions with TODO comments in the code so I can find them more easily.

For the concatenation of two datasets, I think you mean the old AnnData preprocessing part, where the function pbmcs_10x_cite_seq downloads two AnnData objects, preprocesses them, and concatenates them?
I think the preprocessing tutorial should replace that part now, no? Then we would have just one place to download the ready-made MuData from. In that case we would download only one file that is already preprocessed. It can have the same batch column as before, so we expect the same results, but it can also have other columns that could serve as the batch key.
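To make the concatenation step concrete, here is a small NumPy-only sketch of stacking two count matrices on their shared genes while recording which dataset each cell came from. This is not the code inside pbmcs_10x_cite_seq. The helper `concat_with_batch`, the batch names, and the toy gene lists are assumptions for illustration only. With AnnData objects, `anndata.concat(..., label="batch", keys=[...])` covers the same idea.

```python
import numpy as np


def concat_with_batch(datasets: dict):
    """Stack count matrices on their shared genes and record each cell's batch.

    `datasets` maps a batch name to a (counts, gene_names) pair, where counts
    is a cells x genes array and gene_names is a list of strings.
    """
    names = list(datasets)
    first_genes = datasets[names[0]][1]
    # Inner join: keep genes present in every dataset, in the first dataset's order.
    shared = [g for g in first_genes if all(g in datasets[n][1] for n in names[1:])]

    blocks, batch = [], []
    for name in names:
        counts, genes = datasets[name]
        cols = [genes.index(g) for g in shared]  # reorder columns to the shared genes
        blocks.append(np.asarray(counts)[:, cols])
        batch.extend([name] * np.asarray(counts).shape[0])
    return np.vstack(blocks), shared, np.array(batch)


# Toy inputs with hypothetical batch names and partially overlapping genes.
batch_a = (np.array([[1, 0, 3], [2, 5, 0]]), ["CD3E", "CD19", "MS4A1"])
batch_b = (np.array([[4, 1], [0, 2], [7, 7]]), ["MS4A1", "CD3E"])

combined, shared_genes, batch_labels = concat_with_batch(
    {"batch_a": batch_a, "batch_b": batch_b}
)
```

The resulting `batch_labels` array plays the role of the batch column mentioned above. Any other per-cell annotation carried along in the same way could be used as the batch key instead.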

lordy5 (Collaborator, Author) commented Mar 12, 2025

@ori-kron-wis Which tutorials should I remove the preprocessing sections from, now that there is the preprocessing notebook? I was thinking of removing them from the tutorials whose exact datasets I use in the preprocessing notebook and keeping them in the others, while still linking to the preprocessing tutorial from all of them so users know how to preprocess their own datasets.

Labels: documentation
2 participants