121 changes: 121 additions & 0 deletions topics/single-cell/tutorials/scrna-scanpy-pbmc3k/tutorial.md
@@ -1357,6 +1357,34 @@ The cells in the same clusters should be co-localized in the UMAP coordinate plo
> {: .solution}
{: .question}

> <hands-on-title>Explore clusters interactively with Vitessce</hands-on-title>
>
> 1. {% tool [Scanpy plot](toolshed.g2.bx.psu.edu/repos/iuc/scanpy_plot/scanpy_plot/1.11.5+galaxy0) %} with the following parameters:
>    - {% icon param-file %} *"Annotated data matrix"*: `3k PBMC with only HVG, after scaling, PCA, KNN graph, UMAP, clustering`
>    - *"Method used for plotting"*: `Embeddings: Scatter plot in UMAP basis, using 'pl.umap'`
>    - *"Keys for annotations of observations/cells or variables/genes"*: `louvain`
>    - *"Make an interactive plot?"*: `Yes`
>
> 2. Rename the `vitessce.json` output to `Vitessce config - clusters`
>
> 3. Click on the {% icon galaxy-eye %} (**View data**) icon of the `Vitessce config - clusters` dataset to explore the clusters interactively in Vitessce
>
> ![Vitessce interactive visualization of Louvain clusters](../../images/scrna-scanpy-pbmc3k/vitessce_clusters.png "Vitessce showing the UMAP with the 8 Louvain clusters and Cell Sets panel.")
>
> > <question-title></question-title>
> >
> > Explore the UMAP in Vitessce. Can you identify distinct groups of cells? How does this compare to the static plot above?
> >
> > > <solution-title></solution-title>
> > >
> > > Vitessce allows you to interactively explore the clusters by hovering over cells, selecting groups, and linking views. The 8 Louvain clusters should be clearly visible. This interactive view will be especially useful after cell type annotation, when we can compare the cluster labels with the predicted cell types.
> > >
> > {: .solution}
> >
> {: .question}
>
{: .hands_on}

# Finding marker genes

To make sense of the clusters, we need to identify the genes that drive the separation between them. These marker genes can then be used to assign a biological meaning (e.g. cell type) to each cluster based on their functional annotation, and also to identify subtle differences between clusters (e.g. changes in activation or differentiation state) based on the behaviour of genes in the affected pathways.
@@ -1705,6 +1733,11 @@ In the next steps, we are mostly interested in the marker genes for each cluster

Obtaining clusters of cells is quite straightforward. Determining what biological state is represented by each of those clusters is likely the most challenging task in scRNA-seq data analysis. To do so, we need to bridge the gap between our current dataset and prior biological knowledge.

{% include _includes/cyoa-choices.html option1="Manual" option2="CellTypist" default="Manual"
text="There are two approaches for cell type annotation. Choose the one that suits you best!" %}

<div class="Manual" markdown="1">

This biological knowledge is not always available in a consistent and quantitative manner. For example, the concept of "cell type" is not clearly defined. As a result, the interpretation of scRNA-seq data is often quite manual.

Fortunately, in the case of our dataset, we can use canonical markers of known cell types:
@@ -1902,6 +1935,94 @@ With the annotated cell types, we can also visualize the expression of their can
> {: .solution}
{: .question}

</div>

<div class="CellTypist" markdown="1">

The automated approach uses CellTypist, a tool that applies pre-trained logistic regression classifiers to predict cell identities directly from the normalized expression data, without requiring prior knowledge of canonical marker genes.

> <comment-title></comment-title>
>
> CellTypist requires a log1p-normalized expression matrix (normalized to 10,000 counts per cell), which is already stored in the `raw` attribute of our AnnData object from the preprocessing steps above.
>
{: .comment}
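
To make the expected input concrete, here is a minimal sketch of that normalization in plain Python, with no Scanpy dependency. The function name `normalize_log1p` and the toy counts are illustrative assumptions; in this tutorial the step itself is performed by the Scanpy tools during preprocessing.

```python
import math


def normalize_log1p(counts, target_sum=10_000):
    """Scale one cell's raw counts to `target_sum`, then apply log(1 + x).

    `counts` is a list of raw counts for a single cell (hypothetical toy input).
    """
    total = sum(counts)
    if total == 0:
        # An empty cell has nothing to scale; return zeros.
        return [0.0 for _ in counts]
    scale = target_sum / total
    return [math.log1p(c * scale) for c in counts]


# Toy cell with three genes and 10 counts in total.
cell = [2, 3, 5]
normalized = normalize_log1p(cell)

# Undoing the log1p transform recovers values that sum to the target.
recovered_total = sum(math.expm1(v) for v in normalized)
print(round(recovered_total))
```

Reversing the transform and summing per cell, as in the last lines, is one simple way to check that a matrix is on the scale CellTypist expects.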

> <hands-on-title>Automated cell type annotation with CellTypist (Train from AnnData)</hands-on-title>
>
> 1. {% tool [CellTypist](toolshed.g2.bx.psu.edu/repos/iuc/celltypist/celltypist/1.7.1+galaxy0) %} with the following parameters:
>    - {% icon param-file %} *"Input AnnData file"*: `3k PBMC with only HVG, after scaling, PCA, KNN graph, UMAP, clustering, marker genes with Wilcoxon test, annotation`
>    - *"Select model from"*: `History`
>    - *"Select a models or train a model from history"*: `Train a model on an existing AnnData and use it`
>    - {% icon param-file %} *"Select an AnnData file from history"*: `3k PBMC with only HVG, after scaling, PCA, KNN graph, UMAP, clustering, marker genes with Wilcoxon test`
>    - *"The column name in the .obs attribute of the training AnnData file that contains the cell type labels"*: `louvain`
>    - *"Refine the predicted labels by running the majority voting classifier after over-clustering"*: `Yes`
>    - *"Annotation mode"*: `Choose the cell type with the largest score/probability as the final prediction`
>    - *"Probability threshold"*: `0.5`
>    - *"Generate a dotplot of the predicted cell types"*: `Yes`
>    - *"Reference column in AnnData.obs for dotplot"*: `louvain`
>    - *"Prediction label in AnnData.obs for dotplot"*: `predicted_labels`
>    - *"Dotplot format"*: `png`
>
> 2. Rename the generated output to `3k PBMC CellTypist annotated`
>
> 3. Inspect the dotplot output
>
> ![CellTypist label transfer dotplot](../../images/scrna-scanpy-pbmc3k/celltypist_dotplot.png "CellTypist label transfer dotplot showing the predicted cell types against the Louvain clusters.")
>
{: .hands_on}
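
The majority voting option selected above can be illustrated with a small sketch. Assuming each cell has an over-clustering assignment and a per-cell predicted label, every cell in a subcluster is reassigned to the most frequent label in that subcluster. The function `majority_vote` and the toy labels are hypothetical; CellTypist's own implementation also handles details this sketch leaves out, such as how the over-clustering is computed and a minimum proportion for the dominant label.

```python
from collections import Counter, defaultdict


def majority_vote(subclusters, predicted_labels):
    """Return one refined label per cell: the most common label in its subcluster."""
    labels_by_cluster = defaultdict(list)
    for cluster, label in zip(subclusters, predicted_labels):
        labels_by_cluster[cluster].append(label)

    # Dominant label per subcluster (ties go to the label seen first).
    dominant = {
        cluster: Counter(labels).most_common(1)[0][0]
        for cluster, labels in labels_by_cluster.items()
    }
    return [dominant[cluster] for cluster in subclusters]


# Toy example: six cells in two over-clustering groups.
subclusters = ["a", "a", "a", "b", "b", "b"]
predicted = ["B", "B", "NK", "NK", "NK", "CD4+ T"]
refined = majority_vote(subclusters, predicted)
print(refined)
```

In this toy case the single `NK` call inside group `a` and the single `CD4+ T` call inside group `b` are overruled by their neighbours, which is the smoothing effect the option is meant to provide.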

> <hands-on-title>Automated cell type annotation with CellTypist (Cached model)</hands-on-title>
>
> 1. {% tool [CellTypist](toolshed.g2.bx.psu.edu/repos/iuc/celltypist/celltypist/1.7.1+galaxy0) %} with the following parameters:
>    - {% icon param-file %} *"Input AnnData file"*: `3k PBMC with only HVG, after scaling, PCA, KNN graph, UMAP, clustering, marker genes with Wilcoxon test`
>    - *"Select model from"*: `Cached`
>    - *"Choose CellTypist model"*: `immune sub-populations combined from 20 tissues of 18 studies (v2)`
>    - *"Refine the predicted labels by running the majority voting classifier after over-clustering"*: `Yes`
>    - *"Annotation mode"*: `Choose the cell type with the largest score/probability as the final prediction`
>    - *"Probability threshold"*: `0.5`
>    - *"Generate a dotplot of the predicted cell types"*: `Yes`
>    - *"Reference column in AnnData.obs for dotplot"*: `louvain`
>    - *"Prediction label in AnnData.obs for dotplot"*: `predicted_labels`
>    - *"Dotplot format"*: `png`
>
> 2. Rename the generated output to `3k PBMC CellTypist annotated with model`
>
> 3. Inspect the dotplot output
>
> ![CellTypist cached model dotplot](../../images/scrna-scanpy-pbmc3k/celltypist_dotplot_cached.png "CellTypist label transfer dotplot using the cached immune model, showing predicted cell type labels against the Louvain clusters.")
>
{: .hands_on}
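
The dotplot relates two per-cell annotations: the Louvain cluster and the predicted label. As a rough sketch of the underlying bookkeeping, assuming both annotations are available as plain lists, the fraction of each cluster assigned to each predicted label can be tabulated as follows. The function `label_fractions` and the toy data are hypothetical and are not part of CellTypist.

```python
from collections import Counter, defaultdict


def label_fractions(clusters, predicted_labels):
    """For each cluster, return the fraction of its cells given each predicted label."""
    counts = defaultdict(Counter)
    for cluster, label in zip(clusters, predicted_labels):
        counts[cluster][label] += 1

    fractions = {}
    for cluster, label_counts in counts.items():
        n_cells = sum(label_counts.values())
        fractions[cluster] = {
            label: n / n_cells for label, n in label_counts.items()
        }
    return fractions


# Toy data: cluster "0" has four cells, cluster "1" has two.
clusters = ["0", "0", "0", "0", "1", "1"]
predicted = ["T cells", "T cells", "T cells", "NK cells", "B cells", "B cells"]
table = label_fractions(clusters, predicted)
for cluster, row in table.items():
    print(cluster, row)
```

A cluster whose cells are split across several labels deserves a closer look before it is given a final name.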

> <hands-on-title>Explore CellTypist annotations interactively with Vitessce</hands-on-title>
>
> 1. {% tool [Scanpy plot](toolshed.g2.bx.psu.edu/repos/iuc/scanpy_plot/scanpy_plot/1.11.5+galaxy0) %} with the following parameters:
>    - {% icon param-file %} *"Annotated data matrix"*: `3k PBMC CellTypist annotated`
>    - *"Method used for plotting"*: `Embeddings: Scatter plot in UMAP basis, using 'pl.umap'`
>    - *"Keys for annotations of observations/cells or variables/genes"*: `louvain`
>    - *"Make an interactive plot?"*: `Yes`
>
> 2. Rename the `vitessce.json` output to `Vitessce config - CellTypist`
>
> 3. Click on the {% icon galaxy-eye %} (**View data**) icon of the `Vitessce config - CellTypist` dataset to explore the annotations interactively
>
> ![Vitessce interactive visualization of CellTypist annotations](../../images/scrna-scanpy-pbmc3k/celltypist_vitessce.png "Vitessce showing the UMAP with CellTypist-annotated cell types and Cell Sets panel.")
>
> > <question-title></question-title>
> >
> > Compare this Vitessce view with the one generated before cell type annotation. What has changed?
> >
> > > <solution-title></solution-title>
> > >
> > > The Cell Sets panel now shows the annotated cell type names (B, CD14+, CD4+ T, CD8+ T, Dendritic, FCGR3A+, Megakaryocytes, NK) with their cell counts, instead of the numbered Louvain clusters. This allows you to interactively explore the biological identity of each cell population.
> > >
> > {: .solution}
> >
> {: .question}
>
{: .hands_on}

</div>

# Conclusion
{% icon congratulations %} Well done, you’ve made it to the end! In this tutorial, we investigated clustering and annotation of single-cell data from 10x Genomics using Scanpy. The workflow used here is typical for scRNA-seq data analysis:
