Add automated cell type annotation with CellTypist to Clustering 3K PBMCs with Scanpy tutorial and Vitessce visualization #6786
| > | ||
| > 2. Rename the `vitessce.json` output to `Vitessce config - clusters` | ||
| > | ||
| > 3. Click on the {% icon galaxy-eye %} (**View data**) icon of the `Vitessce config - clusters` dataset to explore the clusters interactively in Vitessce |
Would it be possible to add a small GIF showing how to explore interactively?
There are some examples here: https://vitessce.github.io/easy_vitessce/ and here: https://vitessce.io/examples/
If some functionality is missing, we will add it.
I am not sure whether the GTN supports GIFs. Is it supported? Thank you! @shiltemann
| > {: .solution} | ||
| {: .question} | ||
|
|
||
| > <hands-on-title>Explore clusters interactively with Vitessce</hands-on-title> |
If there is not much to explore interactively in this plot, I would move the first interactive plot to the "Visualization of expression of the marker genes" step.
| > 1. {% tool [CellTypist](toolshed.g2.bx.psu.edu/repos/iuc/celltypist/celltypist/1.7.1+galaxy0) %} with the following parameters: | ||
| > - {% icon param-file %} *"Input AnnData file"*: `3k PBMC with only HVG, after scaling, PCA, KNN graph, UMAP, clustering, marker genes with Wilcoxon test, annotation` | ||
| > - *"Select model from"*: `History` | ||
| > - *"Select a models or train a model from history"*: `Train a model on an existing AnnData and use it` |
Why do you need training here? Can you use a reference model to annotate the cells?
Both options are possible; the user may be exploring new cell types. The dataset in the tutorial contains very well-known immune cell types, but training could be useful for users who are studying tissue with special characteristics. Maybe it does not need to be a whole hands-on section, but it would be useful to mention it. What do you think?
Then please make the training aspect into a comment or a question. It is confusing to see both options.
| > 1. {% tool [CellTypist](toolshed.g2.bx.psu.edu/repos/iuc/celltypist/celltypist/1.7.1+galaxy0) %} with the following parameters: | ||
| > - {% icon param-file %} *"Input AnnData file"*: `3k PBMC with only HVG, after scaling, PCA, KNN graph, UMAP, clustering, marker genes with Wilcoxon test` | ||
| > - *"Select model from"*: `Cached` | ||
| > - *"Choose CellTypist model"*: `immune sub-populations combined from 20 tissues of 18 studies (v2)` |
Can you please check again which model is the best fit here and guide the users a bit?
This would be the one that yields the most detailed results, IMHO, but I can also run the same model without sub-populations (populations only). The other ones do not apply: one is COVID-related and the other is from human fetus. This sample dataset is from immune cells, so the most appropriate would be an immune cell model.
This other model gives a less detailed result, @pavanvidem. However, it does not distinguish the monocyte subtypes, for example.
It looks better. Which one is it? BTW, the covid models also have data from healthy samples.
It looks better but is not necessarily the best fit. The immune sub-populations model is the best fit because it is the only one that correctly separates Classical from Non-classical monocytes, which correspond to the CD14+ and FCGR3A+ clusters in the tutorial. Also, it does not assign ILCs with relatively high confidence; ILCs are not expected to be a prominent population in PBMCs. I wouldn't recommend the COVID model in this case, even though it included healthy controls. I prefer to keep the current image :)
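One way to check a fit claim like the one above is to cross-tabulate the predicted labels against the existing clusters and see which label dominates each cluster. The following is a minimal sketch using only the Python standard library. The helper names (`labels_per_cluster`, `dominant_label`) are hypothetical, and the input lists are made-up toy values, not results from the tutorial. In a real run they would presumably come from two columns of `adata.obs`, for example the cluster annotation and the CellTypist prediction column; those column names are assumptions here.

```python
from collections import Counter, defaultdict


def labels_per_cluster(clusters, predicted):
    """Count how often each predicted label occurs within each cluster."""
    if len(clusters) != len(predicted):
        raise ValueError("clusters and predicted must have the same length")
    table = defaultdict(Counter)
    for cluster, label in zip(clusters, predicted):
        table[cluster][label] += 1
    return table


def dominant_label(table):
    """Return {cluster: (most common label, fraction of cells with it)}."""
    summary = {}
    for cluster, counts in table.items():
        label, n = counts.most_common(1)[0]
        summary[cluster] = (label, n / sum(counts.values()))
    return summary


# Toy, made-up values for illustration only (NOT tutorial results).
clusters = ["CD14+ Monocytes"] * 4 + ["FCGR3A+ Monocytes"] * 3
predicted = (
    ["Classical monocytes"] * 3
    + ["Non-classical monocytes"]
    + ["Non-classical monocytes"] * 3
)

summary = dominant_label(labels_per_cluster(clusters, predicted))
for cluster, (label, fraction) in summary.items():
    print(f"{cluster}: {label} ({fraction:.0%})")
```

A model that fits well would be expected to give each monocyte cluster a different dominant label with a high fraction; a model that merges both clusters under one label would show up immediately in this summary.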
This PR adds automated cell type annotation as an alternative approach to the manual
annotation section of the "Clustering 3K PBMCs with Scanpy" tutorial, using a
Choose Your Own Analysis (CYOA) format so users can choose between:
Changes
# Cell type annotation with two paths: Manual and CellTypist
louvain column as training labels)
Known issues
FileNotFoundError. This needs to be fixed by the server admins before this path can be fully tested.
TODO in this PR
Related tool
https://usegalaxy.eu/?tool_id=toolshed.g2.bx.psu.edu/repos/iuc/celltypist/celltypist/1.7.1+galaxy0