docs/skills.md
+1 -1 (1 addition & 1 deletion)
@@ -54,7 +54,7 @@
- **Pathway Enrichment** - Pathway and gene-set enrichment analysis on gene lists or ranked gene data, with result interpretation. Supports over-representation analysis (ORA via Enrichr/Fisher's exact/hypergeometric), preranked and standard Gene Set Enrichment Analysis (GSEA), and single-sample scoring (ssGSEA/GSVA) using gseapy and the official g:Profiler client. Covers gene-set libraries (GO Biological Process/Molecular Function/Cellular Component, KEGG, Reactome, WikiPathways, and MSigDB collections including Hallmark, C2 canonical pathways, C5 ontology, and C7 immune signatures), gene-ID mapping (Ensembl/Entrez to symbols via Biomart, g:Convert, or mygene) and organism handling, choice of the statistical background universe, multiple-testing correction (Benjamini-Hochberg FDR vs g:Profiler g:SCS vs Bonferroni), redundancy reduction (enrichment maps, leading-edge genes, term clustering), and publication-ready tables plus dotplots/bar plots/GSEA running-score plots. Includes a CLI helper (run_enrichment.py) that runs ORA or preranked GSEA end-to-end — automatically building the ranking metric from a DESeq2 results table — and writes a results table and dotplot. Cross-references PyDESeq2 and Scanpy upstream (sources of differentially expressed genes and cluster markers) and database-lookup/gget for gene-ID mapping and Reactome/KEGG/STRING APIs. Use cases: functional interpretation of differentially expressed genes, CRISPR-screen hits, and single-cell cluster markers; GO/KEGG/Reactome/WikiPathways enrichment; preranked GSEA from DESeq2 statistics; pathway activity scoring per sample or cell; and building defensible, reproducible enrichment analyses that avoid common pitfalls (gene-ID/organism mismatch, wrong background, thresholding before GSEA)
- **Scanpy** - Comprehensive Python toolkit for single-cell RNA-seq data analysis built on AnnData (scanpy 1.12.x; Python 3.12+). Provides end-to-end workflows for preprocessing (quality control, scrublet doublet detection, normalization, log transformation), dimensionality reduction (PCA, UMAP, t-SNE), Leiden clustering, marker gene identification, pseudobulk aggregation via `get.aggregate()`, trajectory inference (PAGA, diffusion maps), and visualization. Key features include: efficient handling of large datasets using sparse matrices and experimental Dask out-of-core support, integration with scvi-tools for advanced analysis, batch correction methods (ComBat), and publication-quality plotting. Optional GPU acceleration via rapids-singlecell. Use cases: single-cell RNA-seq analysis, cell-type identification, exploratory cluster markers, pseudobulk DE workflows (with pydeseq2), trajectory analysis, and comprehensive single-cell genomics workflows
- **scVelo** - RNA velocity analysis for estimating cell state transitions from unspliced/spliced mRNA dynamics. Infers trajectory directions, computes latent time, and identifies driver genes in single-cell RNA-seq data. Complements Scanpy/scVI-tools for trajectory inference, enabling the study of cellular differentiation dynamics and lineage decisions at single-cell resolution
- - **scvi-tools** - Probabilistic deep learning models for single-cell omics analysis. PyTorch-based framework providing variational autoencoders (VAEs) for dimensionality reduction, batch correction, differential expression, and data integration across modalities. Includes 25+ models: scVI/scANVI (RNA-seq integration and cell type annotation), totalVI (CITE-seq protein+RNA), MultiVI (multiome RNA+ATAC integration), PeakVI (ATAC-seq analysis), DestVI/Stereoscope/Tangram (spatial transcriptomics deconvolution), MethylVI (methylation), CytoVI (flow/mass cytometry), VeloVI (RNA velocity), contrastiveVI (perturbation studies), and Solo (doublet detection). Supports seamless integration with Scanpy/AnnData ecosystem, GPU acceleration, reference mapping (scArches), and probabilistic differential expression with uncertainty quantification
+ - **scvi-tools** - Probabilistic deep learning models for single-cell omics analysis. PyTorch-based framework providing variational autoencoders (VAEs) for dimensionality reduction, batch correction, differential expression, and data integration across modalities. Includes 30+ models: scVI/scANVI (RNA-seq integration and cell type annotation), totalVI/totalANVI (CITE-seq protein+RNA), MultiVI (multiome RNA+ATAC integration), PeakVI (ATAC-seq analysis), DestVI/Stereoscope/Tangram (spatial transcriptomics deconvolution), MethylVI (methylation), CytoVI (flow/mass cytometry), VeloVI (RNA velocity), contrastiveVI (perturbation studies), and Solo (doublet detection). Supports seamless integration with Scanpy/AnnData ecosystem, GPU acceleration, reference mapping (scArches), and probabilistic differential expression with uncertainty quantification
- **scikit-bio** - Python library for bioinformatics providing data structures, algorithms, and parsers for biological sequence analysis. Built on NumPy, SciPy, and pandas. Key features include: sequence objects (DNA, RNA, protein sequences) with biological alphabet validation, sequence alignment algorithms (local, global, semiglobal), phylogenetic tree manipulation, diversity metrics (alpha diversity, beta diversity, phylogenetic diversity), distance metrics for sequences and communities, file format parsers (FASTA, FASTQ, QIIME formats, Newick), and statistical analysis tools. Provides scikit-learn compatible transformers for machine learning workflows. Supports efficient processing of large sequence datasets. Use cases: sequence analysis, microbial ecology (16S rRNA analysis), metagenomics, phylogenetic analysis, and bioinformatics research requiring sequence manipulation and diversity calculations
- **TileDB-VCF** - High-performance C++ library with Python and CLI interfaces for efficient storage and retrieval of genomic variant-call data using TileDB multidimensional sparse array technology. Enables scalable VCF/BCF ingestion with incremental sample addition, compressed storage, parallel queries across genomic regions and samples, and export capabilities for population genomics workflows. Key features include: memory-efficient queries, cloud storage integration (S3, Azure, GCS), and CLI tools for dataset creation, sample ingestion, data export, and statistics. Supports building variant databases for large cohorts, population-scale genomics studies, and association analysis. Use cases: population genomics databases, cohort studies, variant discovery workflows, genomic data warehousing, and scaling to enterprise-level analysis with TileDB-Cloud platform
- **Zarr** - Python library (Zarr-Python 3.x) implementing chunked, compressed N-dimensional arrays for local disk and cloud object storage (S3, GCS via fsspec). Supports Zarr format 2 and 3, `zarr.codecs` compression (Blosc, gzip, zstd), partial chunk reads, consolidated metadata, sharding, and integration with NumPy, Dask, and Xarray. Use for out-of-core arrays, cloud-native pipelines, and large scientific datasets (genomics, imaging, climate). Skill: `zarr-python`
skills/scvi-tools/SKILL.md
+16 -4 (16 additions & 4 deletions)
@@ -3,15 +3,17 @@ name: scvi-tools
description: Deep generative models for single-cell omics. Use when you need probabilistic batch correction (scVI), transfer learning, differential expression with uncertainty, or multi-modal integration (TOTALVI, MultiVI). Best for advanced modeling, batch effects, multimodal data. For standard analysis pipelines use scanpy.
license: BSD-3-Clause license
metadata:
- version: "1.0"
+ version: "1.1"
skill-author: K-Dense Inc.
---

# scvi-tools

## Overview

- scvi-tools is a comprehensive Python framework for probabilistic models in single-cell genomics. Built on PyTorch and PyTorch Lightning, it provides deep generative models using variational inference for analyzing diverse single-cell data modalities.
+ scvi-tools is a comprehensive Python framework for probabilistic models in single-cell genomics. Built on PyTorch and PyTorch Lightning, it provides deep generative models using variational inference for analyzing diverse single-cell data modalities. Current stable release: **scvi-tools 1.4.3** (May 2026).
+
+ **Model namespaces matter:** core models (scVI, scANVI, totalVI, MultiVI, PeakVI, AUTOZI, CondSCVI, DestVI, LinearSCVI, AmortizedLDA, JaxSCVI) live under `scvi.model`. Most other models (VeloVI, contrastiveVI, CellAssign, PoissonVI, scBasset, MrVI, MethylVI/MethylANVI, CytoVI, SysVI, Decipher, gimVI, scVIVA, ResolVI, Stereoscope, Solo, totalANVI, DIAGVI) live under `scvi.external`. The reference files specify the correct namespace per model.

## When to Use This Skill
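Reviewer note on the namespace paragraph added to the Overview: the split between `scvi.model` and `scvi.external` could be sketched as a small lookup, which may help when checking the reference files for consistency. This is only an illustration. The model names and the two namespaces are copied from the paragraph as written; the helper `namespace_for` and the table names are hypothetical and not part of scvi-tools, and the sketch deliberately does not guess at the actual class names inside each namespace.

```python
# Hypothetical helper, not part of scvi-tools. Model names and namespaces are
# taken verbatim from the Overview paragraph in this diff.
CORE_MODELS = (
    "scVI", "scANVI", "totalVI", "MultiVI", "PeakVI", "AUTOZI",
    "CondSCVI", "DestVI", "LinearSCVI", "AmortizedLDA", "JaxSCVI",
)
EXTERNAL_MODELS = (
    "VeloVI", "contrastiveVI", "CellAssign", "PoissonVI", "scBasset",
    "MrVI", "MethylVI", "MethylANVI", "CytoVI", "SysVI", "Decipher",
    "gimVI", "scVIVA", "ResolVI", "Stereoscope", "Solo", "totalANVI", "DIAGVI",
)

# Keys are lower-cased so lookups do not depend on the display capitalization.
NAMESPACE_BY_MODEL = {name.lower(): "scvi.model" for name in CORE_MODELS}
NAMESPACE_BY_MODEL.update({name.lower(): "scvi.external" for name in EXTERNAL_MODELS})


def namespace_for(model_name: str) -> str:
    """Return the namespace the Overview paragraph lists for a model name."""
    key = model_name.strip().lower()
    if key not in NAMESPACE_BY_MODEL:
        raise KeyError(f"{model_name!r} is not listed in the Overview paragraph")
    return NAMESPACE_BY_MODEL[key]


if __name__ == "__main__":
    for name in ("scVI", "totalANVI", "DIAGVI"):
        print(f"{name}: {namespace_for(name)}")
```

Names the paragraph does not mention (for example models covered only by "Most other models") raise a `KeyError` rather than defaulting to either namespace, since the text gives no basis for a default.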
@@ -46,8 +48,10 @@ Models for analyzing single-cell chromatin data. See `references/models-atac-seq
### 3. Multimodal & Multi-omics Integration
Joint analysis of multiple data types. See `references/models-multimodal.md` for:
- **totalVI**: CITE-seq protein and RNA joint modeling
- - **MultiVI**: Paired and unpaired multi-omic integration
+ - **totalANVI**: Semi-supervised CITE-seq (totalVI with cell-type labels)
+ - **MultiVI**: Paired and unpaired multi-omic integration (MuData-based)
- **MrVI**: Multi-resolution cross-sample analysis
+ - **DIAGVI**: Diagonal integration of unpaired single-cell datasets (added in 1.4.3)

### 4. Spatial Transcriptomics
Spatially-resolved transcriptomics analysis. See `references/models-spatial.md` for:
@@ -171,12 +175,20 @@ See `references/theoretical-foundations.md` for detailed background on the mathe