K-Dense-AI
diff --git a/docs/skills.md b/docs/skills.md
Lines changed: 1 addition & 1 deletion
diff --git a/skills/scvi-tools/SKILL.md b/skills/scvi-tools/SKILL.md
Lines changed: 16 additions & 4 deletions
diff --git a/skills/scvi-tools/references/differential-expression.md b/skills/scvi-tools/references/differential-expression.md
Lines changed: 16 additions & 0 deletions
diff --git a/skills/scvi-tools/references/models-atac-seq.md b/skills/scvi-tools/references/models-atac-seq.md
Lines changed: 36 additions & 28 deletions
@@ -54,7 +54,7 @@
 - **Pathway Enrichment** - Pathway and gene-set enrichment analysis on gene lists or ranked gene data, with result interpretation. Supports over-representation analysis (ORA via Enrichr/Fisher's exact/hypergeometric), preranked and standard Gene Set Enrichment Analysis (GSEA), and single-sample scoring (ssGSEA/GSVA) using gseapy and the official g:Profiler client. Covers gene-set libraries (GO Biological Process/Molecular Function/Cellular Component, KEGG, Reactome, WikiPathways, and MSigDB collections including Hallmark, C2 canonical pathways, C5 ontology, and C7 immune signatures), gene-ID mapping (Ensembl/Entrez to symbols via Biomart, g:Convert, or mygene) and organism handling, choice of the statistical background universe, multiple-testing correction (Benjamini-Hochberg FDR vs g:Profiler g:SCS vs Bonferroni), redundancy reduction (enrichment maps, leading-edge genes, term clustering), and publication-ready tables plus dotplots/bar plots/GSEA running-score plots. Includes a CLI helper (run_enrichment.py) that runs ORA or preranked GSEA end-to-end — automatically building the ranking metric from a DESeq2 results table — and writes a results table and dotplot. Cross-references PyDESeq2 and Scanpy upstream (sources of differentially expressed genes and cluster markers) and database-lookup/gget for gene-ID mapping and Reactome/KEGG/STRING APIs. Use cases: functional interpretation of differentially expressed genes, CRISPR-screen hits, and single-cell cluster markers; GO/KEGG/Reactome/WikiPathways enrichment; preranked GSEA from DESeq2 statistics; pathway activity scoring per sample or cell; and building defensible, reproducible enrichment analyses that avoid common pitfalls (gene-ID/organism mismatch, wrong background, thresholding before GSEA)
 - **Scanpy** - Comprehensive Python toolkit for single-cell RNA-seq data analysis built on AnnData (scanpy 1.12.x; Python 3.12+). Provides end-to-end workflows for preprocessing (quality control, scrublet doublet detection, normalization, log transformation), dimensionality reduction (PCA, UMAP, t-SNE), Leiden clustering, marker gene identification, pseudobulk aggregation via `get.aggregate()`, trajectory inference (PAGA, diffusion maps), and visualization. Key features include: efficient handling of large datasets using sparse matrices and experimental Dask out-of-core support, integration with scvi-tools for advanced analysis, batch correction methods (ComBat), and publication-quality plotting. Optional GPU acceleration via rapids-singlecell. Use cases: single-cell RNA-seq analysis, cell-type identification, exploratory cluster markers, pseudobulk DE workflows (with pydeseq2), trajectory analysis, and comprehensive single-cell genomics workflows
 - **scVelo** - RNA velocity analysis for estimating cell state transitions from unspliced/spliced mRNA dynamics. Infers trajectory directions, computes latent time, and identifies driver genes in single-cell RNA-seq data. Complements Scanpy/scVI-tools for trajectory inference, enabling the study of cellular differentiation dynamics and lineage decisions at single-cell resolution
-- **scvi-tools** - Probabilistic deep learning models for single-cell omics analysis. PyTorch-based framework providing variational autoencoders (VAEs) for dimensionality reduction, batch correction, differential expression, and data integration across modalities. Includes 25+ models: scVI/scANVI (RNA-seq integration and cell type annotation), totalVI (CITE-seq protein+RNA), MultiVI (multiome RNA+ATAC integration), PeakVI (ATAC-seq analysis), DestVI/Stereoscope/Tangram (spatial transcriptomics deconvolution), MethylVI (methylation), CytoVI (flow/mass cytometry), VeloVI (RNA velocity), contrastiveVI (perturbation studies), and Solo (doublet detection). Supports seamless integration with Scanpy/AnnData ecosystem, GPU acceleration, reference mapping (scArches), and probabilistic differential expression with uncertainty quantification
+- **scvi-tools** - Probabilistic deep learning models for single-cell omics analysis. PyTorch-based framework providing variational autoencoders (VAEs) for dimensionality reduction, batch correction, differential expression, and data integration across modalities. Includes 30+ models: scVI/scANVI (RNA-seq integration and cell type annotation), totalVI/totalANVI (CITE-seq protein+RNA), MultiVI (multiome RNA+ATAC integration), PeakVI (ATAC-seq analysis), DestVI/Stereoscope/Tangram (spatial transcriptomics deconvolution), MethylVI (methylation), CytoVI (flow/mass cytometry), VeloVI (RNA velocity), contrastiveVI (perturbation studies), and Solo (doublet detection). Supports seamless integration with Scanpy/AnnData ecosystem, GPU acceleration, reference mapping (scArches), and probabilistic differential expression with uncertainty quantification
 - **scikit-bio** - Python library for bioinformatics providing data structures, algorithms, and parsers for biological sequence analysis. Built on NumPy, SciPy, and pandas. Key features include: sequence objects (DNA, RNA, protein sequences) with biological alphabet validation, sequence alignment algorithms (local, global, semiglobal), phylogenetic tree manipulation, diversity metrics (alpha diversity, beta diversity, phylogenetic diversity), distance metrics for sequences and communities, file format parsers (FASTA, FASTQ, QIIME formats, Newick), and statistical analysis tools. Provides scikit-learn compatible transformers for machine learning workflows. Supports efficient processing of large sequence datasets. Use cases: sequence analysis, microbial ecology (16S rRNA analysis), metagenomics, phylogenetic analysis, and bioinformatics research requiring sequence manipulation and diversity calculations
 - **TileDB-VCF** - High-performance C++ library with Python and CLI interfaces for efficient storage and retrieval of genomic variant-call data using TileDB multidimensional sparse array technology. Enables scalable VCF/BCF ingestion with incremental sample addition, compressed storage, parallel queries across genomic regions and samples, and export capabilities for population genomics workflows. Key features include: memory-efficient queries, cloud storage integration (S3, Azure, GCS), and CLI tools for dataset creation, sample ingestion, data export, and statistics. Supports building variant databases for large cohorts, population-scale genomics studies, and association analysis. Use cases: population genomics databases, cohort studies, variant discovery workflows, genomic data warehousing, and scaling to enterprise-level analysis with TileDB-Cloud platform
 - **Zarr** - Python library (Zarr-Python 3.x) implementing chunked, compressed N-dimensional arrays for local disk and cloud object storage (S3, GCS via fsspec). Supports Zarr format 2 and 3, `zarr.codecs` compression (Blosc, gzip, zstd), partial chunk reads, consolidated metadata, sharding, and integration with NumPy, Dask, and Xarray. Use for out-of-core arrays, cloud-native pipelines, and large scientific datasets (genomics, imaging, climate). Skill: `zarr-python`
 
@@ -3,15 +3,17 @@ name: scvi-tools
 description: Deep generative models for single-cell omics. Use when you need probabilistic batch correction (scVI), transfer learning, differential expression with uncertainty, or multi-modal integration (TOTALVI, MultiVI). Best for advanced modeling, batch effects, multimodal data. For standard analysis pipelines use scanpy.
 license: BSD-3-Clause license
 metadata:
-  version: "1.0"
+  version: "1.1"
   skill-author: K-Dense Inc.
 ---
 
 # scvi-tools
 
 ## Overview
 
-scvi-tools is a comprehensive Python framework for probabilistic models in single-cell genomics. Built on PyTorch and PyTorch Lightning, it provides deep generative models using variational inference for analyzing diverse single-cell data modalities.
+scvi-tools is a comprehensive Python framework for probabilistic models in single-cell genomics. Built on PyTorch and PyTorch Lightning, it provides deep generative models using variational inference for analyzing diverse single-cell data modalities. Current stable release: **scvi-tools 1.4.3** (May 2026).
+
+**Model namespaces matter:** core models (scVI, scANVI, totalVI, MultiVI, PeakVI, AUTOZI, CondSCVI, DestVI, LinearSCVI, AmortizedLDA, JaxSCVI) live under `scvi.model`. Most other models (VeloVI, contrastiveVI, CellAssign, PoissonVI, scBasset, MrVI, MethylVI/MethylANVI, CytoVI, SysVI, Decipher, gimVI, scVIVA, ResolVI, Stereoscope, Solo, totalANVI, DIAGVI) live under `scvi.external`. The reference files specify the correct namespace per model.
 
 ## When to Use This Skill
 
@@ -46,8 +48,10 @@ Models for analyzing single-cell chromatin data. See `references/models-atac-seq
 ### 3. Multimodal & Multi-omics Integration
 Joint analysis of multiple data types. See `references/models-multimodal.md` for:
 - **totalVI**: CITE-seq protein and RNA joint modeling
-- **MultiVI**: Paired and unpaired multi-omic integration
+- **totalANVI**: Semi-supervised CITE-seq (totalVI with cell-type labels)
+- **MultiVI**: Paired and unpaired multi-omic integration (MuData-based)
 - **MrVI**: Multi-resolution cross-sample analysis
+- **DIAGVI**: Diagonal integration of unpaired single-cell datasets (added in 1.4.3)
 
 ### 4. Spatial Transcriptomics
 Spatially-resolved transcriptomics analysis. See `references/models-spatial.md` for:
@@ -171,12 +175,20 @@ See `references/theoretical-foundations.md` for detailed background on the mathe
 
 ## Installation
 
+Requires Python **3.12+** (scvi-tools 1.4 dropped older versions).
+
 ```bash
 uv pip install scvi-tools
 # For GPU support
-uv pip install scvi-tools[cuda]
+uv pip install "scvi-tools[cuda]"
 ```
 
+For reproducible environments, pin a version: `uv pip install scvi-tools==1.4.3`.
+
+**Compute backends:** training defaults to PyTorch (CPU/GPU/TPU). A JAX backend
+(`scvi.model.JaxSCVI`) and an experimental MLX backend for Apple silicon
+(`scvi.model.mlxSCVI`) are available for select models.
+
 ## Best Practices
 
 1. **Use raw counts**: Always provide unnormalized count data to models
 
@@ -245,6 +245,22 @@ large_effect = de_results[
 
 ## Advanced Usage
 
+### Differential Abundance
+
+In addition to differential *expression*, models exposing the `VAEMixin` API
+provide `differential_abundance()` and `get_aggregated_posterior()` (added in
+v1.4.2) to test how cell-state abundance shifts between conditions in the
+learned latent space:
+
+```python
+# Compare the latent-space abundance of two conditions
+da = model.differential_abundance(
+    groupby="condition",
+    group1="disease",
+    group2="healthy",
+)
+```
+
 ### DE Within Specific Cells
 
 ```python
 
@@ -95,19 +95,19 @@ da_results = model.differential_accessibility(
 - Fragment count matrix (cells × genomic regions)
 - Count data (not binary)
 
-**Basic Usage**:
+**Basic Usage** (PoissonVI lives in `scvi.external`):
 ```python
-scvi.model.POISSONVI.setup_anndata(
+scvi.external.POISSONVI.setup_anndata(
     adata,
     batch_key="batch"
 )
 
-model = scvi.model.POISSONVI(adata)
+model = scvi.external.POISSONVI(adata)
 model.train()
 
 # Get results
 latent = model.get_latent_representation()
-accessibility = model.get_accessibility_estimates()
+accessibility = model.get_normalized_accessibility()
 ```
 
 **Key Differences from PeakVI**:
@@ -143,27 +143,24 @@ accessibility = model.get_accessibility_estimates()
 - Peak accessibility matrix
 - Genome reference (for sequence extraction)
 
-**Basic Usage**:
+**Basic Usage** (scBasset lives in `scvi.external`):
 ```python
-# scBasset requires sequence information
-# First, extract sequences for peaks
-from scbasset import utils
-sequences = utils.fetch_sequences(adata, genome="hg38")
-
-# Setup and train
-scvi.model.SCBASSET.setup_anndata(
+# scBasset needs per-peak DNA sequences. Add them to the AnnData first;
+# this downloads the genome (once) and stores one-hot codes in adata.varm.
+scvi.data.add_dna_sequence(
     adata,
-    batch_key="batch"
+    genome_name="hg38",
+    install_genome=True,
 )
 
-model = scvi.model.SCBASSET(adata, sequences=sequences)
+# scBasset expects regions × cells, so transpose before registering the sequence code
+bdata = adata.transpose()
+scvi.external.SCBASSET.setup_anndata(bdata, dna_code_key="dna_code")
+model = scvi.external.SCBASSET(bdata)
 model.train()
 
-# Get latent representation
+# Cell embeddings (low-dimensional latent representation)
 latent = model.get_latent_representation()
-
-# Interpret model: which sequences/motifs are important
-importance_scores = model.get_feature_importance()
 ```
 
 **Key Parameters**:
@@ -179,12 +176,16 @@ importance_scores = model.get_feature_importance()
 - **Transfer learning**: Fine-tune on new datasets
 
 **Interpretability Tools**:
-```python
-# Get importance scores for sequences
-importance = model.get_sequence_importance(region_indices=[0, 1, 2])
 
-# Predict accessibility for new sequences
-predictions = model.predict_accessibility(new_sequences)
+scBasset learns sequence-aware cell and peak embeddings. Transcription-factor
+activity is assessed by scoring motif sequences against the trained model rather
+than calling a single importance function. See the
+[scBasset user guide](https://docs.scvi-tools.org/en/stable/user_guide/models/scbasset.html)
+for the current motif-injection / TF-activity workflow.
+
+```python
+# Cell embeddings for clustering / visualization
+cell_embedding = model.get_latent_representation()
 ```
 
 ## Model Selection for ATAC-seq
@@ -275,14 +276,21 @@ model.save("peakvi_model")
 For paired multimodal data (RNA+ATAC from same cells), use **MultiVI** instead:
 
 ```python
-# For 10x Multiome or similar paired data
-scvi.model.MULTIVI.setup_anndata(
-    adata,
+from mudata import MuData
+
+# MultiVI is configured from a MuData object (setup_anndata was removed in v1.3)
+mdata = MuData({"rna": rna_adata, "atac": atac_adata})
+scvi.model.MULTIVI.setup_mudata(
+    mdata,
     batch_key="sample",
-    modality_key="modality"  # "RNA" or "ATAC"
+    modalities={"rna_layer": "rna", "atac_layer": "atac"},
 )
 
-model = scvi.model.MULTIVI(adata)
+model = scvi.model.MULTIVI(
+    mdata,
+    n_genes=rna_adata.n_vars,
+    n_regions=atac_adata.n_vars,
+)
 model.train()
 
 # Get joint latent space