Commit 721576a

Merge pull request #120 from ncihtan/text_addtns
Phase 2 data model docs, sc/snRNAseq page
2 parents 60b98c6 + 5c965c4 commit 721576a

4 files changed

+260
-78
lines changed

data_model/overview.md

Lines changed: 2 additions & 0 deletions
@@ -17,6 +17,8 @@ This manual and the HTAN Data Portal standards pages will be updated as the Phas
1717
!!!
1818

1919
## HTAN Phase 2 Data Model
20+
[!button text="HTAN Phase 2 Data Model Documentation"](https://htan2-data-model.readthedocs.io/en/latest/index.html)
21+
2022
[!button text="HTAN Phase 2 Data Model Github Repository"](https://github.com/ncihtan/htan2-data-model)
2123

2224
In HTAN Phase 2, the Data Model is being updated with three main aims:

data_model/standards.md

Lines changed: 4 additions & 78 deletions
@@ -8,82 +8,8 @@ HTAN Centers submit assay data files and [metadata](../data_submission/metadata.
88

99
![Example HTAN Metadata and Assay Data](../img/metadata.svg)
1010

11-
For **HTAN Phase 1**, [The HTAN Portal's Data Standards pages](https://humantumoratlas.org/standards) provide interactive, searchable and downloadable summaries of the metadata attributes, requirements and valid values expected for each data type.
12-
13-
!!! The following Table provides links to HTAN Phase 1 data standards. Additional documentation for Phase 2 Data Standards is currently in development. Please see [Data Model Introduction](../data_model/overview.md) for more information regarding the Phase 2 Data Model.
14-
!!!
15-
16-
## Phase 1 Table of Specific Standards
17-
18-
| Category | Specific Standard |
19-
|----------|-------------------|
20-
| Biospecimen | [Biospecimen](https://humantumoratlas.org/standard/biospecimen/biospecimen) |
21-
| Clinical | [Demographics](https://humantumoratlas.org/standard/clinical/demographics) |
22-
| Clinical | [Diagnosis](https://humantumoratlas.org/standard/clinical/diagnosis) |
23-
| Clinical | [Exposure](https://humantumoratlas.org/standard/clinical/exposure) |
24-
| Clinical | [Family History](https://humantumoratlas.org/standard/clinical/familyhistory) |
25-
| Clinical | [Follow Up](https://humantumoratlas.org/standard/clinical/followup) |
26-
| Clinical | [Molecular Test](https://humantumoratlas.org/standard/clinical/moleculartest) |
27-
| Clinical | [Participant Vital Status Update](https://humantumoratlas.org/standard/clinical/participantvitalstatusupdate) |
28-
| Clinical | [Therapy](https://humantumoratlas.org/standard/clinical/therapy) |
29-
| Clinical | [Clinical Data Tier 2](https://humantumoratlas.org/standard/clinical/clinicaldatatier2) |
30-
| Clinical | [Acute Lymphoblastic Leukemia Tier 3](https://humantumoratlas.org/standard/clinical/acutelymphoblasticleukemiatier3) |
31-
| Clinical | [Breast Cancer Tier 3](https://humantumoratlas.org/standard/clinical/breastcancertier3) |
32-
| Clinical | [Colorectal Cancer Tier 3](https://humantumoratlas.org/standard/clinical/colorectalcancertier3) |
33-
| Clinical | [Lung Cancer Tier 3](https://humantumoratlas.org/standard/clinical/lungcancertier3) |
34-
| Clinical | [Melanoma Tier 3](https://humantumoratlas.org/standard/clinical/melanomatier3) |
35-
| Clinical | [Ovarian Cancer Tier 3](https://humantumoratlas.org/standard/clinical/ovariancancertier3) |
36-
| Clinical | [Pancreatic Cancer Tier 3](https://humantumoratlas.org/standard/clinical/pancreaticcancertier3) |
37-
| Clinical | [Pancreatic Cancer Tier 3](https://humantumoratlas.org/standard/clinical/pancreaticcancertier3) |
38-
| Clinical | [Prostate Cancer Tier 3](https://humantumoratlas.org/standard/clinical/prostatecancertier3) |
39-
| Clinical | [Sarcoma Tier 3](https://humantumoratlas.org/standard/clinical/sarcomatier3) |
40-
| Imaging | [Imaging Level 1](https://humantumoratlas.org/standard/imaging/imaginglevel1) |
41-
| Imaging | [Imaging Level 2](https://humantumoratlas.org/standard/imaging/imaginglevel2) |
42-
| Imaging | [Imaging Level 3](https://humantumoratlas.org/standard/imaging/imaginglevel3) |
43-
| Imaging | [Imaging Level 4](https://humantumoratlas.org/standard/imaging/imaginglevel4) |
44-
| Mass Spectrometry | [Mass Spectrometry Level 1](https://humantumoratlas.org/standard/mass_spectrometry/massspectrometrylevel1) |
45-
| Mass Spectrometry | [Mass Spectrometry Level 2](https://humantumoratlas.org/standard/mass_spectrometry/massspectrometrylevel2) |
46-
| Mass Spectrometry | [Mass Spectrometry Level 3](https://humantumoratlas.org/standard/mass_spectrometry/massspectrometrylevel3) |
47-
| Mass Spectrometry | [Mass Spectrometry Level 4](https://humantumoratlas.org/standard/mass_spectrometry/massspectrometrylevel4) |
48-
| Mass Spectrometry | [Mass Spectrometry Auxiliary File](https://humantumoratlas.org/standard/mass_spectrometry/massspectrometryauxiliaryfile) |
49-
| Proteomics | [RPPA Level 2](https://humantumoratlas.org/standard/proteomics/rppalevel2) |
50-
| Proteomics | [RPPA Level 3](https://humantumoratlas.org/standard/proteomics/rppalevel3) |
51-
| Proteomics | [RPPA Level 4](https://humantumoratlas.org/standard/proteomics/rppalevel4) |
52-
| Sequencing | [Bulk DNA Level 1](https://humantumoratlas.org/standard/sequencing/bulkdnalevel1) |
53-
| Sequencing | [Bulk DNA Level 2](https://humantumoratlas.org/standard/sequencing/bulkdnalevel2) |
54-
| Sequencing | [Bulk DNA Level 3](https://humantumoratlas.org/standard/sequencing/bulkdnalevel3) |
55-
| Sequencing | [Bulk Methylation-seq Level 1](https://humantumoratlas.org/standard/sequencing/bulkmethylation-seqlevel1) |
56-
| Sequencing | [Bulk Methylation-seq Level 2](https://humantumoratlas.org/standard/sequencing/bulkmethylation-seqlevel2) |
57-
| Sequencing | [Bulk Methylation-seq Level 3](https://humantumoratlas.org/standard/sequencing/bulkmethylation-seqlevel3) |
58-
| Sequencing | [Bulk RNA-seq Level 1](https://humantumoratlas.org/standard/sequencing/bulkrna-seqlevel1) |
59-
| Sequencing | [Bulk RNA-seq Level 2](https://humantumoratlas.org/standard/sequencing/bulkrna-seqlevel2) |
60-
| Sequencing | [Bulk RNA-seq Level 3](https://humantumoratlas.org/standard/sequencing/bulkrna-seqlevel3) |
61-
| Sequencing | [Hi-C-seq Level 1](https://humantumoratlas.org/standard/sequencing/hi-c-seqlevel1) |
62-
| Sequencing | [Hi-C-seq Level 2](https://humantumoratlas.org/standard/sequencing/hi-c-seqlevel2) |
63-
| Sequencing | [Hi-C-seq Level 3](https://humantumoratlas.org/standard/sequencing/hi-c-seqlevel3) |
64-
| Sequencing | [scATAC-seq Level 1](https://humantumoratlas.org/standard/sequencing/scatac-seqlevel1) |
65-
| Sequencing | [scATAC-seq Level 2](https://humantumoratlas.org/standard/sequencing/scatac-seqlevel2) |
66-
| Sequencing | [scATAC-seq Level 3](https://humantumoratlas.org/standard/sequencing/scatac-seqlevel3) |
67-
| Sequencing | [scATAC-seq Level 3](https://humantumoratlas.org/standard/sequencing/scatac-seqlevel4) |
68-
| Sequencing | [scDNA-seq Level 1](https://humantumoratlas.org/standard/sequencing/scdna-seqlevel1) |
69-
| Sequencing | [scDNA-seq Level 2](https://humantumoratlas.org/standard/sequencing/scdna-seqlevel2) |
70-
| Sequencing | [scmC-seq Level 1](https://humantumoratlas.org/standard/sequencing/scmc-seqlevel1) |
71-
| Sequencing | [scmC-seq Level 2](https://humantumoratlas.org/standard/sequencing/scmc-seqlevel2) |
72-
| Sequencing | [scRNA-seq Level 1](https://humantumoratlas.org/standard/sequencing/scrna-seqlevel1) |
73-
| Sequencing | [scRNA-seq Level 2](https://humantumoratlas.org/standard/sequencing/scrna-seqlevel2) |
74-
| Sequencing | [scRNA-seq Level 3](https://humantumoratlas.org/standard/sequencing/scrna-seqlevel3) |
75-
| Sequencing | [scRNA-seq Level 4](https://humantumoratlas.org/standard/sequencing/scrna-seqlevel4) |
76-
| Spatial Transcriptomics | [10X Genomics Xenium ISS Experiment](https://humantumoratlas.org/standard/spatial_transcriptomics/10xgenomicsxeniumissexperiment) |
77-
| Spatial Transcriptomics | [10x Visium Spatial Transcriptomics - RNA-seq Level 1](https://humantumoratlas.org/standard/spatial_transcriptomics/10xvisiumspatialtranscriptomics-rna-seqlevel1) |
78-
| Spatial Transcriptomics | [10x Visium Spatial Transcriptomics - RNA-seq Level 2](https://humantumoratlas.org/standard/spatial_transcriptomics/10xvisiumspatialtranscriptomics-rna-seqlevel2) |
79-
| Spatial Transcriptomics | [10x Visium Spatial Transcriptomics - RNA-seq Level 3](https://humantumoratlas.org/standard/spatial_transcriptomics/10xvisiumspatialtranscriptomics-rna-seqlevel3) |
80-
| Spatial Transcriptomics | [10x Visium Spatial Transcriptomics - Auxiliary Files](https://humantumoratlas.org/standard/spatial_transcriptomics/10xvisiumspatialtranscriptomics-auxiliaryfiles) |
81-
| Spatial Transcriptomics | [Nanostring CosMx SMI Experiment](https://humantumoratlas.org/standard/spatial_transcriptomics/nanostringcosmxsmiexperiment) |
82-
| Spatial Transcriptomics | [NanoString GeoMx DSP ROI DCC Segment Annotation Metadata](https://humantumoratlas.org/standard/spatial_transcriptomics/nanostringgeomxdsproidccsegmentannotationmetadata) |
83-
| Spatial Transcriptomics | [NanoString GeoMx DSP ROI RCC Segment Annotation Metadata](https://humantumoratlas.org/standard/spatial_transcriptomics/nanostringgeomxdsproirccsegmentannotationmetadata) |
84-
| Spatial Transcriptomics | [NanoString GeoMx DSP Spatial Transcriptomics Level 1](https://humantumoratlas.org/standard/spatial_transcriptomics/nanostringgeomxdspspatialtranscriptomicslevel1) |
85-
| Spatial Transcriptomics | [NanoString GeoMx DSP Spatial Transcriptomics Level 3](https://humantumoratlas.org/standard/spatial_transcriptomics/nanostringgeomxdspspatialtranscriptomicslevel3) |
86-
| Spatial Transcriptomics | [Slide-seq Level 1](https://humantumoratlas.org/standard/spatial_transcriptomics/slide-seqlevel1) |
87-
| Spatial Transcriptomics | [Slide-seq Level 2](https://humantumoratlas.org/standard/spatial_transcriptomics/slide-seqlevel2) |
88-
| Spatial Transcriptomics | [Slide-seq Level 3](https://humantumoratlas.org/standard/spatial_transcriptomics/slide-seqlevel3) |
11+
For **HTAN Phase 2**, there are both metadata standards and specific file requirements.
12+
- Interactive, searchable and downloadable summaries of metadata requirements are provided [here](https://htan2-data-model.readthedocs.io/en/latest/index.html).
13+
- Specific file requirements for single cell RNA-seq h5ad files are modeled after [CELLxGENE's requirements](https://cellxgene.cziscience.com/docs/032__Contribute%20and%20Publish%20Data). Please see the [Phase 2 Single Cell RNA-seq page](../data_submission/scrnaseq_data_submission.md) for more information.
8914

15+
For **HTAN Phase 1**, [The HTAN Portal's Data Standards pages](https://humantumoratlas.org/standards) provide interactive, searchable and downloadable summaries of the metadata attributes, requirements and valid values expected for each data type.
Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
1+
---
2+
order: 992
3+
---
4+
5+
# Phase 2 sc/snRNA-seq Data Submission
6+
7+
## Overview
8+
In HTAN Phase 2, the following files are submitted for single cell/single nuclei RNA-sequencing (sc/snRNA-seq) data:
9+
10+
| Level | Data Type | Example Files |
11+
|---|-------------------|----------------------|
12+
| 1 | raw sequence data | fastq, unaligned bam |
13+
| 2 | aligned sequence data | bam |
14+
| 3_4 | sample level summary information, e.g. cell annotations, t-SNE/UMAP coordinates, etc. | h5ad |
15+
16+
Metadata requirements are documented in the HTAN Data Model [readthedocs](https://htan2-data-model.readthedocs.io/en/latest/docs/scrna-seq.html) pages. This part of the manual describes **file requirements** for level 3_4 h5ad files.
17+
18+
## HTAN's h5ad Requirements
19+
HTAN Centers are encouraged to reference the [sc/snRNA-seq RFC](https://docs.google.com/document/d/1XjDLWulYWhnfZrGCg-0_Jh93ytIp3p_01ZrTyymTjoU/edit?usp=sharing) for additional details. The HTAN h5ad (AnnData 0.10) requirements are modeled after CELLxGENE's requirements. They also include three attributes developed by the Human Cell Atlas (HCA). Please see the [Background](#background-h5ad-files-cellxgene-human-cell-atlas) section below for more information about h5ad (AnnData 0.10) files, CELLxGENE and the HCA.
20+
21+
### Required File Attributes
22+
Similar to CELLxGENE's [Dataset Requirements](https://cellxgene.cziscience.com/docs/032__Contribute%20and%20Publish%20Data), level 3_4 sc/snRNA-seq h5ad files must contain the following attributes. Please see [HTAN_h5ad_exemplar_2025_03_03.h5ad](https://github.com/ncihtan/h5ad/blob/main/exemplars/HTAN_h5ad_exemplar_2025_03_03.h5ad) for an example file which meets these requirements.
23+
24+
| Attribute | Ontology/Version | Comments/Examples |
25+
|-----------|------------------|-------------------|
26+
| var.index, raw.var.index | [Human reference GRCh38.p14 (GENCODE v44/Ensembl 110)](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/5.2.0/schema.md#required-gene-annotations) | Ensembl IDs. For example: ENSG00000107566. |
27+
| var.gene_is_filtered, raw.var.gene_is_filtered | | No genes are filtered in the raw data. If a gene is filtered in the normalized data, its count is set to 0 and gene_is_filtered is set to 1. |
28+
| obs.organism_ontology_term_id | [NCBITaxon](https://www.ncbi.nlm.nih.gov/taxonomy) | Set to NCBITaxon:9606 for human. |
29+
| obs.donor_id | | Set to the HTAN Participant ID, e.g. HTA201_1.|
30+
| obs.sample_id | | Set to the HTAN Biospecimen ID, e.g. HTA201_1_B. |
31+
| obs.development_stage_ontology_term_id | [Human Development Stages (HsapDv)](https://www.ebi.ac.uk/ols4/ontologies/HsapDv) | use [HCA recommended terms](https://docs.google.com/document/d/1SsHZweG_kqerCAPNbQF7gQHNBDRqOsNZWzaWXZIKwTE/edit?usp=sharing) (p.22) |
32+
| obs.sex_ontology_term_id | [Phenotype and Trait Ontology (PATO)](https://www.ebi.ac.uk/ols4/ontologies/pato) | Use [CELLxGENE Requirements](https://cellxgene.cziscience.com/docs/032__Contribute%20and%20Publish%20Data#dataset-requirements): PATO:0000384 for male, PATO:0000383 for female, or unknown if unavailable. |
33+
| obs.self_reported_ethnicity_term_id | [Human Ancestry Ontology (HANCESTRO)](https://www.ebi.ac.uk/ols4/ontologies/hancestro) | Use [CELLxGENE Requirements](https://cellxgene.cziscience.com/docs/032__Contribute%20and%20Publish%20Data#dataset-requirements). Multiple comma-separated HANCESTRO terms may be used if more than one ethnicity is reported. If information is unavailable, use unknown. Example: HANCESTRO_0568. Note that CELLxGENE specifically excludes certain HANCESTRO categories. See [full details](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/5.2.0/schema.md#self_reported_ethnicity_ontology_term_id). |
34+
| obs.disease_ontology_term_id | [Mondo Disease Ontology](https://www.ebi.ac.uk/ols4/ontologies/mondo) | |
35+
| obs.tissue_type | CELLxGENE | Use [CELLxGENE Requirements](https://cellxgene.cziscience.com/docs/032__Contribute%20and%20Publish%20Data#dataset-requirements). Permitted values are restricted to: tissue, organoid, or cell culture. |
36+
| obs.tissue_ontology_term_id | [Uber Anatomy Ontology (UBERON)](https://www.ebi.ac.uk/ols4/ontologies/uberon) ||
37+
| obs.cell_type_ontology_term_id | [Cell Ontology (CL)](https://www.ebi.ac.uk/ols4/ontologies/cl) | |
38+
| obs.assay_ontology_term_id | [Experimental Factor Ontology (EFO)](https://www.ebi.ac.uk/ols4/ontologies/efo) | Use [CELLxGENE Requirements](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/5.2.0/schema.md#assay_ontology_term_id) |
39+
| obs.suspension_type | CELLxGENE | Use [CELLxGENE Requirements](https://cellxgene.cziscience.com/docs/032__Contribute%20and%20Publish%20Data#dataset-requirements). Permitted values are restricted to: cell, nucleus, na. |
40+
| obs.is_primary_data | CELLxGENE. Used to indicate if this is the canonical data set (True), or data is being reused from another source (False). | Use [CELLxGENE Requirements](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/5.2.0/schema.md#is_primary_data). Permitted values are restricted to True or False. |
41+
| obs.cell_enrichment | Human Cell Atlas: “Specifies the cell types targeted for enrichment or depletion beyond the selection of live cells.” | CL term followed by + or -. If there is no enrichment, use CL:00000000. For example, enrichment for fibroblasts would be CL:0000057+. |
42+
| obs.intron_inclusion | Human Cell Atlas: “Were introns included during read counting in the alignment process?” | Permitted values are: yes, no |
43+
| obs.author_cell_type | Human Cell Atlas: “Encoding of author intuition of cellular annotation in the dataset.” | Free text |
44+
| [obsm.X_(suffix)](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/5.2.0/schema.md#x_suffix) | CELLxGENE: embeddings of at least two dimensions, e.g. tSNE, UMAP, PCA, spatial coordinates | use [CELLxGENE terms](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/5.2.0/schema.md#x_suffix) for suffix (e.g. umap, tsne, pca) |
45+
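The table above can be turned into a quick local pre-check. The following is a minimal sketch, not the HTAN validator: the function names are hypothetical, the column names are copied from the table as written, and only the permitted-value lists that the table spells out are encoded.

```python
# Required obs columns, copied from the attribute table above.
REQUIRED_OBS_COLUMNS = [
    "organism_ontology_term_id",
    "donor_id",
    "sample_id",
    "development_stage_ontology_term_id",
    "sex_ontology_term_id",
    "self_reported_ethnicity_term_id",
    "disease_ontology_term_id",
    "tissue_type",
    "tissue_ontology_term_id",
    "cell_type_ontology_term_id",
    "assay_ontology_term_id",
    "suspension_type",
    "is_primary_data",
    "cell_enrichment",
    "intron_inclusion",
    "author_cell_type",
]

# Columns whose permitted values are listed explicitly in the table.
PERMITTED_VALUES = {
    "tissue_type": {"tissue", "organoid", "cell culture"},
    "suspension_type": {"cell", "nucleus", "na"},
    "intron_inclusion": {"yes", "no"},
    "is_primary_data": {True, False},
}


def missing_obs_columns(columns):
    """Return the required obs columns absent from `columns`, in table order."""
    present = set(columns)
    return [name for name in REQUIRED_OBS_COLUMNS if name not in present]


def invalid_values(column, values):
    """Return the sorted distinct values of `column` that the table does not permit."""
    allowed = PERMITTED_VALUES[column]
    return sorted({v for v in values if v not in allowed}, key=str)
```

With an AnnData object loaded as `adata`, the inputs would typically be `adata.obs.columns` and `adata.obs["suspension_type"]`. Both functions accept any iterable, so they can be tried on plain lists first.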
46+
### HTAN h5ad File Validation
47+
The HTAN Data Coordinating Center (DCC) has released a PyPI package called [HTAN-h5ad-validator](https://pypi.org/project/HTAN-h5ad-validate/) with which Centers can validate their sc/snRNA-seq h5ad files. Sage Bionetworks will run the validator on sc/snRNA-seq h5ad files submitted to Synapse.
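Some value formats can also be spot-checked by hand before running the validator. The sketch below covers only `obs.cell_enrichment` as described in the attribute table and does not reproduce the validator's own logic. It assumes a CL term is `CL:` followed by seven digits, handles a single term per value, and uses a hypothetical function name.

```python
import re

# A CL term (assumed: "CL:" plus seven digits) followed by "+" or "-".
_ENRICHMENT = re.compile(r"CL:\d{7}[+-]")

# Literal value the attribute table gives for "no enrichment".
NO_ENRICHMENT = "CL:00000000"


def is_valid_cell_enrichment(value):
    """Check one obs.cell_enrichment value against the format in the table."""
    return value == NO_ENRICHMENT or _ENRICHMENT.fullmatch(value) is not None
```

For example, `is_valid_cell_enrichment("CL:0000057+")` accepts the fibroblast enrichment value from the table, while the same term without a trailing sign is rejected.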
48+
49+
## Background: h5ad files, CELLxGENE, Human Cell Atlas
50+
51+
### h5ad (AnnData 0.10) brief overview
52+
Please see [AnnData’s documentation](https://anndata.readthedocs.io/en/latest/index.html) for a more detailed description of the AnnData object.
53+
54+
![From https://raw.githubusercontent.com/scverse/anndata/main/docs/_static/img/anndata_schema.svg](../img/anndata_schema.svg)
55+
56+
For HTAN’s purposes, the following parts of the AnnData object are of interest:
57+
58+
* .X - a matrix with counts where rows are cells and columns are genes.
59+
* var - a data frame with gene information (e.g. gene name, gene_is_filtered).
60+
* obs - a data frame with cell-level information.
61+
* obsm - one or more numpy ndarrays with cell embeddings.
62+
63+
CELLxGENE requires that raw data are submitted. Normalized data may also be submitted.
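The parts listed above share two axes: cells and genes. As an illustrative sketch only, the function below uses plain Python lists as stand-ins for the real matrix, tables and arrays and checks that the sizes agree. The function name and argument layout are hypothetical; an actual file would be read with the AnnData library.

```python
def check_shapes(X, obs_rows, var_rows, obsm):
    """Return a list of shape problems; an empty list means the parts agree.

    X        -- list of rows, one per cell, each with one count per gene
    obs_rows -- list with one entry per cell
    var_rows -- list with one entry per gene
    obsm     -- dict mapping an embedding name to a list of per-cell coordinates
    """
    problems = []
    n_cells, n_genes = len(obs_rows), len(var_rows)
    if len(X) != n_cells:
        problems.append(f"X has {len(X)} rows but obs has {n_cells}")
    if any(len(row) != n_genes for row in X):
        problems.append(f"every row of X needs {n_genes} columns to match var")
    for name, coords in obsm.items():
        if len(coords) != n_cells:
            problems.append(f"obsm[{name!r}] has {len(coords)} rows but obs has {n_cells}")
        if any(len(point) < 2 for point in coords):
            problems.append(f"obsm[{name!r}] needs at least two dimensions")
    return problems
```

Two cells by three genes with a two-dimensional `X_umap` embedding passes. Dropping a cell from `obs_rows`, or supplying one-dimensional coordinates, produces a message naming the mismatch.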
64+
65+
### CELLxGENE
66+
The HTAN DCC submits sc/snRNA-seq data to [CELLxGENE](https://cellxgene.cziscience.com), a tool developed by the Chan Zuckerberg Initiative (CZI) to visualize and explore single cell and spatial data. The DCC submits data to CELLxGENE in h5ad (AnnData 0.10) format. CELLxGENE’s schema requires:
67+
68+
* use of Ensembl gene IDs.
69+
* a specific genome reference and annotation version.
70+
* specific h5ad (AnnData 0.10) attributes.
71+
* use of specific ontologies for many of the required attributes (e.g. the Cell Ontology).
72+
73+
The HTAN requirements for h5ad files are modeled after CELLxGENE's [Dataset Requirements](https://cellxgene.cziscience.com/docs/032__Contribute%20and%20Publish%20Data).
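The Ensembl gene ID requirement is easy to spot-check. The sketch below flags entries that are not bare human Ensembl gene IDs (`ENSG` followed by 11 digits). It is an assumption of this sketch that versioned IDs and any other feature types should be flagged for review; the CELLxGENE schema remains the authority on what is accepted. The function name is hypothetical.

```python
import re

# Human Ensembl gene IDs: "ENSG" followed by 11 digits, e.g. ENSG00000107566.
_ENSEMBL_GENE = re.compile(r"ENSG\d{11}")


def non_ensembl_ids(gene_ids):
    """Return the entries that are not bare human Ensembl gene IDs."""
    return [g for g in gene_ids if _ENSEMBL_GENE.fullmatch(g) is None]
```

With an AnnData object loaded as `adata`, the input would typically be `adata.var.index`. Gene symbols such as `TP53` would be returned for follow-up.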
74+
75+
### Human Cell Atlas (HCA)
76+
77+
The Human Cell Atlas (HCA) is a large repository of single cell data from healthy subjects. It provides [standards for single-cell data submission](https://docs.google.com/document/d/1SsHZweG_kqerCAPNbQF7gQHNBDRqOsNZWzaWXZIKwTE/edit?usp=sharing) which adopt most of the CELLxGENE schema, but also include additional fields. Aligning HTAN data with CELLxGENE will potentially facilitate data integration with other consortia such as the HCA. The HTAN requirements include three HCA attributes in addition to CELLxGENE required attributes.
78+
79+
