Restore genetic ancestry #1265

Open
@brianraymor


Pending

Data readiness to be assessed during 2025 Q2 planning to finalize the addition of this feature in schema 6.0.0.

Per @jahilton's suggestions:

  • Should clarify that these are generated by CZI's pipeline & include links to documentation
  • Update annotators to include Curator, Curator or Submitter, CELLxGENE Discover.

March 3, 2025

A recent HANCESTRO release added an " ancestry" suffix to some of the labels used in the original schema draft, for example "European ancestry". Editorial updates to the text below depend on updating the pinned HANCESTRO release in coordination with this issue. That update is currently blocked by data corpus conflicts with the issue "Update self_reported_ethnicity_term_id".


Changelog

  • obs (Cell metadata)
    • Added genetic_ancestry_African
    • Added genetic_ancestry_East_Asian
    • Added genetic_ancestry_European
    • Added genetic_ancestry_Indigenous_American
    • Added genetic_ancestry_Oceanian
    • Added genetic_ancestry_South_Asian

Design

If organism_ontology_term_id is "NCBITaxon:9606" for Homo sapiens, then for each observation, either all values of the following fields MUST be float("nan") or the sum of their values MUST be 1.0 ± 0.0002:

  • genetic_ancestry_African
  • genetic_ancestry_East_Asian
  • genetic_ancestry_European
  • genetic_ancestry_Indigenous_American
  • genetic_ancestry_Oceanian
  • genetic_ancestry_South_Asian
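The per-observation rule above can be sketched in plain Python. This is a minimal illustration, not the CELLxGENE validator: the helper name `ancestry_sum_is_valid` and the representation of an observation as a dict of field name to float are assumptions made for the example.

```python
import math

# Field names taken from the list above.
ANCESTRY_FIELDS = [
    "genetic_ancestry_African",
    "genetic_ancestry_East_Asian",
    "genetic_ancestry_European",
    "genetic_ancestry_Indigenous_American",
    "genetic_ancestry_Oceanian",
    "genetic_ancestry_South_Asian",
]
TOLERANCE = 0.0002  # from the rule: the sum MUST be 1.0 ± 0.0002


def ancestry_sum_is_valid(obs_row):
    """Hypothetical helper: check one human observation (dict of field -> float)."""
    values = [obs_row[field] for field in ANCESTRY_FIELDS]
    nan_flags = [math.isnan(v) for v in values]
    if all(nan_flags):
        return True  # ancestry is unavailable for this observation
    if any(nan_flags):
        return False  # a mix of NaN and numbers satisfies neither branch
    return abs(sum(values) - 1.0) <= TOLERANCE
```

Note that a partially filled row is rejected explicitly: summing a NaN with numbers yields NaN, which can never equal 1.0, so such a row fails both branches of the rule.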

genetic_ancestry_African

Key: genetic_ancestry_African
Annotator: Curator MUST annotate.
Value: float. All observations with the same donor_id MUST contain the same value.

If organism_ontology_term_id is NOT "NCBITaxon:9606" for Homo sapiens, then the value MUST be a float("nan").

If organism_ontology_term_id is "NCBITaxon:9606" for Homo sapiens, then the value MUST be a float("nan") if unavailable; otherwise, the value MUST be the genetic ancestry percentage of "HANCESTRO:0010" for African expressed as a float greater than or equal to 0.0 and less than or equal to 1.0.
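The two value rules for a single field can be sketched as follows. This is an illustrative check under stated assumptions, not the official validator: the function name `ancestry_value_is_valid` is hypothetical, and the same logic would apply unchanged to each of the six genetic_ancestry_* fields.

```python
import math

HUMAN = "NCBITaxon:9606"  # Homo sapiens, as named in the rule


def ancestry_value_is_valid(organism_ontology_term_id, value):
    """Hypothetical helper: check one genetic_ancestry_* value for one observation."""
    if not isinstance(value, float):
        return False  # the schema requires a float, so an int such as 1 is rejected
    if organism_ontology_term_id != HUMAN:
        return math.isnan(value)  # non-human observations MUST be NaN
    # Human: NaN when unavailable, otherwise a value in the closed range [0.0, 1.0].
    return math.isnan(value) or 0.0 <= value <= 1.0
```

For example, `ancestry_value_is_valid("NCBITaxon:9606", 0.25)` passes, while the same value with a mouse term such as "NCBITaxon:10090" fails because only NaN is allowed there.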

genetic_ancestry_East_Asian

Key: genetic_ancestry_East_Asian
Annotator: Curator MUST annotate.
Value: float. All observations with the same donor_id MUST contain the same value.

If organism_ontology_term_id is NOT "NCBITaxon:9606" for Homo sapiens, then the value MUST be a float("nan").

If organism_ontology_term_id is "NCBITaxon:9606" for Homo sapiens, then the value MUST be a float("nan") if unavailable; otherwise, the value MUST be the genetic ancestry percentage of "HANCESTRO:0009" for East Asian expressed as a float greater than or equal to 0.0 and less than or equal to 1.0.

genetic_ancestry_European

Key: genetic_ancestry_European
Annotator: Curator MUST annotate.
Value: float. All observations with the same donor_id MUST contain the same value.

If organism_ontology_term_id is NOT "NCBITaxon:9606" for Homo sapiens, then the value MUST be a float("nan").

If organism_ontology_term_id is "NCBITaxon:9606" for Homo sapiens, then the value MUST be a float("nan") if unavailable; otherwise, the value MUST be the genetic ancestry percentage of "HANCESTRO:0005" for European expressed as a float greater than or equal to 0.0 and less than or equal to 1.0.

genetic_ancestry_Indigenous_American

Key: genetic_ancestry_Indigenous_American
Annotator: Curator MUST annotate.
Value: float. All observations with the same donor_id MUST contain the same value.

If organism_ontology_term_id is NOT "NCBITaxon:9606" for Homo sapiens, then the value MUST be a float("nan").

If organism_ontology_term_id is "NCBITaxon:9606" for Homo sapiens, then the value MUST be a float("nan") if unavailable; otherwise, the value MUST be the genetic ancestry percentage of "HANCESTRO:0013" for Indigenous American expressed as a float greater than or equal to 0.0 and less than or equal to 1.0.

genetic_ancestry_Oceanian

Key: genetic_ancestry_Oceanian
Annotator: Curator MUST annotate.
Value: float. All observations with the same donor_id MUST contain the same value.

If organism_ontology_term_id is NOT "NCBITaxon:9606" for Homo sapiens, then the value MUST be a float("nan").

If organism_ontology_term_id is "NCBITaxon:9606" for Homo sapiens, then the value MUST be a float("nan") if unavailable; otherwise, the value MUST be the genetic ancestry percentage of "HANCESTRO:0017" for Oceanian expressed as a float greater than or equal to 0.0 and less than or equal to 1.0.

genetic_ancestry_South_Asian

Key: genetic_ancestry_South_Asian
Annotator: Curator MUST annotate.
Value: float. All observations with the same donor_id MUST contain the same value.

If organism_ontology_term_id is NOT "NCBITaxon:9606" for Homo sapiens, then the value MUST be a float("nan").

If organism_ontology_term_id is "NCBITaxon:9606" for Homo sapiens, then the value MUST be a float("nan") if unavailable; otherwise, the value MUST be the genetic ancestry percentage of "HANCESTRO:0006" for South Asian expressed as a float greater than or equal to 0.0 and less than or equal to 1.0.
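Every field above also requires that all observations with the same donor_id contain the same value. A minimal sketch of that check follows; the helper names and the list-of-dicts representation of obs are assumptions for illustration only. The one subtlety is that NaN never compares equal to itself, so two NaN values must be treated as matching explicitly.

```python
import math


def same_value(a, b):
    """Treat two NaNs as equal; otherwise use ordinary float equality."""
    return (math.isnan(a) and math.isnan(b)) or a == b


def inconsistent_donors(rows, field):
    """Hypothetical helper: return donor_ids whose observations disagree on `field`.

    `rows` is an iterable of dicts, each with a "donor_id" key and the given field.
    """
    first_seen = {}
    bad = set()
    for row in rows:
        donor = row["donor_id"]
        value = row[field]
        if donor not in first_seen:
            first_seen[donor] = value  # remember the first value seen for this donor
        elif not same_value(first_seen[donor], value):
            bad.add(donor)
    return sorted(bad)
```

An empty result means the field is consistent for every donor; any returned donor_id has at least two observations with different values.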

Labels: 6.0 (Next major CELLxGENE schema version), blocked, schema (CELLxGENE Discover dataset schema)