
Draft relaxed schema compliance #1025

Open
@brianraymor

Context

There are emerging requirements for reusing the cellxgene-schema CLI (schema + validator) in scenarios that are more relaxed than CELLxGENE Discover's current requirements.

Relaxation

The following sections blue-sky possible approaches to documenting relaxed requirements; however, the solution should be driven by concrete scenarios and not theory.

Fine Granularity: Per Schema variant

A limited number of schema variants, such as the "cross modality schema", could be documented. schema_reference could be reused so that the curator can declare the preferred schema for validation.
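
As a rough sketch of this idea only, a validator could look up uns['schema_reference'] in a registry of documented variants. The registry, the placeholder URLs, the variant names, and the function name below are all hypothetical; none of them come from cellxgene-schema.

```python
# Hypothetical registry of documented schema variants.
# The URLs and variant names are placeholders, not real schema locations.
SCHEMA_VARIANTS = {
    "https://example.org/schema/strict": "strict",
    "https://example.org/schema/cross-modality": "cross_modality",
}


def select_schema_variant(uns: dict) -> str:
    """Return the variant name the curator selected through uns['schema_reference']."""
    reference = uns.get("schema_reference")
    if reference not in SCHEMA_VARIANTS:
        raise ValueError(f"unknown schema_reference: {reference!r}")
    return SCHEMA_VARIANTS[reference]
```

With this shape, adding a variant means adding one registry entry, which keeps the number of variants small and explicit.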


Fine Granularity: Per Metadata field

For each metadata field, the schema defines separate requirements for strict and relaxed. Generally, relaxed will indicate that the field MUST NOT be present, but it's also possible to relax other requirements.


uns (Dataset Metadata)

relaxed

Key: relaxed
Annotator: Curator MAY annotate.
Value: list[str]. Each str value MUST match one of the values in the set:
  • "obs['cell_type_ontology_term_id']"
  • "obs['development_stage_ontology_term_id']"
  • ...

  If present, relaxed validation MUST be performed on each specified metadata field.


Concrete example: Suppose the assay is silver tier Visium Spatial Gene Expression and cell_type_ontology_term_id defines its relaxed validation as:

  1. cell_type_ontology_term_id MUST NOT be present in obs
  2. "cell_type_ontology_term_id" MUST be annotated in uns['relaxed']

Then the silver tier dataset would simply meet those requirements.
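
The per-field check in this example could be sketched as follows. This is an illustration under stated assumptions, not the cellxgene-schema implementation: the function name and the mapping are hypothetical, entries in uns['relaxed'] are assumed to use the "obs['...']" form from the value set above, and every listed field is assumed to define "MUST NOT be present in obs" as its relaxed requirement.

```python
# Assumption: each relaxable field's relaxed requirement is
# "MUST NOT be present in obs". The mapping below is illustrative only.
RELAXABLE_OBS_FIELDS = {
    "obs['cell_type_ontology_term_id']": "cell_type_ontology_term_id",
    "obs['development_stage_ontology_term_id']": "development_stage_ontology_term_id",
}


def check_relaxed_fields(uns: dict, obs_columns: list[str]) -> list[str]:
    """Return error messages for fields listed in uns['relaxed']."""
    errors = []
    for entry in uns.get("relaxed", []):
        column = RELAXABLE_OBS_FIELDS.get(entry)
        if column is None:
            errors.append(f"{entry} is not an allowed value in uns['relaxed']")
        elif column in obs_columns:
            errors.append(f"{column} MUST NOT be present in obs under relaxed validation")
    return errors
```

Fields that are not listed in uns['relaxed'] are left to the existing strict checks, so a dataset without the key validates exactly as it does today.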


Coarse Granularity: Per Dataset

The schema documents a relaxed subset of the currently required fields. This subset might omit cell_type_ontology_term_id or perhaps development_stage_ontology_term_id. If a currently required field is not included in the relaxed subset, then it MUST NOT be present in the dataset.

Curators annotate whether strict or relaxed validation is desired.


uns (Dataset Metadata)

strict

Key: strict
Annotator: Curator MUST annotate.
Value: bool. This MUST be True for strict validation and MUST be False for relaxed validation.
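
A minimal sketch of the per-dataset approach, assuming illustrative field sets: the issue does not define the actual strict or relaxed subsets, so STRICT_REQUIRED, RELAXED_REQUIRED, and the function name are hypothetical.

```python
# Assumption: illustrative field sets; the real subsets are not defined here.
STRICT_REQUIRED = {
    "assay_ontology_term_id",
    "cell_type_ontology_term_id",
    "development_stage_ontology_term_id",
}
RELAXED_REQUIRED = {"assay_ontology_term_id"}


def check_dataset(uns: dict, obs_columns: list[str]) -> list[str]:
    """Return error messages for a dataset under strict or relaxed validation."""
    strict = uns.get("strict")
    if not isinstance(strict, bool):
        return ["uns['strict'] MUST be annotated as a bool"]

    present = set(obs_columns)
    required = STRICT_REQUIRED if strict else RELAXED_REQUIRED
    errors = [f"{field} MUST be present" for field in sorted(required - present)]

    if not strict:
        # Fields required only by strict validation are forbidden when relaxed.
        forbidden = (STRICT_REQUIRED - RELAXED_REQUIRED) & present
        errors += [f"{field} MUST NOT be present" for field in sorted(forbidden)]
    return errors
```

Here a single bool selects between two fixed field sets, which is simpler for curators than per-field annotation but cannot relax fields individually.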

References

Compliance to the MiAIRR Data Standard is currently a binary state, i.e., a data either is or is not compliant, there are not “grades” of compliance. However, additional requirements for specific use cases might be defined in the future.

Labels: discovery, drafting, schema