Description
Context
There are emerging requirements to reuse the cellxgene-schema CLI schema and validator in scenarios whose requirements are more relaxed than CELLxGENE Discover's current ones.
Relaxation
The following sections brainstorm possible approaches to documenting relaxed requirements. However, the solution should be driven by concrete scenarios rather than theory.
Fine Granularity: Per Schema variant
A limited number of schema variants could be documented, such as a "cross-modality schema". `schema_reference` could then be reused so that the curator declares which schema variant the dataset should be validated against.
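To make the idea concrete, here is a minimal Python sketch of how a validator might select a schema variant from `schema_reference`. It models `uns` as a plain dict and `obs` as a set of column names. The registry `SCHEMA_VARIANTS`, its example URLs and field lists, and the function `missing_fields_for_variant` are all hypothetical and are not part of the cellxgene-schema CLI.

```python
from __future__ import annotations

# Hypothetical registry mapping a schema_reference value to the obs fields
# that variant requires. URLs and field lists are illustrative placeholders.
SCHEMA_VARIANTS: dict[str, frozenset[str]] = {
    "https://example.org/schema/discover": frozenset(
        {
            "assay_ontology_term_id",
            "cell_type_ontology_term_id",
            "development_stage_ontology_term_id",
        }
    ),
    "https://example.org/schema/cross-modality": frozenset(
        {"assay_ontology_term_id"}
    ),
}


def missing_fields_for_variant(uns: dict, obs_columns: set[str]) -> list[str]:
    """Hypothetical helper: list required obs fields missing for the chosen variant."""
    reference = uns.get("schema_reference")
    if reference not in SCHEMA_VARIANTS:
        raise ValueError(f"unknown schema_reference: {reference!r}")
    # Sort so that error reports are deterministic.
    return sorted(SCHEMA_VARIANTS[reference] - obs_columns)
```

Under this sketch, a dataset with only `assay_ontology_term_id` passes when it declares the cross-modality variant. The same dataset declared against the Discover variant reports the two missing fields.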
Fine Granularity: Per Metadata field
For each metadata field, the schema defines separate requirements for strict and relaxed validation. Generally, the relaxed requirement will be that the field MUST NOT be present, but other requirements could also be relaxed.
uns (Dataset Metadata)
relaxed
Key | relaxed |
---|---|
Annotator | Curator MAY annotate. |
Value | list[str]. str values MUST match one or more of the values in the set: If present, relaxed validation MUST be performed on the specified metadata field. |
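The following minimal sketch checks the `relaxed` annotation itself. It assumes `uns` is a plain dict; an AnnData file read from disk may instead hold the list as a NumPy array. The source leaves the permitted set of values unspecified, so `RELAXABLE_FIELDS` is a hypothetical placeholder. The function name `relaxed_annotation_errors` is also hypothetical.

```python
from __future__ import annotations

# Hypothetical: the permitted set of values is not specified in the proposal,
# so these field names are placeholders for illustration only.
RELAXABLE_FIELDS = frozenset(
    {"cell_type_ontology_term_id", "development_stage_ontology_term_id"}
)


def relaxed_annotation_errors(uns: dict) -> list[str]:
    """Hypothetical check of uns['relaxed'] against the Annotator/Value rules."""
    if "relaxed" not in uns:
        return []  # Curator MAY annotate, so absence is valid.
    value = uns["relaxed"]
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        return ["uns['relaxed'] must be a list[str]"]
    # Every listed field must be one that defines relaxed requirements.
    return [
        f"uns['relaxed'] contains unsupported field {v!r}"
        for v in value
        if v not in RELAXABLE_FIELDS
    ]
```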
Concrete example: suppose the assay is silver-tier Visium Spatial Gene Expression, and `cell_type_ontology_term_id` defines its relaxed validation as:
- `cell_type_ontology_term_id` MUST NOT be present in `obs`
- "cell_type_ontology_term_id" MUST be annotated in `uns['relaxed']`

Then the silver-tier dataset would meet those requirements simply by omitting `cell_type_ontology_term_id` from `obs` and listing it in `uns['relaxed']`.
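The example can be sketched as a per-field check in Python. `STRICT_REQUIRED_OBS` and `field_level_errors` are hypothetical names. The only relaxed rule implemented is the one from the example: the field MUST NOT be present in `obs`. Other relaxed rules would need their own branches.

```python
from __future__ import annotations

# Hypothetical subset of obs fields that strict validation requires.
STRICT_REQUIRED_OBS = ("assay_ontology_term_id", "cell_type_ontology_term_id")


def field_level_errors(obs_columns: set[str], uns: dict) -> list[str]:
    """Hypothetical: apply strict or relaxed rules per field via uns['relaxed']."""
    relaxed = set(uns.get("relaxed", []))
    errors = []
    for field in STRICT_REQUIRED_OBS:
        if field in relaxed:
            # Relaxed rule from the example: the field MUST NOT be in obs.
            if field in obs_columns:
                errors.append(f"{field} MUST NOT be present in obs when relaxed")
        elif field not in obs_columns:
            # Strict rule: the field is required.
            errors.append(f"{field} is required in obs")
    return errors
```

Consider a hypothetical Visium `obs` that contains only `assay_ontology_term_id`. If `cell_type_ontology_term_id` is listed in `uns['relaxed']`, the check yields no errors. If that annotation is omitted, the check reports the field as required.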
Coarse Granularity: Per Dataset
The schema documents a relaxed subset of the current required fields. For example, this subset might exclude `cell_type_ontology_term_id` or `development_stage_ontology_term_id`. If a currently required field is not included in the relaxed subset, then it MUST NOT be present in the dataset.
Curators annotate whether strict or relaxed validation is desired.
uns (Dataset Metadata)
strict
Key | strict |
---|---|
Annotator | Curator MUST annotate. |
Value | bool. This MUST be True for strict validation and MUST be False for relaxed validation. |
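The following minimal sketch covers the per-dataset variant. Like the earlier sketches, it models `uns` as a plain dict holding a Python bool and `obs` as a set of column names. `STRICT_REQUIRED`, `RELAXED_SUBSET`, and `dataset_level_errors` are hypothetical, and the proposal does not fix the contents of the relaxed subset.

```python
from __future__ import annotations

# Hypothetical field lists; the proposal leaves the exact relaxed subset open.
STRICT_REQUIRED = frozenset(
    {
        "assay_ontology_term_id",
        "cell_type_ontology_term_id",
        "development_stage_ontology_term_id",
    }
)
RELAXED_SUBSET = frozenset({"assay_ontology_term_id"})


def dataset_level_errors(obs_columns: set[str], uns: dict) -> list[str]:
    """Hypothetical: validate against the strict set or relaxed subset per uns['strict']."""
    strict = uns.get("strict")
    # `type(...) is bool` rejects integers such as 0 or 1 masquerading as bools.
    if type(strict) is not bool:
        return ["uns['strict'] MUST be annotated as a bool"]
    required = STRICT_REQUIRED if strict else RELAXED_SUBSET
    errors = [f"{f} is required" for f in sorted(required - obs_columns)]
    if not strict:
        # Required fields outside the relaxed subset MUST NOT be present.
        excluded = STRICT_REQUIRED - RELAXED_SUBSET
        errors += [f"{f} MUST NOT be present" for f in sorted(excluded & obs_columns)]
    return errors
```

Compared with the per-field approach, a single boolean keeps the annotation simple. The trade-off is that every relaxed dataset shares one fixed relaxed subset.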
References
Compliance with the MiAIRR Data Standard is currently a binary state, i.e., a dataset either is or is not compliant; there are no “grades” of compliance. However, additional requirements for specific use cases might be defined in the future.