GA4GH Experiments Metadata Standard

Purpose of the working group

Our main objective is to specify the minimum information needed to characterise a genomic experiment.

When a researcher downloads a genomic dataset, they typically get CRAM or VCF files, which are the results of a sequencing experiment. However, these files contain little information on the nature of the experiment itself: are the data from whole genome sequencing, transcriptomics, or another kind of experiment? Are the data from a bulk sequencing or a single-cell assay? Have techniques been applied to target specific regions of the genome?

Without metadata explaining the context, researchers cannot make sense of results from experiments in genomics, epigenomics, and other -omics fields. The GA4GH Discovery Work Stream aims to produce a minimal checklist of the metadata needed to characterise -omics datasets. The Experiments Metadata Standard will provide a dictionary of properties that makes it easier to search for experiments and to understand their results for analysis.

For more information on our group, please visit our GA4GH web page.

Scope

While the term “metadata” can be very broad (data that describes data), this Discovery Work Stream subgroup focuses exclusively on the properties of the methodology and equipment used in a genomic experiment, and more precisely on library preparation and the instrument run. It provides context around the preparation of biological samples into libraries for a given laboratory experiment run, and the execution context for that run. Interoperability with other GA4GH standards will be key to the adoption of the standard.

In the first phase, the group will focus exclusively on genomic sequencing instruments generating reads (high-throughput sequencing experiments, such as WGS, RNA-Seq, and Methyl-Seq). Future specification updates may consider the inclusion of other instruments, quality control metrics and -omics data, such as genotyping arrays, proteomics, and metabolomics, based on the evolving needs within the genomics community. Follow this link to our current working document.

The following topics are therefore considered out of scope (and will remain so): clinical data, biological sample descriptors, downstream data processing, and analysis. The discussions revolve around the content of the checklist, rather than the formats, leaving the latter to the DaMaSC sub-working group.

How to use this checklist

Implementing the checklist in your new resource

If you are creating a new resource (dataset, project, or platform) and would like to implement this checklist, we suggest looking at both the "core" and "identifiers" sections and considering how each property could apply to, and be represented in, your data model. If you have questions about specific properties, open an issue in this GitHub repository and we will help.
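As a rough illustration of fitting checklist properties into a data model, the sketch below defines a record type and reports which properties are still empty. The field names (`experiment_id`, `assay_type`, `library_preparation`, `instrument_model`) are hypothetical placeholders, not taken from the "core" or "identifiers" documents; substitute the actual property names from those sections.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ExperimentRecord:
    # Hypothetical property names for illustration only; the real ones
    # are defined in the "core" and "identifiers" sections.
    experiment_id: Optional[str] = None
    assay_type: Optional[str] = None
    library_preparation: Optional[str] = None
    instrument_model: Optional[str] = None


def missing_properties(record: ExperimentRecord) -> List[str]:
    """Return the names of properties that are unset or empty."""
    return [
        f.name
        for f in fields(record)
        if getattr(record, f.name) in (None, "")
    ]


record = ExperimentRecord(experiment_id="EXP-001", assay_type="WGS")
print(missing_properties(record))
# prints ['library_preparation', 'instrument_model']
```

A check like this can run when records are ingested, so that gaps against the checklist are visible early rather than discovered by downstream users.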

Map your existing resource to the checklist

Please have a look at our mappings section.

Core Properties Checklist

Two documents make up this first version of the checklist:

Core: This checklist contains properties that are relevant to any sequencing assay.
Identifiers: This checklist contains identifiers that are relevant to include with a genomic dataset.

Mappings / Implementations

The Mappings section provides a mapping of existing platforms and projects to the GA4GH Experiments Metadata Checklist.
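For a resource that already has its own field names, a mapping can be as simple as a lookup table. The sketch below is one possible approach, not the format used in the Mappings section; both the source field names (`platform`, `strategy`, `lib_kit`) and the target property names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping: keys are an existing resource's field names,
# values are placeholder checklist property names.
FIELD_MAPPING: Dict[str, str] = {
    "platform": "instrument_model",
    "strategy": "assay_type",
    "lib_kit": "library_preparation",
}


def map_record(
    source: Dict[str, str], mapping: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Rename mapped fields and list the source fields with no mapping."""
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for key, value in source.items():
        if key in mapping:
            mapped[mapping[key]] = value
        else:
            unmapped.append(key)
    return mapped, sorted(unmapped)


source = {"platform": "example-sequencer", "strategy": "RNA-Seq", "notes": "pilot run"}
mapped, unmapped = map_record(source, FIELD_MAPPING)
print(mapped)    # fields renamed to the placeholder checklist properties
print(unmapped)  # fields that still need a mapping decision
```

Returning the unmapped fields separately makes it easy to see which parts of an existing resource fall outside the checklist.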

Future plans

While the current checklist represents the first version of the standard, the GA4GH Experiments Metadata Standard group is actively planning enhancements for future releases. These will include:

Categories: A key upcoming milestone is to define further properties specific to various genomic sequencing domains, such as Transcriptomics, Single-Cell Sequencing, Methylation, and Targeted Sequencing. Progress in each category will depend on the level of engagement from the respective communities to help shape and validate these specific properties.
Ontologies: As we suggest ontologies, guided by GA4GH TASC recommendations, we aim to cover the necessary terms to describe each concept, where appropriate. Initial work has focused on instrument-related terms using OBI and GENEPIO, and this effort will continue for other properties.
Schema: Providing an optional schema that implementers can adopt to support the checklist, without making its use mandatory.
Involvement with Beacon: Enabling GA4GH Beacon searches on terms covered by the checklist. This would make it possible, for instance, to query a Beacon node for all RNA-Seq experiments or for data sequenced on a specific platform.
Supporting data generators and repositories: We are actively supporting implementations of the standard in wider data ecosystems that help to generate, store and discover genomics data.
Supporting data processing: We plan on supporting data tools that want to implement the standard.
Comments received: Many issues in this GitHub repository have been assigned to upcoming versions.
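The Beacon item above describes searching on terms covered by the checklist. The sketch below illustrates that kind of query locally, as plain in-memory filtering; it does not use or represent the Beacon API, and the property names and values are hypothetical.

```python
from typing import Any, Dict, List


def find_experiments(
    records: List[Dict[str, Any]], **criteria: Any
) -> List[Dict[str, Any]]:
    """Return records matching every criterion (case-insensitive comparison)."""

    def matches(record: Dict[str, Any]) -> bool:
        return all(
            str(record.get(key, "")).lower() == str(value).lower()
            for key, value in criteria.items()
        )

    return [record for record in records if matches(record)]


# Made-up records using placeholder property names.
records = [
    {"experiment_id": "EXP-001", "assay_type": "RNA-Seq", "instrument_model": "model-a"},
    {"experiment_id": "EXP-002", "assay_type": "WGS", "instrument_model": "model-a"},
    {"experiment_id": "EXP-003", "assay_type": "RNA-Seq", "instrument_model": "model-b"},
]

rna_seq = find_experiments(records, assay_type="rna-seq")
print([r["experiment_id"] for r in rna_seq])
# prints ['EXP-001', 'EXP-003']
```

Searches like this only work when every resource records the same properties with comparable values, which is the gap the checklist and the suggested ontologies are meant to close.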

Documentation

Introductory video

This video explains the rationale behind the creation of the Experiments Metadata Checklist, highlighting key use cases and outlining future plans.
The slides are also available.

Relevant links

Record of past decisions
Meetings Agenda and Minutes
The Progress flowchart outlines the steps taken by the Experiments Metadata group in developing the checklist.
