|
| 1 | +# GIDE search input ro-crate profile |
| 2 | + |
| 3 | +January 2026 |
| 4 | + |
| 5 | +## ro-crate-metadata structure overview |
| 6 | + |
| 7 | +As input, we expect a _detached RO-Crate_ consisting solely of an ro-crate-metadata.json. This file _MUST_: |
| 8 | + |
| 9 | +1. Generally abide by the requirements of a detached ro-crate. At a high level this includes: |
| 10 | + - 1.1 Contain a self-describing RO-Crate Metadata Descriptor, with an @id of ro-crate-metadata.json. |
| 11 | + - 1.2 Have a root dataset entity which the ro-crate-metadata.json describes (via the 'about' property). |
| 12 | +2. Give the root dataset entity an @id that is an absolute URL to a page where more information about the entry can be found and the data can be obtained. |
| 13 | +3. Use the context term definitions in gide-search-context.jsonld. Additional terms _MAY_ be added; however, terms that are defined in this context _MUST NOT_ be changed to point at new IRIs. |
| 14 | + |
| 15 | +Note that this profile is designed for detached ro-crates, and so depends on https://w3id.org/ro/crate/X where X is equal to or greater than 1.2 (the version which defined detached ro-crates). |
| 16 | + |
| 17 | +The schema described below is considered 'open', in that additional relevant connections can be added to the document, though there is no guarantee that these will be used in indexing for search. |
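
As a rough illustration of rules 1.1, 1.2 and 2 above, the following Python sketch checks a parsed ro-crate-metadata.json for a metadata descriptor, a root dataset referenced via `about`, and an absolute URL as the root `@id`. The helper names `as_list` and `check_structure`, the `https://example.org/...` URL, and the restriction to http(s) URLs are assumptions made for this example; the `@context` is omitted for brevity. It is not a complete validator for this profile.

```python
from urllib.parse import urlparse

DESCRIPTOR_ID = "ro-crate-metadata.json"


def as_list(value):
    """JSON-LD values may be a single item or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def check_structure(crate):
    """Return a list of problems; an empty list means these checks passed."""
    entities = {e["@id"]: e for e in crate.get("@graph", []) if "@id" in e}

    # Rule 1.1: a descriptor with the fixed @id must be present.
    descriptor = entities.get(DESCRIPTOR_ID)
    if descriptor is None:
        return ["no metadata descriptor with @id 'ro-crate-metadata.json'"]

    # Rule 1.2: the descriptor references the root dataset via 'about'.
    about = descriptor.get("about")
    root_id = about.get("@id") if isinstance(about, dict) else None
    root = entities.get(root_id)
    if root is None:
        return ["descriptor 'about' does not reference an entity in @graph"]

    problems = []
    if "Dataset" not in as_list(root.get("@type")):
        problems.append("root entity is not of type Dataset")

    # Rule 2: the root dataset @id is an absolute URL.
    # Assumption: only http(s) URLs are treated as acceptable here.
    parsed = urlparse(root_id)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("root dataset @id is not an absolute URL")
    return problems


# Hypothetical minimal graph (the @context is left out of this sketch).
crate = {
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.2"},
            "about": {"@id": "https://example.org/datasets/123"},
        },
        {
            "@id": "https://example.org/datasets/123",
            "@type": "Dataset",
            "name": "Hypothetical example dataset",
        },
    ]
}

print(check_structure(crate))
```

A relative root `@id` such as `./`, which is common in attached RO-Crates, would be reported by this check.
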
| 18 | + |
| 19 | +### Expected objects |
| 20 | + |
| 21 | +The @graph of the ro-crate-metadata _MUST_ include: |
| 22 | + |
| 23 | +- Exactly one self-describing RO-Crate Metadata Descriptor. |
| 24 | +- Exactly one root dataset entity of type _Dataset_, linked from the self-describing RO-Crate Metadata Descriptor via the _about_ property. |
| 25 | +- One or more Taxon objects, of type _Taxon_, at least linked to the root dataset entity via the _about_ property. |
| 26 | +- One or more Imaging Method objects, of type _DefinedTerm_, at least linked to the root dataset entity via the _measurementMethod_ property. |
| 27 | +- One or more Authors, of type _Person_, linked to the root dataset entity via the _author_ property. |
| 28 | +- Exactly one Publisher object, of type _Organisation_, linked to the root dataset entity via the _publisher_ property. |
| 29 | + |
| 30 | +We _RECOMMEND_ including additional objects: |
| 31 | +- Objects of type _Organisation_ to describe Author _affiliation_ |
| 32 | +- Further descriptions of the biological content that was captured in the images, with objects of type _BioSample_, and _DefinedTerm_. |
| 33 | +- Further descriptions of the methods used to capture the images, with objects of type _LabProtocol_ and _DefinedTerm_. |
| 34 | +- Descriptions of the methods used to analyse or annotate images |
| 35 | +- Publications, of type _ScholarlyArticle_, which the dataset supported, or which provide additional detail on the methods used to create the dataset. |
| 36 | + |
| 37 | + |
| 38 | +### Overview of graph between objects |
| 39 | + |
| 40 | +Where a property can connect to a list of objects of different types, these have been grouped into a box. As an example, we require all relevant biological subjects of imaging to be referenced by the 'about' property of the dataset, but this does not preclude additional links between these objects (such as a BioSample's taxonomicRange also connecting to a Taxon listed in the 'about' of the dataset). |
| 41 | + |
| 42 | +```mermaid |
| 43 | + graph LR; |
| 44 | + |
| 45 | +rocMetadata["ro-crate-metadata.json"] -- about --> RootDataset["[root] Dataset"]; |
| 46 | +RootDataset -- author --> Authors ; |
| 47 | +subgraph Authors["Person, Organisation"] |
| 48 | + Person |
| 49 | + Organisation |
| 50 | +end |
| 51 | +Person -- affiliation --> Organisation ; |
| 52 | + |
| 53 | +RootDataset -- about --> BiologicalMetadata; |
| 54 | +subgraph BiologicalMetadata["Subject of imaging"] |
| 55 | + BioSample |
| 56 | + DefinedTerm1["DefinedTerm"] |
| 57 | + Taxon |
| 58 | +end |
| 59 | + |
| 60 | +BioSample -- taxonomicRange --> Taxon; |
| 61 | +BioSample -- other properties --> DefinedTerm1; |
| 62 | + |
| 63 | +RootDataset -- measurementMethod --> ImagingMetadata; |
| 64 | + |
| 65 | +subgraph ImagingMetadata["Methods of imaging"] |
| 66 | + LabProtocol |
| 67 | + DefinedTerm2["DefinedTerm"] |
| 68 | +end |
| 69 | + |
| 70 | +LabProtocol -- measurementTechnique --> DefinedTerm2; |
| 71 | + |
| 72 | +RootDataset -- publisher --> Organisation1["Organisation"]; |
| 73 | +RootDataset -- size --> QuantitativeValue; |
| 74 | +RootDataset -- funder --> Grant |
| 75 | +RootDataset -- seeAlso --> ScholarlyArticle |
| 76 | + |
| 77 | +``` |
| 78 | + |
| 79 | + |
| 80 | +## Detailed Object schema |
| 81 | + |
| 82 | +Property prefixes: |
| 83 | +- rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# |
| 84 | +- schema: http://schema.org/ |
| 85 | +- dwc: http://rs.tdwg.org/dwc/terms/ |
| 86 | +- dwciri: http://rs.tdwg.org/dwc/iri/ |
| 87 | +- bao: http://www.bioassayontology.org/bao# |
| 88 | + |
| 89 | +Note that the requirements below apply both to the JSON field names and to the RDF graph that would be produced by a conversion to RDF using the context of the JSON-LD document. |
| 90 | + |
| 91 | + |
| 92 | + |
| 93 | + |
| 94 | +### Dataset |
| 95 | + |
| 96 | +| Field | Property | Requirement | Cardinality | Description | |
| 97 | +| --- | --- | :---: | --- | --- | |
| 98 | +| @id | | REQUIRED | 1 | URL of the entry in its original database | |
| 99 | +| @type | rdf:type | REQUIRED | 1+ | MUST include Dataset, but may include other types. | |
| 100 | +| name | schema:name | REQUIRED | 1 | SHOULD identify the dataset to humans | |
| 101 | +| description | schema:description | REQUIRED | 1 | SHOULD provide an overview of the dataset, summarising the context in which the dataset is important. | |
| 102 | +| datePublished | schema:datePublished | REQUIRED | 1 | MUST be a single string value in ISO 8601 date format. SHOULD be specified to the day. | |
| 103 | +| license | schema:license | REQUIRED | 1 | SHOULD be a URL to a licence description, e.g. | |
| 104 | +| author | schema:author | REQUIRED | 1+ | MUST be _Persons_ or _Organisations_ who contributed to the creation of the dataset | |
| 105 | +| publisher | schema:publisher | REQUIRED | 1 | MUST be a single _Organisation_ that provides the data at the URL given by the @id of this entry. | |
| 106 | +| about | schema:about | REQUIRED | 1+ | MUST contain all the information on the biological matter relevant to this dataset. These MAY be _BioSamples_, _Taxons_, or _DefinedTerms_. | |
| 107 | +| measurementMethod | dwciri:measurementMethod | REQUIRED | 1+ | MUST contain all the information on the imaging techniques relevant to this dataset. These MAY be _LabProtocols_, or _DefinedTerms_. | |
| 108 | +| thumbnailUrl | schema:thumbnailUrl | Recommended | 0+ | MUST be a list of URLs from which a thumbnail of an example image for the dataset can be obtained, for use in displaying examples of the images. | |
| 109 | +| identifier | schema:identifier | Recommended | 1 | MUST be a unique identifier used by the publisher of the dataset to refer to this dataset. | |
| 110 | +| keywords | schema:keywords | optional | 0+ | Relevant keywords or tags used to describe the subject, methods, or contents of the dataset. | |
| 111 | +| funder | schema:funder | optional | 0+ | The _Grants_ which funded the contributors or creation of this dataset. | |
| 112 | +| seeAlso | rdfs:seeAlso | optional | 0+ | The _ScholarlyArticles_ that were published alongside, or supported by, this Dataset. | |
| 113 | +| size | schema:size | optional | 0+ | _QuantitativeValues_ defining dimensions of the Dataset. Some dimensions are recommended (see the _QuantitativeValue_ section below) | |
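
The REQUIRED rows of the Dataset table can be checked mechanically. The sketch below is one possible approach, not part of the profile: `missing_dataset_fields` and `is_day_precision_date` are hypothetical helper names, and the example values are invented. The date check only covers the "specified to the day" recommendation; `date.fromisoformat` rejects reduced-precision dates such as `2026-01`, and the exact set of ISO 8601 forms it accepts depends on the Python version.

```python
from datetime import date

# REQUIRED fields from the Dataset table (besides @id and @type).
REQUIRED_DATASET_FIELDS = (
    "name", "description", "datePublished", "license",
    "author", "publisher", "about", "measurementMethod",
)


def missing_dataset_fields(dataset):
    """List REQUIRED Dataset fields that are absent or empty."""
    return [
        field for field in REQUIRED_DATASET_FIELDS
        if dataset.get(field) in (None, "", [], {})
    ]


def is_day_precision_date(value):
    """True if value is a string that parses as a full calendar date."""
    if not isinstance(value, str):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


# Hypothetical, deliberately incomplete root dataset entity.
dataset = {
    "@id": "https://example.org/datasets/123",
    "@type": "Dataset",
    "name": "Hypothetical example dataset",
    "description": "An invented description used only for this sketch.",
    "datePublished": "2026-01-15",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

print(missing_dataset_fields(dataset))
print(is_day_precision_date(dataset["datePublished"]))
```
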
| 114 | + |
| 115 | +### Person |
| 116 | + |
| 117 | +| Field | Property | Requirement | Cardinality | Description | |
| 118 | +| --- | --- | :---: | --- | --- | |
| 119 | +| @id | | REQUIRED | 1 | SHOULD be an ORCID id, otherwise a local identifier to the document | |
| 120 | +| @type | rdf:type | REQUIRED | 1+ | MUST include Person, but may include other types. | |
| 121 | +| name | schema:name | REQUIRED | 1 | SHOULD be the person's name for use in crediting authorship of the Dataset | |
| 122 | +| affiliation | schema:affiliation | Recommended | 0+ | SHOULD be the _Organisations_ a person was a member of at the time of creating or publishing this dataset, that are related to the creation of this dataset. | |
| 123 | +| email | schema:email | optional | 1 | SHOULD be the email address to use to contact the Person about this Dataset | |
| 124 | + |
| 125 | + |
| 126 | +### Organisation |
| 127 | + |
| 128 | +Can appear through the property chains: |
| 129 | + |
| 130 | +- Dataset - publisher -> Organisation |
| 131 | +- Dataset - author -> Organisation |
| 132 | +- Dataset - author -> Person - affiliation -> Organisation |
| 133 | + |
| 134 | + |
| 135 | +| Field | Property | Requirement | Cardinality | Description | |
| 136 | +| --- | --- | :---: | --- | --- | |
| 137 | +| @id | | REQUIRED | 1 | SHOULD be a ROR ID, a URI, or otherwise a local identifier to the document. | |
| 138 | +| @type | rdf:type | REQUIRED | 1+ | MUST include Organisation, but may include other types. | |
| 139 | +| name | schema:name | REQUIRED | 1 | SHOULD be an identifying name (or acronym, if that is more commonly used) of the organisation. | |
| 140 | +| url | schema:url | optional | 1 | SHOULD be a url to the main landing page of the organisation | |
| 141 | +| address | schema:address | optional | 1 | MUST be the organisation's address. | |
| 142 | + |
| 143 | +### DefinedTerm |
| 144 | + |
| 145 | +Can be found through the property chains: |
| 146 | + |
| 147 | +- Dataset - about -> DefinedTerm |
| 148 | +- Dataset - about -> BioSample - X -> DefinedTerm |
| 149 | +- Dataset - measurementMethod -> DefinedTerm |
| 150 | +- Dataset - measurementMethod -> LabProtocol - measurementTechnique -> DefinedTerm |
| 151 | + |
| 152 | +where X can be one of a number of different relations (e.g. hasCellLine). If the connection obj - X -> DefinedTerm is present, and Dataset - about | measurementMethod -> obj, then Dataset - about | measurementMethod -> DefinedTerm MUST also be included explicitly. |
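
The explicit-link rule above can be sketched as a one-hop check over the `@graph`. This is a minimal illustration under assumptions, not a reference implementation: the function names `ids` and `missing_direct_links`, the local identifier `#sample-1`, and the `https://example.org/...` IRIs are all hypothetical. The `target_type` parameter defaults to `DefinedTerm`, so the same helper could be pointed at another type if a similar rule applies to it.

```python
def ids(value):
    """Normalise a JSON-LD property value to a list of referenced @id strings."""
    if value is None:
        return []
    items = value if isinstance(value, list) else [value]
    return [i["@id"] for i in items if isinstance(i, dict) and "@id" in i]


def missing_direct_links(graph, root_id, prop, target_type="DefinedTerm"):
    """Find entities of target_type that are referenced by an object listed in
    root[prop] but are not themselves listed in root[prop]."""
    entities = {e["@id"]: e for e in graph}
    direct = set(ids(entities[root_id].get(prop)))
    missing = set()
    for obj_id in direct:
        obj = entities.get(obj_id, {})
        for key, value in obj.items():
            if key.startswith("@"):
                continue  # skip @id, @type and other JSON-LD keywords
            for target_id in ids(value):
                types = entities.get(target_id, {}).get("@type", [])
                if isinstance(types, str):
                    types = [types]
                if target_type in types and target_id not in direct:
                    missing.add(target_id)
    return sorted(missing)


ROOT = "https://example.org/datasets/123"        # hypothetical dataset URL
TERM = "https://example.org/terms/cell-line-a"   # hypothetical term IRI

graph = [
    {"@id": ROOT, "@type": "Dataset", "about": [{"@id": "#sample-1"}]},
    {"@id": "#sample-1", "@type": "BioSample", "hasCellLine": {"@id": TERM}},
    {"@id": TERM, "@type": "DefinedTerm", "name": "Hypothetical cell line"},
]

# The term is only reachable via the BioSample, so it is reported as missing.
print(missing_direct_links(graph, ROOT, "about"))

# Adding the direct link from the dataset satisfies the rule.
graph[0]["about"].append({"@id": TERM})
print(missing_direct_links(graph, ROOT, "about"))
```
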
| 153 | + |
| 154 | + |
| 155 | +| Field | Property | Requirement | Cardinality | Description | |
| 156 | +| --- | --- | :---: | --- | --- | |
| 157 | +| @id | | REQUIRED | 1 | MUST be an absolute URI to documentation about the term | |
| 158 | +| @type | rdf:type | REQUIRED | 1+ | MUST include DefinedTerm, but may include other types. | |
| 159 | +| name | schema:name | REQUIRED | 1 | SHOULD identify the term to humans, similar to an rdfs:label. | |
| 160 | + |
| 161 | +### Taxon |
| 162 | + |
| 163 | +Can be found through the property chains: |
| 164 | + |
| 165 | +- Dataset - about -> Taxon |
| 166 | +- Dataset - about -> BioSample - taxonomicRange -> Taxon |
| 167 | + |
| 168 | +If the connection BioSample - X -> Taxon is present, and Dataset - about -> BioSample, then Dataset - about -> Taxon MUST also be included explicitly. |
| 169 | + |
| 170 | + |
| 171 | +| Field | Property | Requirement | Cardinality | Description | |
| 172 | +| --- | --- | :---: | --- | --- | |
| 173 | +| @id | | REQUIRED | 1 | SHOULD be an NCBI taxonomy ID | |
| 174 | +| @type | rdf:type | REQUIRED | 1+ | MUST include Taxon, but may include other types. | |
| 175 | +| scientificName | dwc:scientificName | REQUIRED | 1 | MUST be the scientific name of the Taxon, following the relevant nomenclature code for the taxon. SHOULD be as complete as possible. | |
| 176 | +| vernacularName | dwc:vernacularName | optional | 1 | SHOULD be a common or vernacular name of the Taxon. | |
| 177 | + |
| 178 | +### BioSample |
| 179 | + |
| 180 | +Please note that the BioSample type is a draft proposal to be added to schema.org, and thus http://schema.org/BioSample does not resolve. Details of the specification can be found at https://bioschemas.org/types/BioSample/0.2-DRAFT. |
| 181 | + |
| 182 | +| Field | Property | Requirement | Cardinality | Description | |
| 183 | +| --- | --- | :---: | --- | --- | |
| 184 | +| @id | | REQUIRED | 1 | MAY be a resolvable URL if one is available, but more likely it will be a local identifier within the document. | |
| 185 | +| @type | rdf:type | REQUIRED | 1+ | MUST include BioSample, but may include other types. | |
| 186 | +| name | schema:name | REQUIRED | 1 | SHOULD identify the main features of this biosample that distinguish it from others relevant to this dataset. | |
| 187 | +| description | schema:description | REQUIRED | 1 | SHOULD provide details of the biosample, such as variables that were modified on a case-by-case basis. | |
| 188 | +| taxonomicRange | schema:taxonomicRange | Recommended | 0+ | The _Taxons_ representing a classification of the BioSample. | |
| 189 | +| hasCellLine | bao:hasCellLine | optional | 0+ | The _DefinedTerm_ representing a classification of the BioSample. | |
| 190 | + |
| 191 | +### LabProtocol |
| 192 | + |
| 193 | +Please note that the LabProtocol type and labEquipment property are draft proposals to be added to schema.org, and thus http://schema.org/LabProtocol and http://schema.org/labEquipment do not resolve. Details of the specification can instead be found at https://bioschemas.org/profiles/LabProtocol/0.8-DRAFT. |
| 194 | + |
| 195 | +| Field | Property | Requirement | Cardinality | Description | |
| 196 | +| --- | --- | :---: | --- | --- | |
| 197 | +| @id | | REQUIRED | 1 | MAY be a resolvable URL if one is available, but more likely it will be a local identifier within the document. | |
| 198 | +| @type | rdf:type | REQUIRED | 1+ | MUST include LabProtocol, but may include other types. | |
| 199 | +| name | schema:name | REQUIRED | 1 | SHOULD identify the protocol and distinguish it from others relevant to this dataset. | |
| 200 | +| description | schema:description | REQUIRED | 1 | SHOULD provide details of the steps or settings involved in the protocol. | |
| 201 | +| labEquipment | schema:labEquipment | Recommended | 0+ | SHOULD be a description of the equipment used in the capture of the image. | |
| 202 | +| measurementTechnique | schema:measurementTechnique | Recommended | 0+ | SHOULD be a _DefinedTerm_ from the FBBI ontology if possible. | |
| 203 | + |
| 204 | + |
| 205 | +### Grant |
| 206 | + |
| 207 | +| Field | Property | Requirement | Cardinality | Description | |
| 208 | +| --- | --- | :---: | --- | --- | |
| 209 | +| @id | | REQUIRED | 1 | SHOULD be a resolvable id, such as a DOI, otherwise a local identifier to the document | |
| 210 | +| @type | rdf:type | REQUIRED | 1+ | MUST include Grant, but may include other types. | |
| 211 | +| name | schema:name | REQUIRED | 1 | SHOULD be the name or title of the grant. | |
| 212 | +| identifier | schema:identifier | optional | 1 | SHOULD be included if @id is only a local identifier. | |
| 213 | +| url | schema:url | optional | 1 | SHOULD be a link to a website where more information can be found about the grant. | |
| 214 | + |
| 215 | +### ScholarlyArticle |
| 216 | + |
| 217 | +| Field | Property | Requirement | Cardinality | Description | |
| 218 | +| --- | --- | :---: | --- | --- | |
| 219 | +| @id | | REQUIRED | 1 | SHOULD be a resolvable id, such as a DOI, otherwise a local identifier to the document | |
| 220 | +| @type | rdf:type | REQUIRED | 1+ | MUST include ScholarlyArticle, but may include other types. | |
| 221 | +| name | schema:name | REQUIRED | 1 | MUST be the title of the article as published. | |
| 222 | +| datePublished | schema:datePublished | Recommended | 1 | MUST be a single string value in ISO 8601 date format. | |
| 223 | + |
| 224 | + |
| 225 | +### QuantitativeValue |
| 226 | + |
| 227 | +We recommend providing two QuantitativeValues for a dataset to give an estimate of the quantity of associated data that can be retrieved from the publisher: |
| 228 | +- The number of files, images, etc. stored by the publisher of the dataset. This MUST use the unitCode http://purl.obolibrary.org/obo/UO_0000189 (count) and the unitText "file count". |
| 229 | +- The total size of the dataset in bytes, as recorded by the publisher (whether compressed or otherwise). This MUST use the unitCode http://purl.obolibrary.org/obo/UO_0000233 (bytes) and the unitText "bytes". |
| 230 | + |
| 231 | +Additional QuantitativeValues MAY be included. |
| 232 | + |
| 233 | +| Field | Property | Requirement | Cardinality | Description | |
| 234 | +| --- | --- | :---: | --- | --- | |
| 235 | +| @id | | REQUIRED | 1 | SHOULD be a local identifier for the QuantitativeValue. | |
| 236 | +| @type | rdf:type | REQUIRED | 1+ | MUST include QuantitativeValue, but may include other types. | |
| 237 | +| value | schema:value | REQUIRED | 1 | SHOULD be a number, using '.' to indicate a decimal point (rather than ','), and SHOULD avoid using either symbol as readability separators. | |
| 238 | +| unitCode | schema:unitCode | REQUIRED | 1 | | |
| 239 | +| unitText | schema:unitText | REQUIRED | 1 | | |
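
The two recommended QuantitativeValues could be generated along the following lines. This is a sketch only: the function name `recommended_sizes`, the local identifiers `#file-count` and `#total-bytes`, and the numbers are invented for illustration, and writing `unitCode` as a plain string (rather than an `@id` reference) is an assumption about how the context defines that term.

```python
import json

# unitCode IRIs and unitText strings as required by this section.
UO_COUNT = "http://purl.obolibrary.org/obo/UO_0000189"
UO_BYTE = "http://purl.obolibrary.org/obo/UO_0000233"


def recommended_sizes(file_count, total_bytes):
    """Build the two recommended QuantitativeValue entities for a dataset."""
    return [
        {
            "@id": "#file-count",  # hypothetical local identifier
            "@type": "QuantitativeValue",
            "value": file_count,
            "unitCode": UO_COUNT,
            "unitText": "file count",
        },
        {
            "@id": "#total-bytes",  # hypothetical local identifier
            "@type": "QuantitativeValue",
            "value": total_bytes,
            "unitCode": UO_BYTE,
            "unitText": "bytes",
        },
    ]


# Illustrative numbers only.
sizes = recommended_sizes(file_count=1200, total_bytes=53687091200)

# References to place in the root dataset's 'size' property.
size_refs = [{"@id": s["@id"]} for s in sizes]

print(json.dumps(sizes, indent=2))
print(json.dumps({"size": size_refs}, indent=2))
```

Emitting `value` as a JSON number rather than a string avoids the decimal and thousands separator ambiguity mentioned in the table.
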