diff --git a/.gitignore b/.gitignore index 090a1f0..6d8e917 100644 --- a/.gitignore +++ b/.gitignore @@ -1,2 +1,3 @@ .idea .DS_Store +.vscode/ diff --git a/retrieval/programmatic-access/tag_querying.rst b/retrieval/programmatic-access/tag_querying.rst index 148bd3d..387cf6a 100644 --- a/retrieval/programmatic-access/tag_querying.rst +++ b/retrieval/programmatic-access/tag_querying.rst @@ -258,3 +258,8 @@ Miscellaneous The tags are all less than 21 Unicode characters in length. N.B. The tags described in this page are not to be confused with Locus Tags. +<<<<<<< HEAD + + +======= +>>>>>>> origin/sample_checklist_docs diff --git a/retrieval/programmatic-access/tag_querying_highleveltable.rst b/retrieval/programmatic-access/tag_querying_highleveltable.rst new file mode 100644 index 0000000..bebbfb0 --- /dev/null +++ b/retrieval/programmatic-access/tag_querying_highleveltable.rst @@ -0,0 +1,405 @@ + + + ++------------------------+------------+----------+ +| Header row, column 1 | Header 2 | Header 3 | ++========================+============+==========+ +| body row 1, column 1 | column 2 | column 3 | ++------------------------+------------+----------+ +| body row 2 | Cells may span | ++------------------------+-----------------------+ + +some textual + +reference + ++------------------------+------------------+-------------------+--------------------------------------------------------+-----------------------------------+ +| tag presentation | high level tag | low level tag | description | comment | ++========================+==================+===================+========================================================+===================================+ +| xref:arrayexpress | xref | arrayexpress || Object associated with an ArrayExpress || A xref is available that links | +| | | || `AE `_ || to ArrayExpress | +| | | || record | | 
++------------------------+------------------+-------------------+--------------------------------------------------------+-----------------------------------+ +| xref:europepmc | xref | europepmc || Object associated with a European PubmedCentral || A xref is available that links | +| | | || `EPMC `_ record || to European PubmedCentral | ++------------------------+------------------+-------------------+--------------------------------------------------------+-----------------------------------+ +| xref:pubmed | xref | pubmed || Object associated with an `NCBI Pubmed || A xref is available that links | +| | | || `_ record || to NCBI Pubmed | ++------------------------+------------------+-------------------+--------------------------------------------------------+-----------------------------------+ +| xref:worms | xref | worms || Object associated with a | | +| | | || `WoRMS `_ record | | ++------------------------+------------------+-------------------+--------------------------------------------------------+-----------------------------------+ +| xref:unieuk | xref | unieuk || Object associated with a || A xref is available that links | +| | | || Universal taxonomic framework and integrated || to UNIEUK | +| | | || reference gene databases for Eukaryotic biology, | | +| | | || ecology, and evolution | | +| | | || `(UNIEUK ) `_ record | | ++------------------------+------------------+-------------------+--------------------------------------------------------+-----------------------------------+ + +Geographical Tags + ++------------------------+------------------+-------------------+-----------------------------------------+-----------------------------------------+ +| tag presentation | high level tag | low level tag | description | comment | ++========================+==================+===================+=========================================+=========================================+ +| coastal_brackish | coastal_brackish | || Evidence that the object is 
“coastal || There will likely be other low level | +| | | || or brackish” environment associated. || tags to provide context. | ++------------------------+------------------+-------------------+-----------------------------------------+-----------------------------------------+ +| coastal_brackish:high | coastal_brackish | high_confidence || Strong evidence that the object is | | +| _confidence | | || “coastal or brackish” environment | | +| | | || associated. | | ++------------------------+------------------+-------------------+-----------------------------------------+-----------------------------------------+ +| coastal_brackish:medium| coastal_brackish | medium_confidence || Moderate evidence that the object is | | +| _confidence | | || “coastal or brackish” environment | | +| | | || associated. | | ++------------------------+------------------+-------------------+-----------------------------------------+-----------------------------------------+ +| coastal_brackish:low_c | coastal_brackish | low_confidence || Weak evidence that the object is | | +| onfidence | | || “coastal or brackish” environment | | +| | | || associated. | | ++------------------------+------------------+-------------------+-----------------------------------------+-----------------------------------------+ +| freshwater | freshwater | || Evidence that it is “freshwater” || There will likely be other low level | +| | | || environment associated || tags to provide context. | ++------------------------+------------------+-------------------+-----------------------------------------+-----------------------------------------+ +| freshwater:high_confid | freshwater | high_confidence || Strong evidence that the object is | | +| ence | | || freshwater environment associated. 
| | ++------------------------+------------------+-------------------+-----------------------------------------+-----------------------------------------+ +| freshwater:medium_conf | freshwater | medium_confidence || Moderate evidence that the object is | | +| idence | | || freshwater environment associated. | | ++------------------------+------------------+-------------------+-----------------------------------------+-----------------------------------------+ +| freshwater:low_confide | freshwater | low_confidence || Weak evidence that the object is | | +| nce | | || freshwater environment associated. | | ++------------------------+------------------+-------------------+-----------------------------------------+-----------------------------------------+ +| marine | marine | || Evidence that it is “marine” || There will likely be other low level | +| | | || environment associated. || tags to provide context. | ++------------------------+------------------+-------------------+-----------------------------------------+-----------------------------------------+ +| marine:high_confidence | marine | high_confidence || Strong evidence that the object is | | +| | | || marine environment associated. | | ++------------------------+------------------+-------------------+-----------------------------------------+-----------------------------------------+ +| marine:medium_confiden | marine | medium_confidence || Moderate evidence that the object is | | +| ce | | || marine environment associated. | | ++------------------------+------------------+-------------------+-----------------------------------------+-----------------------------------------+ +| marine:low_confidence | marine | low_confidence || Weak evidence that the object is | | +| | | || marine environment associated. 
| | ++------------------------+------------------+-------------------+-----------------------------------------+-----------------------------------------+ +| terrestrial | terrestrial | || Evidence that it is terrestrial(land) || There will likely be other low level | +| | | || environment associated. || tags to provide context. | ++------------------------+------------------+-------------------+-----------------------------------------+-----------------------------------------+ +| terrestrial:high_confi | terrestrial | high_confidence || Strong evidence that the object is | | +| dence | | || terrestrial(land) environment | | +| | | || associated. | | ++------------------------+------------------+-------------------+-----------------------------------------+-----------------------------------------+ +| terrestrial:medium_con | terrestrial | medium_confidence || Moderate evidence that the object is | | +| fidence | | || terrestrial(land) environment | | +| | | || associated. | | ++------------------------+------------------+-------------------+-----------------------------------------+-----------------------------------------+ +| terrestrial:low_confid | terrestrial | low_confidence || Weak evidence that the object is | | +| ence | | || terrestrial(land) environment | | +| | | || associated. 
| | ++------------------------+------------------+-------------------+-----------------------------------------+-----------------------------------------+ + + + + ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| tag presentation | high level tag | low level tag | description | comment | ++====================================+==================+=================================+=========================================================================================================================================================================================+========================================================================+ +| coastal_brackish | coastal_brackish | | Some evidence that the object is “coastal or brackish” environment associated. | There will likely be other low level tags to provide context. | ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| coastal_brackish:high_confidence | coastal_brackish | high_confidence | strong evidence that the object is “coastal or brackish” environment associated. 
| | ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| coastal_brackish:medium_confidence | coastal_brackish | medium_confidence | moderate evidence that the object is “coastal or brackish” environment associated. | | ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| coastal_brackish:low_confidence | coastal_brackish | low_confidence | weak evidence that the object is “coastal or brackish” environment associated. | | ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| freshwater | freshwater | | Some evidence that it is “freshwater” environment associated | There will likely be other low level tags to provide context.
| ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| freshwater:high_confidence | freshwater | high_confidence | Strong evidence that the object is freshwater environment associated. | | ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| freshwater:medium_confidence | freshwater | medium_confidence | moderate evidence that the object is freshwater environment associated. | | ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| freshwater:low_confidence | freshwater | low_confidence | weak evidence that the object is freshwater environment associated. | | ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| marine | marine | | Some evidence that it is “marine” environment associated | There will likely be other low level tags to provide context.
| ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| marine:high_confidence | marine | high_confidence | Strong evidence that the object is marine environment associated. | | ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| marine:medium_confidence | marine | medium_confidence | moderate evidence that the object is marine environment associated. | | ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| marine:low_confidence | marine | low_confidence | weak evidence that the object is marine environment associated. | | ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| terrestrial | terrestrial | | Some evidence that it is terrestrial(land) environment associated. | There will likely be other low level tags to provide context. 
| ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| terrestrial:high_confidence | terrestrial | high_confidence | Strong evidence that the object is terrestrial(land) environment associated. | | ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| terrestrial:medium_confidence | terrestrial | medium_confidence | moderate evidence that the object is terrestrial(land) environment associated. | | ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| terrestrial:low_confidence | terrestrial | low_confidence | weak evidence that the object is terrestrial(land) environment associated. 
| | ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ + + + ++-------------------+---------------------------------------------------------+------------------------------------------------------+ +| high level tag | description | object type | ++===================+=========================================================+======================================================+ +|| pathogen || The sample has been automatically determined to || assembly; sample; sequence; study; secondary_study; | +|| || belong to the Pathogens Portal || taxonomy | ++-------------------+---------------------------------------------------------+------------------------------------------------------+ +|| coastal_brackish || The sample has been automatically determined by || read_run; sample; taxonomy | +|| || evaluation of GPS and other parameters to have || | +|| || some evidence of being collected from either a || | +|| || coastal or brackish environment || | ++-------------------+---------------------------------------------------------+------------------------------------------------------+ +|| freshwater || The sample has been automatically determined by || read_run; sample; taxonomy | +|| || evaluation of GPS and other parameters to have || | +|| || some evidence of being collected from a || | +|| || freshwater environment || | ++-------------------+---------------------------------------------------------+------------------------------------------------------+ +|| marine || The sample has been automatically determined by || read_run; sample; taxonomy | +|| || evaluation of GPS and other parameters to have || | +|| || some evidence of being collected from a || | +|| || marine
environment || | ++-------------------+---------------------------------------------------------+------------------------------------------------------+ +|| terrestrial || The sample has been automatically determined by || read_run; sample; taxonomy | +|| || evaluation of GPS and other parameters to have || | +|| || some evidence of being collected from a || | +|| || terrestrial environment || | ++-------------------+---------------------------------------------------------+------------------------------------------------------+ +|| datahub || The sample has been referenced to a datahub || analysis; read_run; sample; secondary_study | +|| || Currently tags have been generated for || | +|| || `FAANG ` and || | +|| || `Pathogen ` || | ++-------------------+---------------------------------------------------------+------------------------------------------------------+ +|| xrefs || The sample has been referenced to an external || Depends on how the user submitted | +|| || to the EMBL-EBI repository.
Currently tags have || | +|| || been generated for WORMS and UniEUK || | ++-------------------+---------------------------------------------------------+------------------------------------------------------+ +|| covid19 || The sample has been automatically determined to || analysis; read_run; sample; sequence; study | +|| || belong to the COVID19 portal || | ++-------------------+---------------------------------------------------------+------------------------------------------------------+ + + +another test + ++--------------------+---------------------------------------------------+--------------------------------------------------------+ +| high level tag | description | object type | ++====================+===================================================+========================================================+ +| | pathogen | | The sample has been automatically determined to | | assembly; sample; sequence; study; secondary_study; | +| | | belong to the Pathogens Portal | | taxonomy | ++--------------------+---------------------------------------------------+--------------------------------------------------------+ +| | coastal_brackish | | The sample has been automatically determined by | | read_run; sample; taxonomy | | +| | | evaluation of GPS and other parameters to have | | | +| | | some evidence of being collected from either a | | | +| | | coastal or brackish environment. 
| | | ++--------------------+---------------------------------------------------+--------------------------------------------------------+ + + + +Pathogen Related Tags + +chatgpt + ++---------------------+----------------+-------------------+-------------------------------------------------------+----------------------------------------------+ +| tag presentation | high level tag | low level tag | description | comment | ++=====================+================+===================+=======================================================+==============================================+ +|| pathogen || pathogen || Object is some || There will likely be other low level tags to provide || | +|| || || type of pathogen || context. || | ++---------------------+----------------+-------------------+-------------------------------------------------------+----------------------------------------------+ +| pathogen:priority | pathogen | priority | | | ++---------------------+----------------+-------------------+-------------------------------------------------------+----------------------------------------------+ +|| pathogen:bacterium || pathogen || bacterium || Object is of a bacterium organism. || At the time of documentation, the bacterium | +|| || || || || is not specifically pathogenic. | ++---------------------+----------------+-------------------+-------------------------------------------------------+----------------------------------------------+ +|| pathogen:fungus || pathogen || fungus || Object is of a fungus organism. || At the time of documentation, the fungus | +|| || || || || is not specifically pathogenic. | ++---------------------+----------------+-------------------+-------------------------------------------------------+----------------------------------------------+ +|| pathogen:helminth || pathogen || helminth || Object is of a helminth organism. || At the time of documentation, the helminth | +|| || || || || is not specifically pathogenic. 
| ++---------------------+----------------+-------------------+-------------------------------------------------------+----------------------------------------------+ +|| pathogen:protozoan || pathogen || protozoan || Object is of a protozoan organism. || At the time of documentation, the protozoan | +|| || || || || is not specifically pathogenic. | ++---------------------+----------------+-------------------+-------------------------------------------------------+----------------------------------------------+ +|| pathogen:virus || pathogen || virus || Object is of a virus organism. || At the time of documentation, the virus | +|| || || || || is not specifically pathogenic. | ++---------------------+----------------+-------------------+-------------------------------------------------------+----------------------------------------------+ +|| datahub:faang || datahub || Faang || Is a Functional Annotation of Animal Genomes project | | +|| || || || `(FAANG) `_ sample | | ++---------------------+----------------+-------------------+-------------------------------------------------------+----------------------------------------------+ +| datahub:metagenome | datahub | metagenome | Is a metagenome and present in that datahub | | ++---------------------+----------------+-------------------+-------------------------------------------------------+----------------------------------------------+ +| covid19 | | covid19 | Object associated with COVID-19 | | ++---------------------+----------------+-------------------+-------------------------------------------------------+----------------------------------------------+ +| covid19Host | | covid19Host | Object associated with a COVID-19 Host | | ++---------------------+----------------+-------------------+-------------------------------------------------------+----------------------------------------------+ + + + +
++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| tag presentation | high level tag | low level tag | description | comment | ++====================================+==================+=================================+=========================================================================================================================================================================================+========================================================================+ +| pathogen | pathogen | Object is some type of pathogen | There will likely be other low level tags to provide context. | | ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| pathogen:priority | pathogen | priority | | | ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| pathogen:bacterium | pathogen | bacterium | Object is of a bacterium organism. | At time of documentation the bacterium is not specifically pathogenic. 
| ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| pathogen:fungus | pathogen | fungus | Object is of a fungus organism. | At time of documentation the fungus is not specifically pathogenic. | ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| pathogen:helminth | pathogen | helminth | Object is of a helminth organism. | At time of documentation the helminth is not specifically pathogenic. | ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| pathogen:protozoan | pathogen | protozoan | Object is of a protozoan organism. | At time of documentation the protozoan is not specifically pathogenic. | ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| pathogen:virus | pathogen | virus | Object is of a virus organism. | At time of documentation the virus is not specifically pathogenic.
| ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| datahub:faang | datahub | Faang | Is a `Functional Annotation of ANimal Genomes project (FAANG) `_ sample and present in that datahub | | ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| datahub:metagenome | datahub | metagenome | Is a metagenome and present in that datahub | | ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| covid19 | | covid19 | Object associated with covid19 | | ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| covid19Host | | covid19Host | Object associated with a covid19 Host | | 
++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ + +Reference Tags ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| tag presentation | high level tag | low level tag | description | comment | ++====================================+==================+=================================+=========================================================================================================================================================================================+========================================================================+ +| xref:arrayexpress | xref | arrayexpress | Object associated with an `ArrayExpress `_ record | A xref is available that links to ArrayExpress | ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| xref:europepmc | xref | europepmc | Object associated with a `European PubmedCentral `_ record | A xref is available that links to European PubmedCentral | 
++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| xref:pubmed | xref | pubmed | Object associated with an `NCBI Pubmed `_ record | A xref is available that links to NCBI Pubmed | ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| xref:worms | xref | worms | Object associated with a `WoRMS `_ record | | ++------------------------------------+------------------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+ +| xref:unieuk | xref | unieuk | Object associated with a `UNIEUK /(Universal taxonomic framework and integrated reference gene databases for Eukaryotic biology, ecology, and evolution ) `_ record | A xref is available that links to UNIEUK | + + + + + + + + + +tag presentation +high level tag +low level tag +description +comment +pathogen +pathogen +Object is some type of pathogen +There will likely be other low level tags to provide context. +pathogen:priority +pathogen +priority +pathogen:bacterium +pathogen +bacterium +Object is of a bacterium organism. +At time of documentation the bacterium is not specifically pathogenic. +pathogen:fungus +pathogen +fungus +Object is of a fungus orgnism. 
+At time of documentation the fungus is not specifically pathogenic. +pathogen:helminth +pathogen +helminth +At time of documentation the helminth is not specifically pathogenic. +pathogen:protozoan +pathogen +protozoan +Object is of a protozoan organism. +At time of documentation the protozoan is not specifically pathogenic. +pathogen:virus +pathogen +virus +Object is of a virus organism. +At time of documentation the virus is not specifically pathogenic. +coastal_brackish +coastal_brackish +Some evidence that the object is “coastal or brackish” environment associated. +There will likely be other low level tags to provide context. +coastal_brackish:high_confidence +coastal_brackish +high_confidence +strong evidence that the object is “coastal or brackish” environment associated. +coastal_brackish:medium_confidence +coastal_brackish +medium_confidence +moderate evidence that the object is “coastal or brackish” environment associated. +coastal_brackish:low_confidence +coastal_brackish +low_confidence +weak evidence that the object is “coastal or brackish” environment associated. +freshwater +freshwater +Some evidence that it is “freshwater” environment associated +There will likely be other low level tags to provide context. +freshwater:high_confidence +freshwater +high_confidence +Strong evidence that the object is freshwater environment associated. +freshwater:medium_confidence +freshwater +medium_confidence +moderate evidence that the object is freshwater environment associated. +freshwater:low_confidence +freshwater +low_confidence +weak evidence that the object is freshwater environment associated. +marine +marine +Some evidence that it is “marine” environment associated +There will likely be other low level tags to provide context. +marine:high_confidence +marine +high_confidence +Strong evidence that the object is marine environment associated.
+marine:medium_confidence +marine +medium_confidence +moderate evidence that the object is marine environment associated. +marine:low_confidence +marine +low_confidence +weak evidence that the object is marine environment associated. +terrestrial +terrestrial +Some evidence that it is terrestrial(land) environment associated. +There will likely be other low level tags to provide context. +terrestrial:high_confidence +terrestrial +high_confidence +Strong evidence that the object is terrestrial(land) environment associated. +terrestrial:medium_confidence +terrestrial +medium_confidence +moderate evidence that the object is terrestrial(land) environment associated. +terrestrial:low_confidence +terrestrial +low_confidence +weak evidence that the object is terrestrial(land) environment associated. +datahub:faang +datahub +Faang +Is a Functional Annotation of ANimal Genomes project (FAANG) sample and present in that datahub +datahub:metagenome +datahub +metagenome +Is a metagenome and present in that datahub +xref:arrayexpress +xref +arrayexpress +Object associated with an ArrayExpress record +A xref is available that links to ArrayExpress +xref:europepmc +xref +europepmc +Object associated with a European PubmedCentral record +A xref is available that links to European PubmedCentral +xref:pubmed +xref +pubmed +Object associated with an NCBI Pubmed record \ No newline at end of file diff --git a/retrieval/programmatic-access/tag_querying_works.rst b/retrieval/programmatic-access/tag_querying_works.rst new file mode 100644 index 0000000..9df3968 --- /dev/null +++ b/retrieval/programmatic-access/tag_querying_works.rst @@ -0,0 +1,262 @@ +======================== +Text Tags On ENA Objects +======================== + +----------------- +Table of Contents +----------------- + +* What are Tags and Why are they Useful? +* How many Tags can an Object Possess? +* What Tags are Available? +* How are the Tags Created? +* Miscellaneous + +.. 
_my-reference-label: + +-------------------------------------- +What are Tags and Why are they Useful? +-------------------------------------- +The tags are controlled textual annotations applied to objects such as samples and taxonomy records. + +Their purpose is to make searching and filtering much easier. + +Examples: + +* Find all pathogenic samples by using the “pathogen” tag (this is used to drive the data coverage of the `Pathogens Portal `_.) +* Use the “marine:high_confidence” tag to find all samples that are highly likely to be from the marine environment. +* Find all records in ENA data that have a corresponding record cross-referenced to the `WoRMS - World Register of Marine Species `_, by searching “xref:worms”. + +The tagging system has proved useful in determining the object membership of certain domain-specific data portals, such as the Pathogens Portal. Conversely, tags can also be used to obtain vignettes of data from which to build a new data portal rapidly. + +------------------------------------ +How many Tags can an Object Possess? +------------------------------------ +An object such as a sample can have zero, one or several tags. + +A sample could, for example, be tagged as both “marine:high_confidence” and “terrestrial:low_confidence”. + +------------------------ +What Tags are Available? +------------------------ + +Most of the sample and taxonomy tags have the format: high_level_tag:low_level_tag. The high level tag is often used to provide some extra context to the more granular tag. + + +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Table of Object High Level Tags +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + +..
csv-table:: High Level Tags + :header: "high level tag", "description", "object type" + :widths: 20, 300, 50 + + "pathogen", "The sample has been automatically determined to belong to the Pathogens Portal", "assembly; sample; sequence; study; secondary_study; taxonomy" + "coastal_brackish", "The sample has been automatically determined by evaluation of GPS and other parameters to have some evidence of being collected from either a coastal or brackish environment.", "read_run; sample; taxonomy" + "freshwater", "The sample has been automatically determined by evaluation of GPS and other parameters to have some evidence of being collected from a freshwater environment.", "read_run; sample; taxonomy" + "marine", "The sample has been automatically determined by evaluation of GPS and other parameters to have some evidence of being collected from a marine environment.", "read_run; sample; taxonomy" + "terrestrial", "The sample has been automatically determined by evaluation of GPS and other parameters to have some evidence of being collected from a terrestrial environment.", "read_run; sample; taxonomy" + "datahub", "The sample has been automatically determined to belong to a datahub. Currently tags have been generated for `FAANG `_ and `Pathogen `_", "analysis; read_run; sample; secondary_study" + "xref", "The sample is cross-referenced in a repository external to EMBL-EBI. Currently tags have been generated for WoRMS and UniEUK.", "Depends on how the user submitted" + "covid19", "The sample has been automatically determined to belong to the COVID19 portal.", "analysis; read_run; sample; sequence; study" + + + +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Table of All Object High and Low Level Tags +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +These tags apply to several types of object, especially sample and taxonomy. Please see the object types in the previous (high level tag) +table to see which objects each tag applies to. + +..
list-table:: Object High and Low Level Tags + :widths: 15 10 30 10 10 + :header-rows: 1 + + * - tag presentation + - high level tag + - low level tag + - description + - comment + * - pathogen + - pathogen + - + - Object is some type of pathogen + - There will likely be other low level tags to provide context. + * - pathogen:priority + - pathogen + - priority + - + - + * - pathogen:bacterium + - pathogen + - bacterium + - Object is of a bacterium organism. + - At time of documentation the bacterium is not specifically pathogenic. + * - pathogen:fungus + - pathogen + - fungus + - Object is of a fungus organism. + - At time of documentation the fungus is not specifically pathogenic. + * - pathogen:helminth + - pathogen + - helminth + - + - At time of documentation the helminth is not specifically pathogenic. + * - pathogen:protozoan + - pathogen + - protozoan + - Object is of a protozoan organism. + - At time of documentation the protozoan is not specifically pathogenic. + * - pathogen:virus + - pathogen + - virus + - Object is of a virus organism. + - At time of documentation the virus is not specifically pathogenic. + * - coastal_brackish + - coastal_brackish + - + - Some evidence that the object is “coastal or brackish” environment associated. + - There will likely be other low level tags to provide context. + * - coastal_brackish:high_confidence + - coastal_brackish + - high_confidence + - strong evidence that the object is “coastal or brackish” environment associated. + - + * - coastal_brackish:medium_confidence + - coastal_brackish + - medium_confidence + - moderate evidence that the object is “coastal or brackish” environment associated. + - + * - coastal_brackish:low_confidence + - coastal_brackish + - low_confidence + - weak evidence that the object is “coastal or brackish” environment associated.
+ - + * - freshwater + - freshwater + - + - Some evidence that it is “freshwater” environment associated + - There will likely be other low level tags to provide context. + * - freshwater:high_confidence + - freshwater + - high_confidence + - Strong evidence that the object is freshwater environment associated. + - + * - freshwater:medium_confidence + - freshwater + - medium_confidence + - moderate evidence that the object is freshwater environment associated. + - + * - freshwater:low_confidence + - freshwater + - low_confidence + - weak evidence that the object is freshwater environment associated. + - + * - marine + - marine + - + - Some evidence that it is “marine” environment associated + - There will likely be other low level tags to provide context. + * - marine:high_confidence + - marine + - high_confidence + - Strong evidence that the object is marine environment associated. + - + * - marine:medium_confidence + - marine + - medium_confidence + - moderate evidence that the object is marine environment associated. + - + * - marine:low_confidence + - marine + - low_confidence + - weak evidence that the object is marine environment associated. + - + * - terrestrial + - terrestrial + - + - Some evidence that it is terrestrial(land) environment associated. + - There will likely be other low level tags to provide context. + * - terrestrial:high_confidence + - terrestrial + - high_confidence + - Strong evidence that the object is terrestrial(land) environment associated. + - + * - terrestrial:medium_confidence + - terrestrial + - medium_confidence + - moderate evidence that the object is terrestrial(land) environment associated. + - + * - terrestrial:low_confidence + - terrestrial + - low_confidence + - weak evidence that the object is terrestrial(land) environment associated.
+ - + * - datahub:faang + - datahub + - Faang + - Is a `Functional Annotation of ANimal Genomes project (FAANG) `_ sample and present in that datahub + - + * - datahub:metagenome + - datahub + - metagenome + - Is a metagenome and present in that datahub + - + * - xref:arrayexpress + - xref + - arrayexpress + - Object associated with an `ArrayExpress `_ record + - A xref is available that links to ArrayExpress + * - xref:europepmc + - xref + - europepmc + - Object associated with a `European PubmedCentral `_ record + - A xref is available that links to European PubmedCentral + * - xref:pubmed + - xref + - pubmed + - Object associated with an `NCBI Pubmed `_ record + - A xref is available that links to NCBI Pubmed + * - xref:worms + - xref + - worms + - Object associated with a `WoRMS `_ record + - + * - xref:unieuk + - xref + - unieuk + - Object associated with a `UNIEUK /(Universal taxonomic framework and integrated reference gene databases for Eukaryotic biology, ecology, and evolution ) `_ record + - A xref is available that links to UNIEUK + * - covid19 + - + - covid19 + - Object associated with covid19 + - + * - covid19Host + - + - covid19Host + - Object associated with a covid19 Host + - + +------------------------- +How are the Tags Created? +------------------------- + +The tags are typically assigned by automatic processes analysing the user supplied metadata around an object. + +For example, “marine” sample records are identified systematically from a combination of geo-coordinates and taxonomic evidence. Such an identification can be further qualified by a level of confidence, which is determined by the evidence available on the record to support the assertion. + +This is an evolving and continuously improving process: the algorithms and the rule-sets used for classification are updated as new insights are obtained, so the assigned tags are regularly refreshed.
The flexibility of this system allows for new classifications to be easily created allowing the definition of new, high-level contextual groupings for ENA data making the process of discovery more intuitive for certain user communities. + + +------------- +Miscellaneous +------------- + +The tags are all less than 21 Unicode characters in length. + +N.B. The tags described in this page are not to be confused with Locus Tags. + + diff --git a/submit/annotations/Image_Biosample_3rdPartyCuration.png b/submit/annotations/Image_Biosample_3rdPartyCuration.png new file mode 100644 index 0000000..893406a Binary files /dev/null and b/submit/annotations/Image_Biosample_3rdPartyCuration.png differ diff --git a/submit/annotations/bearer_file b/submit/annotations/bearer_file new file mode 100644 index 0000000..8584392 --- /dev/null +++ b/submit/annotations/bearer_file @@ -0,0 +1 @@ +eyJhbGciOiJSUzI1NiJ9.eyJpc3MiOiJodHRwczovL2FhaS5lYmkuYWMudWsvc3AiLCJqdGkiOiItaGhBOGIyYVhtR2FzTXlrQy1FWl9BIiwiaWF0IjoxNzA1NjY0MjUzLCJzdWIiOiJ1c3ItNmZlZjc3NWEtNGM5OS00NWE1LThmYzMtMzZjYjMyYmFhZWVhIiwiZW1haWwiOiJ3b29sbGFyZEBlYmkuYWMudWsiLCJuaWNrbmFtZSI6InB3b29sbGFyZF9wcm9kIiwibmFtZSI6IlBldGVyIFdvb2xsYXJkIiwiZG9tYWlucyI6W10sImV4cCI6MTcwNTY2Nzg1M30.TmS_z0qEzZeCa-GG1A9uPi3Id4LDTh9CzUy8Q95ArVXXbj_25y8sAEYUDttnqeBxizlrrBmN7VKAPAmPPlZ0glBcK7_vAl6XvnSlNAISO3KYFbYm_8Y7-7hsmchMSw9VrL47s_1SGgjjcklhLKZ4T_3IiZscLCVQ0uBkVZ-mNzhNj3Gf8BDevBofTUHVO9tJuBxqH6hZeKFGEQEZfidQdU7o6Q6rt1i6NR9zkDd_0kkONQy987oN3gGlC_iuwVN8usRCkLw1_Oa0SGWuOAlNu0Gs33ts5Pu54TWEPcBcOESdpb6eyiPUzGVHA3NFiHwI2tme4Fz6DXjKTL2HB8ij3A \ No newline at end of file diff --git a/submit/annotations/clearinghouse_for_ENA_users.md b/submit/annotations/clearinghouse_for_ENA_users.md new file mode 100644 index 0000000..21996cd --- /dev/null +++ b/submit/annotations/clearinghouse_for_ENA_users.md @@ -0,0 +1,131 @@ +# Clearinghouse for ENA Users + + +* [Clearinghouse for ENA Users](#clearinghouse-for-ena-users) + * [Purpose of this 
document](#purpose-of-this-document) + * [Introduction](#introduction) + * [Background](#background) + * [The Relevancy to ENA](#the-relevancy-to-ena) + * [Example Use Cases of Projects Using or Intending to Use the ClearingHouse](#example-use-cases-of-projects-using-or-intending-to-use-the-clearinghouse) + * [Curation objects](#curation-objects) + * [Examples](#examples) + * [*Marine Metagenome Sample Curation:*](#marine-metagenome-sample-curation) + * [*Marine Metagenome Sample Curation:*](#marine-metagenome-sample-curation-1) + * [*SARS-CoV-2 Sequence Curation:*](#sars-cov-2-sequence-curation) + * [Programmatically querying Clearinghouse data](#programmatically-querying-clearinghouse-data-) + * [Tips for querying and submitting Clearinghouse data](#tips-for-querying-and-submitting-clearinghouse-data) + * [How is using the Clearinghouse Different from Updating Records in ENA?](#how-is-using-the-clearinghouse-different-from-updating-records-in-ena) + * [Appendix:](#appendix) + * [1. A template bash script for submission](#1-a-template-bash-script-for-submission) + + +## Purpose of this document +This document's purpose is to provide additional information about the ELIXIR Clearinghouse, to make it easier for ENA (and non-ENA) users to submit data to it. +
+
+It is a supplement to the API document [here](https://docs.google.com/document/d/1y1a4xQwCddntDkmY3qq1XvxtMUZAtW0h3RhEMo3Gtho/edit#heading=h.1ksv4uv), which contains more technical information regarding usage of the Clearinghouse API, so please ensure you read both before starting. The API document is the source of truth and will be more frequently updated than this one. + +## Introduction +The ELIXIR Clearinghouse enables extension, correction and improvement of publicly available annotations on sample, sequence, run/experiment and study records available in the European Nucleotide Archive (ENA) (and, by extension, the wider International Nucleotide Sequence Database Collaboration (INSDC) databases). The overall aim is to make metadata more FAIR and improve its quality. + +Curations submitted to Clearinghouse are presented alongside the record, **without the original archived metadata being changed**. This allows the scientific community to enhance existing metadata records, for example to add information gleaned from paper supplements, or to propose improved attributes where the originals did not conform to standards/ontologies, without modifying the original record, which was often submitted by a different user. + +### Background +The Clearinghouse was developed as part of an ELIXIR project to: +- strengthen collaborations between ELIXIR resources +- improve the quality and impact of metadata, and +- build more sustainable data resources + +It was initially developed to support records in the marine domain, but has since expanded to cover multiple project types, see below. + +For more information please see https://elixir-europe.org/internal-projects/commissioned-services/establishment-data-clearinghouse + + +### The Relevancy to ENA +The Clearinghouse is deliberately set up to allow submission of new or updated metadata (related to INSDC records) originating from different sources.
ENA staff have been amongst the most enthusiastic users and annotators, but have also supported other groups to submit these metadata curations. Wider enthusiasm is also growing. + + + + +## Example Use Cases of Projects Using or Intending to Use the ClearingHouse +| Project | Clearinghouse usage | ENA members involved | External groups involved | +|-------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------|---------------------------------------| +| BY-COVID (beyond-COVID) |
  • 27,566,814 SARS-CoV-2 curations pushed through by UiT
  • appropriate curations presenting alongside records in ENA Browser
| Zahra Waheed, Nadim Rahman | The Arctic University of Norway (UiT) | +|

BlueCloud

|
  • Extra metadata around geographical determination (using GPS and taxonomy) and from GPS
  • e.g. EEZ and high sea
  • 0.5 million records curated
| Peter Woollard, Stéphane Pesant, Lili Meszaros | WoRMS | +| BiCIKL |
  • Expanding on metadata available mostly for sequences
  • Updating taxonomic identifications of sequence data, deriving from the Unite pipelines
  • Potential for further updates (e.g. specimen voucher info) coming from other groups (e.g. Museums)
| Joana Pauperio | PlutoF | +| MGnify |
  • Expanding on metadata from literature - assignment of biomes via machine learning? TBC | Josephine Burgin | | +| DToL (Darwin Tree of Life Project) |
  • None yet - considering using Clearinghouse to add quality scores to assemblies | Josephine Burgin, Joana Pauperio | Sanger | + + +
    + +## Curation objects +### Examples +Curations associated to an ENA Sample, Study, Run/Experiment and Sequence can be submitted to Clearinghouse. Below are some examples from different projects. +#### *Marine Metagenome Sample Curation:* +The following is third party annotation on https://www.ebi.ac.uk/ena/browser/view/SAMN08012645 +![Environmental metagenome curation example](./env_curation.png) +
    +#### *SARS-CoV-2 Sequence Curation:* +The following is third party annotation on https://www.ebi.ac.uk/ena/browser/view/OM635134 +![SARS-CoV-2 curation example](./covid_curation.png) +
    +
    +### What happens to curations after submission to Clearinghouse? + +For all public ENA/INSDC records, curations associated to them will automatically become visible alongside the record in the ENA browser, as below: + +#### *Marine Metagenome Sample Curation:* +![The EEZ-name derived from the latitude and longitude](./Image_Biosample_3rdPartyCuration.png) +
    +
+For **any** sample-related metadata curations, cases where both the attribute name and value have been validated as compliant with ENA Checklist fields take priority for display, and appear at the top of the 3rd Party Curations table in the browser. See below: + +#### *SARS-CoV-2 Sequence Curation:* +![SARS-CoV-2 curation example](./covid_curation_browser.png) + + + +## Programmatically querying Clearinghouse data + +The Swagger API to the Clearinghouse ([here](https://www.ebi.ac.uk/ena/clearinghouse/api/swagger-ui/index.html#/)) allows one to do many types of query programmatically in production and [development](https://wwwdev.ebi.ac.uk/ena/clearinghouse/api/swagger-ui/index.html#/). This includes adding, modifying and removing curations, as well as querying the existing metadata in Clearinghouse, e.g.: +- querying via the ENA Sample ID (SAMEA####), also known as 'recordID', to view all metadata curations associated with that sample. To date, the highest proportion of Clearinghouse curations is associated with sample records +- querying all records submitted by a particular group ('providerName') + +No account is needed for read access of the Clearinghouse API. +
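As a minimal sketch of the record-based query described above, the helper below builds the URL for one record and fetches its curations with `curl`. The base URLs are those used in the submission template script; the `/curations/<recordId>` path and the helper names `clearinghouse_record_url` and `fetch_curations` are assumptions made for illustration, so please confirm the exact endpoint in the Swagger UI before relying on it.

```shell
#!/usr/bin/env bash
# Sketch only: the /curations/<recordId> path is an assumption; check the Swagger UI.

# Base URLs as used in the submission template script.
CH_PROD_URL="https://www.ebi.ac.uk/ena/clearinghouse/api/curations"
CH_TEST_URL="https://wwwdev.ebi.ac.uk/ena/clearinghouse/api/curations"

# Build the query URL for one record (hypothetical helper).
# $1 = record accession, e.g. SAMN08012645
# $2 = "test" to use the development server (optional, defaults to production)
clearinghouse_record_url () {
    local record_id="$1"
    local base="$CH_PROD_URL"
    if [ "${2:-prod}" = "test" ]; then
        base="$CH_TEST_URL"
    fi
    printf '%s/%s\n' "$base" "$record_id"
}

# Fetch the curations for one record as JSON (hypothetical helper).
# No credentials are sent, because read access needs no account.
fetch_curations () {
    curl --silent --show-error --fail -H "accept: application/json" \
        "$(clearinghouse_record_url "$@")"
}
```

For example, `fetch_curations SAMN08012645` would request the curations for the marine metagenome sample shown earlier, and `fetch_curations SAMN08012645 test` would query the development server instead.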
    +You may find it useful to first obtain ENA accessions for Clearinghouse queries via the ENA's own [Advanced Search API](https://docs.google.com/document/d/1CwoY84MuZ3SdKYocqssumghBF88PWxUZ/edit) or [browser based Advanced Search](https://www.ebi.ac.uk/ena/browser/advanced-search). + +Currently more complex querying of Clearinghouse data would require you to bulk download the curations from the API and process the output further. + +For more technical information please refer to the [API documentation](https://docs.google.com/document/d/1y1a4xQwCddntDkmY3qq1XvxtMUZAtW0h3RhEMo3Gtho/edit#heading=h.1ksv4uv). +
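One way to approximate the bulk download mentioned above is to loop over a list of accessions and save one JSON file per record for later processing. This is only a sketch under stated assumptions: the `<base>/<recordId>` path should be confirmed in the Swagger UI, and `fetch_url`, `bulk_download_curations` and `CH_PAUSE_SECONDS` are hypothetical names introduced here, not part of the Clearinghouse tooling.

```shell
#!/usr/bin/env bash
# Sketch only: assumes curations for one record are served at <base>/<recordId>.

CH_BASE_URL="https://www.ebi.ac.uk/ena/clearinghouse/api/curations"

# Thin wrapper around curl, kept separate so that it can be replaced
# (for example by a stub when testing without network access).
fetch_url () {
    curl --silent --show-error --fail "$1"
}

# Download the curations for every accession listed in a file.
# $1 = text file with one accession per line; $2 = output directory.
# Failures are reported on stderr and counted, but do not stop the loop.
bulk_download_curations () {
    local accession_file="$1"
    local out_dir="$2"
    local failures=0
    local acc
    mkdir -p "$out_dir"
    while IFS= read -r acc || [ -n "$acc" ]; do
        [ -z "$acc" ] && continue    # skip blank lines
        if ! fetch_url "${CH_BASE_URL}/${acc}" > "${out_dir}/${acc}.json"; then
            echo "failed: ${acc}" >&2
            rm -f "${out_dir}/${acc}.json"
            failures=$((failures + 1))
        fi
        sleep "${CH_PAUSE_SECONDS:-0.1}"    # short pause between requests
    done < "$accession_file"
    [ "$failures" -eq 0 ]    # non-zero exit status if any download failed
}
```

A call such as `bulk_download_curations accessions.txt curations_json` leaves one file per accession in `curations_json`, which can then be filtered or summarised with whatever JSON tooling you prefer.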
    +
+ +### Tips for querying and submitting Clearinghouse data +Think carefully about what you want to do and why. If you are submitting curations, decide which ENA record object you wish to curate based on the existing metadata for that record. Also consider whether you have sufficient evidence for these curations. +
    +
+Essentially: +* Register for either an AAP or LifeScience ID, if you do not already have one. We suggest that you obtain credentials for both test and production +* Generate a bearer token +* Start by generating JSON files conforming to the Clearinghouse JSON format for a few test records +* Test submit these to the test instance of the Clearinghouse +* Explore retrieving these from the test instance of the Clearinghouse (e.g. to check curations are in the expected format) +* Then generate JSON annotations conforming to the Clearinghouse JSON format for all your curation records +* Submit these to the production instance of the Clearinghouse +* Log the submissions, examine the logs for errors and resubmit if necessary. (note: a small percentage of failures can be due to timeouts, so a try/except block in the submission script that waits and retries automatically can be useful) +* Explore retrieving a selection of those from the production instance of the Clearinghouse + +## How is using the Clearinghouse Different from Updating Records in ENA? + +It is important to differentiate between the curations submitted via the ELIXIR Clearinghouse and ENA-based metadata updates. + +* An ENA record update modifies the original public record, while a curation submitted to the Clearinghouse is presented alongside the original record instead +* Only the original submitter of an ENA record can update this directly, while curations for a particular record can be submitted to the Clearinghouse by *any user* (as long as sufficient evidence is provided for how that curation was generated) +* An ENA record update requires Webin authentication, while curation submission/modification requires either AAP or LifeScience ID authentication instead +* As ENA record updates modify the original record, the modifications will propagate to EBI-based data portals (such as the Pathogens Portal, Early Cause, COVID-19 Data Portal) and be exchanged with other INSDC nodes.
Curations submitted via Clearinghouse are only presented in the ENA browser and do not feed into other INSDC sites or data portals. + + + +## Appendix: +### 1. [A template bash script for submission](clearinghouse_submission_template.sh) + diff --git a/submit/annotations/clearinghouse_submission_template.sh b/submit/annotations/clearinghouse_submission_template.sh new file mode 100755 index 0000000..c7a1b9f --- /dev/null +++ b/submit/annotations/clearinghouse_submission_template.sh @@ -0,0 +1,129 @@ +#!/usr/bin/env bash +# basic template bash script to submit the clearinghouse submissions. This ought to run in most linux and unix(inc. MacOS) environments. +# This is just an example, you may well have better ways to do this. This template may help you implement your own in bash or indeed python, R or rust et cetera. +# N.B. please always refer to the API documentation, as that will be the most up to date. +# ENA, EMBL-EBI, January 2024 +# +echo " template script to submit the clearinghouse submissions" +echo " run: script_name.sh dir_path_to_submission_jsons" +# echo " suggestion, before the run: script submission_typescript.log as this will record the output" +# echo " please check the output for any errors, e.g. JSON to fix or resubmit" +echo "" + +################################################################################################## +##### Configurable portion +# This is where one can set up the environment variables and credentials that are used subsequently. + +# Taking the first argument to the script from the command line. It is the directory containing the JSON annotation files to be submitted. +submission_dir=$1 + +#need a plain file to write and read the bearer key from, see later in this script.
It will be automatically created with the name below +export bearer_file="bearer_file" + +# this is my local secrets file, readable only by me (chmod 600 ~/.my_secrets), please contact your IT admin to see whether this is permissible or if they have a better solution +# N.B. ${HOME} is used rather than a quoted "~", which bash would not expand +my_protected_secret_file="${HOME}/.my_secrets" +if ! [ -f "${my_protected_secret_file}" ] ; then + echo "${my_protected_secret_file}<--does not exist, so exiting" + exit 1 +fi + +source "${my_protected_secret_file}" +#these below are the relevant contents of the configuration file, please uncomment and add your credentials +#export my_email_address='my_email_name@redbrick.ac.uk' +#export aai_test_user='my_test_username' +#export aai_test_pass='my_test_password' +#export aai_test_creds="${aai_test_user}:${aai_test_pass}" +# +#export aai_prod_user='my_prod_username' +#export aai_prod_pass='my_prod_password' +#export aai_prod_creds="${aai_prod_user}:${aai_prod_pass}" + +# set up the applicable server URL for the test or production environment. +# it is by default set up to submit to the test (development) server. +TEST=1 +if [ $TEST -eq 1 ]; then + echo "using test credentials and setup" + url="https://wwwdev.ebi.ac.uk/ena/clearinghouse/api/curations" + auth_url='https://explore.api.aai.ebi.ac.uk/auth' + creds=$aai_test_creds +else + #PROD + echo "using production credentials and setup" + url="https://www.ebi.ac.uk/ena/clearinghouse/api/curations" + auth_url='https://api.aai.ebi.ac.uk/auth' + creds=$aai_prod_creds +fi + +################################################################ +# Declaring Functions, that are called later in this bash script + +function re_run_bearer_file () { + auth_url=$1 + credentials=$2 + # the credentials are deliberately not echoed, to keep the password out of the logs + echo "curl $auth_url -u <credentials not shown>" + echo "$bearer_file" + curl "$auth_url" -u "$credentials" 1> "$bearer_file" 2>/dev/null + bearerkey=$(cat "$bearer_file") + len=${#bearerkey} + if [ "$len" -lt 100 ]; then + echo "Invalid bearer key, so exiting script, try later."
+ exit 1 + fi + } + +function submit_2_clearinghouse () { + export curation_json_file=$1 + echo "$curation_json_file" + export bearerkey=$(cat "$bearer_file") + export bearer="Authorization: Bearer $bearerkey" + #echo $bearer + # -T is needed for big files; -d @ is slightly faster and reads the file into memory + # the command is held in an array so that it can be both displayed and executed with its quoting intact + cmd=(curl -X POST "${url}" -H "accept: */*" -H "Content-Type: application/json" -H "${bearer}" -d "@${curation_json_file}") + echo "${cmd[*]}" + time "${cmd[@]}" + echo " " +} + +######################################################################### +# configuration continued, this using the configurations and a function from above to +# 1) ensure that you have a submission directory +# 2) use the user name, password and applicable URL to create the bearer key. +# the bearer key identifies you to the applicable ClearingHouse server, so that you can submit the annotations. + +if [ ! -d "$submission_dir" ]; then + echo "${submission_dir}<--is not a valid directory, so exiting" + exit 1 +fi +echo "submission_dir: -->${submission_dir}" + +# echo $creds +# if one needs, renew the bearer ( the below is for the test): +#curl 'https://explore.api.aai.ebi.ac.uk/auth' -u $aai_test2_creds 1> bearer_file 2>/dev/null +#re-running the bearer key frequently as it times out within an hour... + +re_run_bearer_file "$auth_url" "$creds" +export bearerkey=$(cat "$bearer_file") +export bearer="Authorization: Bearer $bearerkey" + +################################################################################################## +# now processing every json file in your submission directory +for file in "$submission_dir"/*.json +do + echo "$file" + echo "$bearer" + submit_2_clearinghouse "$file" + sleep 0.1 # wait 0.1 seconds + re_run_bearer_file "$auth_url" "$creds" +done +message_contents="Attempted to process all files in ${submission_dir} . Please do check the logfiles for any errors."
+echo "${message_contents}" | mail -s "ClearingHouse submission finished" "${my_email_address}"
+echo "end of script: ${0}"
+################################################################################################
+
+
+
+
+
+
diff --git a/submit/annotations/covid_curation.png b/submit/annotations/covid_curation.png
new file mode 100644
index 0000000..bddf7ff
Binary files /dev/null and b/submit/annotations/covid_curation.png differ
diff --git a/submit/annotations/covid_curation_browser.png b/submit/annotations/covid_curation_browser.png
new file mode 100644
index 0000000..052847f
Binary files /dev/null and b/submit/annotations/covid_curation_browser.png differ
diff --git a/submit/annotations/env_curation.png b/submit/annotations/env_curation.png
new file mode 100644
index 0000000..2d542e5
Binary files /dev/null and b/submit/annotations/env_curation.png differ
diff --git a/submit/samples/SampleChecklists/SampleChecklistUpdates/2024-01-31:Incorporating_MIxS_V6.2.md b/submit/samples/SampleChecklists/SampleChecklistUpdates/2024-01-31:Incorporating_MIxS_V6.2.md
new file mode 100644
index 0000000..6235a3f
--- /dev/null
+++ b/submit/samples/SampleChecklists/SampleChecklistUpdates/2024-01-31:Incorporating_MIxS_V6.2.md
@@ -0,0 +1,79 @@
+# ENA Checklists Update Incorporating MIxS V6.2
+Checklists Updated: January 2024
+
+## Summary of ENA Checklists after the MIxS v6.2 Update
+* Four new MIxS checklists have been added to ENA: GSC MIxS Agriculture, GSC MIxS Food and Production, GSC MIxS Symbiont, and GSC MIxS Hydrocarbon.
+* Fifteen existing MIxS checklists in ENA had new checklist terms added.
+  * Three had many new terms: GSC MIxS built environment (66), GSC MIxS plant-associated (24) and GSC MIxS sediment (14).
+  * Twelve checklists had between 1 and 8 new terms added.
+* 368 new MIxS terms were added to the ENA checklist system. There are now 1031 ENA sample checklist terms.
+* 47 aliases (synonyms) of terms were added, e.g.
where the MIxS term name had changed, or there was now a MIxS term for the same concept as an existing legacy ENA term. Wherever appropriate we use the MIxS term.
+
+This and similar metadata updates are important because they both:
+1. meet the needs of the diverse data submitters to ENA and
+2. ensure interoperability of ENA submitted metadata with that of other INSDC members and other portals. Please see the background to sample checklists in ENA for more information.
+
+This will take effect from 31-Jan-2024.
+
+---
+## Introduction
+[Please read this background about sample level checklists](../sample_checklist_introduction.md) and GSC MIxS.
+
+A growing proportion of ENA's sample level checklists come from MIxS: currently 22 of the 52 sample checklists are MIxS checklists. Most of the other ENA checklists come from legacy sources.
+
+## Four New MIxS Derived Checklists in ENA
+
+| New checklist Name in ENA | Deeper background to the checklist creation | Comment for ENA |
+|---------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| **GSC MIxS Agriculture** | [Community-Driven Metadata Standards for Agricultural Microbiome Research](https://apsjournals.apsnet.org/doi/10.1094/PBIOMES-09-19-0051-P) | |
+| **GSC MIxS Food and Production** | | Built from five MIxS packages, as there was much overlap (food-human foods, food-farm environment, food-food production facility, food-animal and animal feed). N.B. A dozen terms are currently excluded, as they were mainly agriculture and/or soil sample related.
|
+| **GSC MIxS Symbiont** | [MIxS-SA: a MIxS extension defining the minimum information standard for sequence data from symbiont-associated micro-organisms](https://www.nature.com/articles/s43705-022-00092-w) | |
+| **GSC MIxS Hydrocarbon** | [MIxS-HCR: a MIxS extension defining a minimal information standard for sequence data from environments pertaining to hydrocarbon resources](https://www.nature.com/articles/s43705-022-00092-w) | All added apart from “additional info” |
+
+## Fifteen existing MIxS checklists in ENA have had new checklist terms added
+
+* For twelve checklists, between 1 and 8 new terms were added to these GSC MIxS checklists: air, host, human-associated, human-gut, human-oral, human-skin, human-vaginal, microbial mat biofilm, miscellaneous natural or artificial environment, soil, wastewater sludge, and water
+* For the following three checklists there was a more substantial addition:
+  * 66 terms added to the **GSC MIxS built environment**
+  * 24 terms added to the **GSC MIxS plant-associated**
+  * 14 terms added to the **GSC MIxS sediment**
+
+# Summary Tables of Term Counts and Terms Added to Existing Checklists
+
+## Summary Table of Terms (all sample based)
+| Count | What |
+|-------|--------------------------------------------------------------------------|
+| 1031 | total terms now in ENA |
+| 368 | new terms not in ENA were added from MIxS |
+| 47 | aliases added |
+| 16 | existing definitions updated |
+| 3 | MIxS v6.2 terms were not added to ENA, such as "miscellaneous attribute" |
+
+## Table of Terms Added to Each Checklist (all sample based)
+For existing checklists, only the newly added terms are listed.
+
+
+| Checklist | New or existing | Comment |
+|-----------|-----------------|---------|
+| **GSC MIxS Agriculture** | New | N.B. From four or so MIxS packages |
+| **GSC MIxS Food and Production** | New |
  • Combined from several MIxS packages, as there was so much overlap
  • About a dozen terms seemed out of place (the agriculture and/or soil checklists looked a better fit), so those were excluded |
+| **GSC MIxS Symbiont** | New | |
+| **GSC MIxS Hydrocarbon** | New | All added apart from “additional info” |
+| **GSC MIxS air** | existing | new terms added:
  • depth
  • taxonomic classification |
+| **GSC MIxS built environment** | existing | new terms added:
    • outside relative humidity
    • presence of pets, animals, or insects
    • quadrant position
    • relative sampling location
    • room air exchange rate
    • room architectural elements
    • room condition
    • room count
    • room dimensions
    • room door distance
    • room location in building
    • room moisture damage or mold history
    • room net area
    • room occupancy
    • room sampling position
    • room type
    • room volume
    • room window count
    • rooms connected by a doorway
    • rooms that are on the same hallway
    • rooms that share a door with sampling room
    • rooms that share a wall with sampling room
    • sampling day weather
    • sampling floor
    • sampling room ID or name
    • sampling time outside
    • season
    • seasonal use
    • shading device condition
    • shading device location
    • shading device material
    • shading device signs of water/mold
    • shading device type
    • specific humidity
    • specifications
    • surface-air contaminant
    • taxonomic classification
    • temperature
    • temperature outside house
    • train line
    • train station collection location
    • train stop collection location
    • visual media
    • wall area
    • wall construction type
    • wall finish material
    • wall height
    • wall location
    • wall signs of water/mold
    • wall surface treatment
    • wall texture
    • wall thermal mass
    • water feature size
    • water feature type
    • weekday
    • window area/size
    • window condition
    • window covering
    • window horizontal position
    • window location
    • window material
    • window open frequency
    • window signs of water/mold
    • window status
    • window type
    • window vertical position
    |
+| **GSC MIxS host** | existing | new terms added:
        • ancestral data
        • biological status
        • genetic modification
        • observed host symbionts
        • sample capture status
        • sample collection device or method
        • sample disease stage
        • taxonomic classification
        |
+| **GSC MIxS human-associated** | existing | new terms added:
        • altitude
        • depth
        • elevation
        • nose throat disorder
        • observed host symbionts
        • taxonomic classification
        |
+| **GSC MIxS human-gut** | existing | new terms added:
        • altitude
        • depth
        • elevation
        • host scientific name
        • observed host symbionts
        • taxonomic classification
        |
+| **GSC MIxS human-oral** | existing | new terms added:
        • depth
        • elevation
        • host scientific name
        • observed host symbionts
        • taxonomic classification
        |
+| **GSC MIxS human-skin** | existing | new terms added:
        • altitude
        • depth
        • elevation
        • host scientific name
        • observed host symbionts
        • taxonomic classification
        |
+| **GSC MIxS human-vaginal** | existing | new terms added:
        • altitude
        • depth
        • elevation
        • host scientific name
        |
+| **GSC MIxS microbial mat biofilm** | existing | new terms added:
        • taxonomic classification
        • total nitrogen content
        |
+| **GSC MIxS miscellaneous natural or artificial environment** | existing | new terms added:
        • taxonomic classification
        |
+| **GSC MIxS plant-associated** | existing | new terms added:
        • ancestral data
        • biological status
        • biotic regimen
        • culture rooting medium
        • genetic modification
        • growth facility
        • growth habit
        • host scientific name
        • light regimen
        • observed host symbionts
        • plant growth medium
        • plant sex
        • plant structure
        • rooting conditions
        • rooting medium carbon
        • rooting medium macronutrients
        • rooting medium micronutrients
        • rooting medium organic supplements
        • rooting medium pH
        • rooting medium regulators
        • rooting medium solidifier
        • sample capture status
        • sample disease stage
        • taxonomic classification
        |
+| **GSC MIxS sediment** | existing | new terms added:
        • alkalinity
        • mean friction velocity
        • mean peak friction velocity
        • pH
        • particle classification
        • porosity
        • pressure
        • sediment type
        • taxonomic classification
        • temperature
        • tidal stage
        • total depth of water column
        • total nitrogen content
        • turbidity
        |
+| **GSC MIxS soil** | existing | new terms added:
        • host specificity or range
        • mean seasonal precipitation
        • mean seasonal temperature
        • organic nitrogen
        • taxonomic classification
        |
+| **GSC MIxS wastewater sludge** | existing | new terms added:
        • altitude
        • elevation
        • taxonomic classification
        • total nitrogen concentration
        |
+| **GSC MIxS water** | existing | new terms added:
        • alkalinity method
        • size-fraction lower threshold
        • size-fraction upper threshold
        • taxonomic classification
        • total nitrogen concentration
        |
+
+---
+ 
\ No newline at end of file
diff --git a/submit/samples/SampleChecklists/sample_checklist_introduction.md b/submit/samples/SampleChecklists/sample_checklist_introduction.md
new file mode 100644
index 0000000..58ad31d
--- /dev/null
+++ b/submit/samples/SampleChecklists/sample_checklist_introduction.md
@@ -0,0 +1,25 @@
+# Introduction
+
+Sample checklists are used to ensure that both the minimum core metadata and metadata specific to different sample types are submitted to ENA. Please see: [background to sample checklists in ENA](https://ena-browser-docs.readthedocs.io/en/latest/browser/sample-checklists.html) and the available [ENA sample checklists](https://www.ebi.ac.uk/ena/browser/checklists).
+
+The [Genomic Standards Consortium (GSC)](http://www.gensc.org//pages/projects/mixs-gsc-project.html) works with many communities to generate the _“Minimum Information about any (X) Sequence” (MIxS) specifications_. ENA and other INSDC members implement the MIxS standards. Essentially these consist of:
+* Community-specific checklists, each sharing a core set of metadata terms.
+* Metadata terms, each with a specific name and definition.
+* Sometimes a required pattern for the value, ranging from an integer to free text.
+
+## Working together on Improving Standards
+As outlined above, ENA collaborates with [GSC](http://www.gensc.org//pages/projects/mixs-gsc-project.html), [INSDC](https://www.insdc.org/) and other standards bodies to help meet our increasingly diverse user needs and increase interoperability. Sequencing technologies continue to evolve at pace, and scientists apply them to help investigate basic biology, disease and biodiversity.
+
+One consideration with these standards is that the actual implementation varies between organisations. Generally we try to minimise the differences to increase interoperability.
Here are some examples:
+* In ENA, we use the **long term name** (called "title" in GSC MIxS) rather than the **short term name**. This is because some of the short names are ambiguous abbreviations, so the longer names provide more clarity.
+* In MIxS, many of the checklists are called **combinations**; these consist of **core** terms and **extension** terms. In ENA, a small subset of these terms (e.g. taxonomy) will not be in the sample checklist as they are handled separately.
+* In ENA, some terms cover broader concepts than in MIxS, e.g. we use the **depth** term more generally: not just for **soil depth**, but also for **depth below sea level**.
+* There are several MIxS terms, such as **miscellaneous attribute**, which are not used in the ENA checklists, as they are ambiguous and not interoperable.
+ENA and the standards bodies regularly share suggested changes to definitions, term naming or additional terms with each other.
+
+## Time Scales of Updates
+We try to strike a balance between being stable and predictable and being responsive enough to meet the needs of communities.
+* Generally ENA and other INSDC members commit to checklist updates following the major MIxS releases, e.g. 4.0, 5.0, 6.0, 7.0. These are typically every 2 to 3 years.
+  * Updates, even with much automation, can take many weeks of full-time-equivalent work to add and quality control.
+  * Sometimes terms change names and then change back again between sub-releases.
+* If important terms, improved term definitions or even checklists are needed by ENA's user communities, we often promptly add those in.