Pathogen information is divided into taxonimic pathogens, PrPSc prions and toxin data detailed in the following sections.
ncbi-pathogens-data contains a set of XML files with information about taxonomic pathogens.
For each pathogen, the sub-species are identified. For each species, sub-species, the files contain several lists of PubMed Identifiers (PMIDs) depending where the source of the identifier. If the pathogen was found in the MeSH indexing of MEDLINE, the MeSHPMID
is used, if the pathogen is found in indexed in PubMed Central, the PMCID
tag is used, if the pathogen is found in GenBank, the tag GenBankPMID
is used.
In addition, several XML sets are used to store information about the synonyms and scientific names and the MeSH tree codes (if known).
The snippet below shows the example for Escherichia coli.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pathogen>
<subSpecies>
<id>2822132</id>
<scientificName>Escherichia coli O128ac:H12</scientificName>
</subSpecies>
...
<PMCID>8185355</PMCID>
<PMCID>8185337</PMCID>
<PMCID>8185306</PMCID>
...
<GenBankPMID>26579110</GenBankPMID>
<GenBankPMID>23190765</GenBankPMID>
<GenBankPMID>1977705</GenBankPMID>
...
<MeSHPMID>34404473</MeSHPMID>
<MeSHPMID>34397035</MeSHPMID>
<MeSHPMID>34395312</MeSHPMID>
...
<id>562</id>
<meSHTree>B03.440.450.425.325.300, B03.660.250.150.180.100</meSHTree>
<otherNames><OtherNames>
<Synonym>
Bacillus coli
</Synonym>
...
<scientificName>Escherichia coli</scientificName>
</pathogen>
prpsc-ncbi-data contains a set of XML files with information collected about PrPSc prions. The files contain a list of PMIDs recovered for that prion from MEDLINE in the MeSHPMID
tag, the PubMed Query used to recover the PIMDs in pubMedQuery
and the species in the species
tag. It contains as well a terms.txt file with the list of prions.
The snippet below shows the example for cat PrPSc.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PrPSc>
<MeSHPMID>19622059</MeSHPMID>
<MeSHPMID>19335885</MeSHPMID>
<MeSHPMID>15763182</MeSHPMID>
...
<pubMedQuery>"PrPSc Proteins"[MH] AND "cats"[MH]</pubMedQuery>
<species>cat</species>
</PrPSc>
toxin-ncbi-data contains a set of XML files with information collected about toxins. The files contain a list of PMIDs recovered for that toxin from MEDLINE in the MeSHPMID
tag and the name of the toxin in the name
tag. It contains a list of toxins in the terms.txt file.
The snippet below shows the example for saxitoxin.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<toxin>
<MeSHPMID>33916687</MeSHPMID>
<MeSHPMID>33526188</MeSHPMID>
<MeSHPMID>33526182</MeSHPMID>
...
<name>Saxitoxin</name>
</toxin>