The paper:
Uprety Y, Asselin H, Dhakal A, Julien N. Traditional use of medicinal plants in the boreal forests of Canada: review and perspectives. J Ethnobiol Ethnomed. 2012;8:7. doi:10.1186/1746-4269-8-7.
... contains a fantastic dataset in its supplementary data on the traditional medicinal use and vernacular names of plants in Canada. The paper (and thus the supplementary data) is published under a Creative Commons Attribution license. The data, however, are provided as a Word file and are thus not readily usable.
During the 4-day #BIH13 conference, we will attempt to transform the data into a usable CSV file and link it to the Database of Vascular Plants of Canada (VASCAN), in which @peterdesmet is involved.
We managed to convert the dataset into a Darwin Core Archive within the timeframe of the conference. See the steps below for the full details.
- Copy/paste the Word table into a CSV file.
- Get the data for each record (a taxon) onto a single line. [script]
- Fix some formatting (mostly manually). [script]
- Run the scientificName through the GBIF name parser and try to match the returned genus, specificEpithet, infraspecificEpithet and taxonRank against data from VASCAN. [script] Of the 545 names, 493 had one exact match, 48 had no match, and 4 had several matches. We tried to explain the mismatches here.
- Realize that there are too many vernacular name languages (14) and, especially, plant parts used (129) to express this in a flat file. Express the data as a Darwin Core Archive instead. [target format file]
- Express the scientificName, family and mapping to VASCAN in a Taxon Core. We also included the non-DwC term _habit.
- Express the vernacular names in a VernacularName extension. [script] Languages are mapped to their ISO 639-3 codes, because the ISO 639-1 codes expected in dwc:language do not cover all languages. [mapping file] We also included the non-DwC term _languageName.
- Express the traditional medicinal use in a Description extension. Currently, the full description includes parts, uses and sources, but is not marked up as HTML. We also included the non-DwC term _plantPart, whose values have been reconciled. [mapping file]
- Add a meta.xml file. [file]
- Catch up on sleep.
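The step of getting each taxon onto a single line can be sketched as follows. This is not the project's script: it assumes (hypothetically) that multi-line Word cells, once pasted, produce continuation rows whose first cell is empty, and that such rows should be folded into the record above them. The function name and the "; " separator are illustrative choices.

```python
import csv
import io

def merge_continuation_rows(rows, sep="; "):
    """Collapse pasted table rows so that each taxon ends up on one line.

    Hypothetical assumption: a row whose first cell is empty continues the
    record above it. Continuation rows before the first record are dropped.
    """
    records = []
    for row in rows:
        if row and row[0].strip():
            # A non-empty first cell starts a new record.
            records.append([cell.strip() for cell in row])
        elif records:
            current = records[-1]
            # Pad the current record if the continuation row is wider.
            current.extend([""] * (len(row) - len(current)))
            for i, cell in enumerate(row):
                cell = cell.strip()
                if cell:
                    # Append to whatever is already in that column.
                    current[i] = cell if not current[i] else current[i] + sep + cell
    return records

# Made-up example input, not taken from the paper's supplementary data.
sample = "Abies balsamea,Pinaceae,bark\n,,resin\nPicea mariana,Pinaceae,cone\n"
merged = merge_continuation_rows(csv.reader(io.StringIO(sample)))
```

Keying on an empty first column keeps the logic simple; a real table may need a different signal for "this row continues the previous one".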
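The VASCAN matching step sorts each name into "one exact match", "no match" or "several matches". The sketch below shows that classification. It assumes the GBIF name parser output is already available as dictionaries keyed by Darwin Core term names, and it represents VASCAN as a hypothetical in-memory list of taxa with made-up taxonID values. No real GBIF or VASCAN API is called.

```python
from collections import Counter

def match_key(name):
    """Build a comparison key from parsed name components (DwC term names)."""
    return (
        name.get("genus", ""),
        name.get("specificEpithet", ""),
        name.get("infraspecificEpithet", ""),
        name.get("taxonRank", ""),
    )

def classify_matches(parsed_names, vascan_taxa):
    """Return (scientificName, status, matched IDs) per name, plus a tally."""
    # Index the reference taxa by their key; one key may map to several IDs.
    index = {}
    for taxon in vascan_taxa:
        index.setdefault(match_key(taxon), []).append(taxon["taxonID"])
    results = []
    for name in parsed_names:
        ids = index.get(match_key(name), [])
        if len(ids) == 1:
            status = "one exact match"
        elif not ids:
            status = "no match"
        else:
            status = "several matches"
        results.append((name["scientificName"], status, ids))
    tally = Counter(status for _, status, _ in results)
    return results, tally
```

Applied to the project's 545 names, a tally like this would be how the 493 / 48 / 4 split is obtained. The three counts must add up to the total number of names.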
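The VernacularName extension step produces one row per taxon, language and name. Each row carries the ISO 639-3 code plus the original language label in _languageName. Here is a minimal sketch. The mapping is a hypothetical three-entry excerpt, not the project's mapping file. Mi'kmaq is included because it has an ISO 639-3 code (mic) but no ISO 639-1 code, which is the reason for choosing 639-3. The input structure and function name are also assumptions.

```python
import csv
import io

# Hypothetical excerpt of a language mapping (not the project's file).
ISO_639_3 = {"English": "eng", "French": "fra", "Mi'kmaq": "mic"}

def write_vernacular_extension(records, out):
    """Write tab-separated VernacularName rows to the file-like `out`.

    `records` is a list of (taxon_id, {language_name: [names]}) pairs.
    The `id` column links each row back to the Taxon Core.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "vernacularName", "language", "_languageName"])
    for taxon_id, names_by_language in records:
        for language_name, names in names_by_language.items():
            code = ISO_639_3.get(language_name, "")  # empty if unmapped
            for name in names:
                writer.writerow([taxon_id, name, code, language_name])

buf = io.StringIO()
write_vernacular_extension([("1", {"English": ["balsam fir"], "French": ["sapin baumier"]})], buf)
```

Keeping the original label next to the code means no information is lost when a language has no mapping yet.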
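The meta.xml step can be sketched with Python's standard xml.etree.ElementTree. Following the Darwin Core text guide, the descriptor declares one core file and its extensions, with column indexes mapped to term URIs. The file names (taxa.txt, vernacularnames.txt) and the column order are hypothetical. Non-DwC columns such as _habit or _languageName are left out, and so is the Description extension, to keep the sketch short.

```python
import xml.etree.ElementTree as ET

DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"

def add_table(parent, tag, row_type, location, id_tag, fields):
    """Add a <core> or <extension> element describing one data file."""
    table = ET.SubElement(parent, tag, {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",  # literal backslash-t, as meta.xml expects
        "linesTerminatedBy": "\\n",
        "fieldsEnclosedBy": "",
        "ignoreHeaderLines": "1",
        "rowType": row_type,
    })
    files = ET.SubElement(table, "files")
    ET.SubElement(files, "location").text = location
    # Column 0 holds the record id (core) or the link to the core (extension).
    ET.SubElement(table, id_tag, {"index": "0"})
    for index, term in enumerate(fields, start=1):
        ET.SubElement(table, "field", {"index": str(index), "term": term})
    return table

def build_meta_xml():
    archive = ET.Element("archive", {"xmlns": DWC_TEXT_NS})
    add_table(archive, "core", "http://rs.tdwg.org/dwc/terms/Taxon", "taxa.txt", "id", [
        "http://rs.tdwg.org/dwc/terms/scientificName",
        "http://rs.tdwg.org/dwc/terms/family",
    ])
    add_table(archive, "extension", "http://rs.gbif.org/terms/1.0/VernacularName",
              "vernacularnames.txt", "coreid", [
        "http://rs.tdwg.org/dwc/terms/vernacularName",
        "http://purl.org/dc/terms/language",
    ])
    return ET.tostring(archive, encoding="unicode")
```

Generating the descriptor from the same field lists used to write the data files helps keep column indexes and files in sync.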