Add a new dataset

Ben Bond-Lamberty edited this page Apr 9, 2020 · 1 revision

Handling metadata submission

Preliminaries

  • Open up the submission in the Google Form
  • Make sure they've clicked "Yes" to the first four questions
  • Create a branch with the dataset name: generally "d" + date + last name (e.g. d20200407_WANG)
  • Create a new issue with the dataset name, and give it a "data" label
  • Copy (not move) the /inst/extdata/TEMPLATE/ folder into inst/extdata/datasets and rename it to the dataset name
  • If using Outlook, the email template is in /misc/COSORE submission - [questions and] upload link for $DATASET.emltpl. Open it; the dataset name goes in the subject line
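
The naming and copy steps above can be sketched in shell. This is only a minimal sketch, assuming it is run from the repository root; the helper names `dataset_name` and `copy_template` are hypothetical and not part of the package.

```shell
# Hypothetical helper: build a dataset name as "d" + date + "_" + upper-cased last name
dataset_name() {
  # $1 = date as YYYYMMDD, $2 = last name
  printf 'd%s_%s\n' "$1" "$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')"
}

# Hypothetical helper: copy (not move) the template folder to the new dataset folder
copy_template() {
  # $1 = dataset name
  src="inst/extdata/TEMPLATE"
  dest="inst/extdata/datasets/$1"
  if [ -e "$dest" ]; then
    echo "Refusing to overwrite existing $dest" >&2
    return 1
  fi
  cp -R "$src" "$dest"
}
```

For example, `name=$(dataset_name 20200407 Wang)` gives d20200407_WANG, which can then be used in `git checkout -b "$name"` followed by `copy_template "$name"`.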

Site description and contributors

  • As you're working through the following, fill in any questions in the email. Common questions: complex experimental design (ask them to explain fully), no publications listed (confirm), measurement instrument not given or unclear
  • Open the DESCRIPTION.txt, CONTRIBUTORS.txt and PORTS.txt files
  • Fill in the CONTRIBUTORS.txt entries from the "Contributor(s)" section of the form
  • Fill in the first part of DESCRIPTION.txt from the entries in the "Site" section of the form. Note exceptions: "Primary species present" goes into PORTS.txt, and "Ecosystem age" goes into ANCILLARY.csv
  • Fill in the last part of DESCRIPTION.txt from the entries in the "Publications" section of the form
  • From the "Measurement protocols" section of the form, the "Measurement instrument", "Measurement length", and two timestamp questions go into DESCRIPTION.txt
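
Before filling in the files, it may help to confirm that the copied folder contains them. The following is a rough sketch, assuming the four metadata files named above all live directly in the dataset folder; `check_metadata_files` is a hypothetical helper, not something the package provides.

```shell
# Hypothetical helper: report any expected metadata file missing from a dataset folder
check_metadata_files() {
  # $1 = path to the dataset folder
  dir="$1"
  missing=0
  for f in DESCRIPTION.txt CONTRIBUTORS.txt PORTS.txt ANCILLARY.csv; do
    if [ ! -f "$dir/$f" ]; then
      echo "Missing: $dir/$f" >&2
      missing=1
    fi
  done
  # Returns 0 if everything is present, 1 otherwise
  return "$missing"
}
```

A call such as `check_metadata_files inst/extdata/datasets/d20200407_WANG` prints the missing files, if any, to standard error.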

Measurement conditions

  • Fill in various fields in PORTS.txt from the rest of the "Measurement protocols" section. Currently Yes/No answers map to TRUE/FALSE; this should be standardized
  • Look at the ancillary data questions and keep them in mind
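
The Yes/No to TRUE/FALSE mapping mentioned above could be made consistent with a small helper. This is a sketch only: `yn_to_logical` is a hypothetical name, and accepting answers case-insensitively is an assumption rather than anything the form guarantees.

```shell
# Hypothetical helper: map a form answer of Yes/No to TRUE/FALSE
yn_to_logical() {
  # Lower-case the answer so "Yes", "YES" and "yes" are treated alike (an assumption)
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    yes) echo TRUE ;;
    no)  echo FALSE ;;
    *)   echo "Unrecognized answer: $1" >&2; return 1 ;;
  esac
}
```

Anything other than a yes or no answer is rejected with a non-zero status, so unclear responses get looked at by hand instead of being silently converted.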

Final steps

  • Create a new Dropbox file request with the dataset name; copy the upload link to the email
  • Make sure email has PI name, email, site name, etc., correctly filled in and send
  • Note in GitHub issue "Metadata, upload link sent." or similar language
  • Open RStudio project file. Build package. The new dataset should be listed at the end of the list_datasets() output. Confirm that read_dataset() will parse it; fix any errors if not
  • Make a commit message of "Metadata for #xxx" (fill in the issue number)
  • Push and open a PR if you want
  • Note that at this point you can use csr_report_dataset() to generate a report. This may be handy for checking the map, etc., but I usually do this only after data ingest