This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
ghedata is an R data package developed by Global Health Engineering at ETH Zurich for sharing data resources that document the group's work. It provides datasets suitable for research, teaching, and learning purposes.
# Load development dependencies
library(devtools)
library(usethis)
library(pkgdown)
# Document the package (regenerate man/ files from roxygen comments)
devtools::document()
# Run R CMD check
devtools::check()
# Install the package locally for testing
devtools::install()
# Build the pkgdown website
pkgdown::build_site()
# Run data processing script to update datasets
source("data-raw/data_processing.R")

# Load the package in development mode
devtools::load_all()
# Test specific dataset
library(ghedata)
data(people)
View(people)
# Check if documentation is properly built
?people

- Raw Data Source: Google Sheets (accessed via googlesheets4)
- Processing Script: `data-raw/data_processing.R`
  - Pulls data from configured Google Sheets
  - Performs data cleaning and anonymization
  - Saves processed data to `data/` as .rda files
  - Exports to `inst/extdata/` as CSV/XLSX
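
The save-and-export steps above can be sketched with base R alone. This is a minimal illustration under stated assumptions, not the contents of `data-raw/data_processing.R`: the data frame `raw`, its columns, and the object name `example_data` are hypothetical, the Google Sheets read is replaced by an inline data frame, anonymization and the XLSX export are left out, and files go to a temporary directory rather than the package tree.

```r
# Hypothetical stand-in for a sheet that would be read with googlesheets4
raw <- data.frame(
  name = c(" Ada ", "Grace", NA),
  role = c("PhD", "Postdoc", "PhD"),
  stringsAsFactors = FALSE
)

# Clean: drop rows without a name, then trim surrounding whitespace
clean <- raw[!is.na(raw$name), ]
clean$name <- trimws(clean$name)

# Mirror the package layout inside a temporary directory
out_dir <- tempfile("ghedata-sketch-")
dir.create(file.path(out_dir, "data"), recursive = TRUE)
dir.create(file.path(out_dir, "inst", "extdata"), recursive = TRUE)

# Save as .rda and export as CSV
example_data <- clean
save(example_data, file = file.path(out_dir, "data", "example_data.rda"))
write.csv(
  example_data,
  file.path(out_dir, "inst", "extdata", "example_data.csv"),
  row.names = FALSE
)
```
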
- `R/`: Dataset documentation files (e.g., `people.R`)
  - Each dataset has a corresponding R file with roxygen2 documentation
  - Documents include title, description, format, source, and examples
- `data-raw/`: Contains processing scripts and intermediate data
  - Main processing happens in `data_processing.R`
  - Dictionary files define data schemas
- `inst/extdata/`: Exported data files for download
  - Provides CSV and XLSX versions of all datasets
  - Accessible via `system.file()` calls
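
As a brief sketch of the `system.file()` access mentioned above: the file name `people.csv` is an assumption based on the `people` dataset and may not match the actual exported file. `system.file()` returns an empty string when the package or file is not found, so the read is guarded.

```r
# Locate an exported file inside the installed package
# ("people.csv" is an assumed file name)
csv_path <- system.file("extdata", "people.csv", package = "ghedata")

# Read the file only if it was found
if (nzchar(csv_path)) {
  people_csv <- read.csv(csv_path)
}
```
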
- Uses roxygen2 for function/data documentation
- pkgdown generates the package website from:
  - README.md (homepage)
  - Function documentation (reference)
  - Vignettes (articles)
  - NEWS.md (changelog)
- Website configuration in `_pkgdown.yml`
- Data Anonymization: Names are replaced with unique hash IDs to protect privacy
- Multiple Export Formats: Data available as R objects, CSV, and Excel files
- Google Sheets Integration: Raw data pulled directly from shared sheets
- Versioning: Package version tracks data updates (currently 0.0.5)
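
The hash-based anonymization listed above could look roughly like the following. This is a sketch only: the helper `hash_name()`, the example records, and the choice of MD5 are hypothetical, and the package's actual hashing method is not described here. Base R's `tools::md5sum()` hashes files, so each name is written to a temporary file first; a dedicated hashing package such as digest would be a more typical choice.

```r
# Hypothetical helper: derive a stable ID from a name using base R only
hash_name <- function(name) {
  path <- tempfile()
  on.exit(unlink(path), add = TRUE)
  writeLines(name, path)
  unname(tools::md5sum(path))
}

# Made-up example records; the real columns may differ
people_raw <- data.frame(
  name = c("Ada Lovelace", "Grace Hopper", "Ada Lovelace"),
  role = c("PhD", "Postdoc", "PhD"),
  stringsAsFactors = FALSE
)

# Replace the name column with a hash ID; identical names get identical IDs
people_anon <- people_raw
people_anon$id <- vapply(
  people_raw$name,
  hash_name,
  character(1),
  USE.NAMES = FALSE
)
people_anon$name <- NULL
```

A plain hash of a name can be reversed by hashing candidate names, so a real pipeline might add a secret salt; whether ghedata does so is not stated here.
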
- Add raw data source to `data-raw/data_processing.R`
- Create documentation file in `R/` (e.g., `R/newdata.R`)
- Run processing script to generate data files
- Update package documentation with `devtools::document()`
- Export to CSV/XLSX in processing script
- Update README with new dataset information
- Rebuild pkgdown site
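
For the documentation step above, a file such as `R/newdata.R` might follow the usual roxygen2 pattern for datasets: a comment block followed by the dataset name as a string. The skeleton below is illustrative; the dataset name `newdata`, its columns, and all descriptive text are placeholders rather than content from this package.

```r
#' Placeholder title for the new dataset
#'
#' Placeholder description of what the dataset contains and how it was
#' collected.
#'
#' @format A data frame with the following variables (placeholders):
#' \describe{
#'   \item{id}{Anonymized identifier}
#'   \item{value}{Example measurement}
#' }
#' @source Placeholder description of the raw data source
#' @examples
#' head(newdata)
"newdata"
```

Running `devtools::document()` afterwards regenerates the matching file under `man/`.
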
- 2 spaces for indentation (no tabs)
- Auto-append newlines to files
- Strip trailing whitespace
- Follow tidyverse style guide for R code
- Do not display results by using print() or message() functions
- When writing markdown text, always add an empty row between a heading and the first paragraph or the first bullet of a list