episcout provides helper functions for cleaning, exploring and visualising large datasets. It wraps common preprocessing and descriptive tasks so you can focus on analysis. The package builds on the tidyverse and data.table ecosystems for fast and flexible data manipulation.
- Cleaning: epi_clean_* functions tidy raw data and detect issues such as duplicates or inconsistent labels.
- Statistics: epi_stats_* functions create summary tables and descriptive statistics in a single call.
- Plotting: epi_plot_* wrappers make it straightforward to produce common graphs with ggplot2 and cowplot.
- Utilities: epi_utils_* helpers cover tasks like parallel processing and logging.
Install from GitHub:
install.packages("devtools")
library(devtools)
install_github("AntonioJBT/episcout")

Functions are grouped by purpose, e.g.:
- epi_clean_* for data wrangling/cleanup.
- epi_stats_* for generating descriptive statistics and contingency tables.
- epi_plot_* for plotting (wrappers around ggplot2 and cowplot).
- epi_utils_* for utilities such as parallel processing, logging, etc.
- Miscellaneous helpers such as epi_read/epi_write.
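Because the function families share a naming prefix, one way to browse them is to group exported names by that prefix. The sketch below is illustrative only: `group_by_prefix` is a hypothetical helper, not part of episcout, and the fallback list of names is limited to functions mentioned in this README. If the package is installed, its actual exports are used instead.

```r
# Hypothetical helper (not part of episcout): group function names by prefix.
group_by_prefix <- function(fn_names, prefixes) {
  groups <- lapply(prefixes, function(p) sort(fn_names[startsWith(fn_names, p)]))
  names(groups) <- prefixes
  groups
}

prefixes <- c("epi_clean_", "epi_stats_", "epi_plot_", "epi_utils_")

# Fallback: names mentioned in this README, used when episcout is not installed.
fn_names <- c("epi_clean_get_dups", "epi_stats_numeric",
              "epi_plot_theme_imss", "epi_plot_add_var_labels",
              "epi_head_and_tail")

# If episcout is installed, list what it actually exports.
if (requireNamespace("episcout", quietly = TRUE)) {
  fn_names <- getNamespaceExports("episcout")
}

groups <- group_by_prefix(fn_names, prefixes)
str(groups)
```

Names that match none of the prefixes (for example epi_head_and_tail) are simply left out of the grouping.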
This is a basic example of things you can do with episcout:
library(episcout)
# A data frame:
n <- 20
df <- data.frame(var_id = rep(1:(n / 2), each = 2),
                 var_to_rep = rep(c('Pre', 'Post'), n / 2),
                 x = rnorm(n),
                 y = rbinom(n, 1, 0.50),
                 z = rpois(n, 2)
)
# Print the first few rows and last few rows:
dim(df)
epi_head_and_tail(df, rows = 2, cols = 2)
epi_head_and_tail(df, rows = 2, cols = 2, last_cols = TRUE)
# Get all duplicates:
check_dups <- epi_clean_get_dups(df, 'var_id', 1)
dim(check_dups)
check_dups
# Get summary descriptive statistics for a numeric/integer column:
num_vec <- df$x
desc_stats <- epi_stats_numeric(num_vec)
class(desc_stats)
lapply(desc_stats, class)
desc_stats
# And many more functions for cleaning, stats and plotting that do things a bit faster or more conveniently and that I couldn't easily find in other packages.
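To sanity-check the example without the package, the base-R sketch below reproduces the idea behind two of the steps: collecting every row whose key is duplicated, and summarising a numeric column. It makes no claim about how epi_clean_get_dups or epi_stats_numeric are implemented or what columns they return. The fixed seed and the names `is_dup`, `all_dups` and `basic_stats` are assumptions added for illustration.

```r
set.seed(42)  # assumption: fixed seed so the random columns are reproducible
n <- 20
df <- data.frame(var_id = rep(1:(n / 2), each = 2),
                 var_to_rep = rep(c('Pre', 'Post'), n / 2),
                 x = rnorm(n),
                 y = rbinom(n, 1, 0.50),
                 z = rpois(n, 2))

# Flag every row whose var_id occurs more than once,
# including the first occurrence of each duplicated value.
is_dup <- duplicated(df$var_id) | duplicated(df$var_id, fromLast = TRUE)
all_dups <- df[is_dup, ]
dim(all_dups)

# A few descriptive statistics for a numeric column, as a one-row data frame.
basic_stats <- data.frame(n = length(df$x),
                          n_missing = sum(is.na(df$x)),
                          mean = mean(df$x),
                          sd = sd(df$x),
                          median = median(df$x),
                          min = min(df$x),
                          max = max(df$x))
basic_stats
```

Each var_id appears twice in this data frame (once for 'Pre', once for 'Post'), so every row is flagged as a duplicate.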
- Pull requests welcome!
- If you run into any issues, please report them in the issue tracker.
- Version 0.1.4: Added epi_plot_theme_imss and colour palette helpers. New epi_plot_add_var_labels layer. Rewritten epi_stats_* summary functions.
- Version 0.1.3: Improved coverage tests, added a few wrappers, slightly improved documentation.
- Version 0.1.2: Minor bug fixes and internal improvements.
- Version 0.1.1: First release.