GitHub - bansell/tidyrstats: Tidy helpers for common statistical functions in R

Description

This package contains helper functions for implementing common statistical functions in R, within tidy workflows.

Install

if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("bansell/tidyrstats")

library(tidyrstats)

neg_log()

This function computes the negative logarithm of a numeric input using base 10 by default. This function returns the negative logarithm of x. By default, it uses base 10, but you can specify a different base using the base argument. Designed for quickly transforming p values for statistical analysis.

Say we have summary statistics results for an RNAseq differential expression experiment

head(toptags)
#> # A tibble: 6 × 7
#>   geneID logFC AveExpr     t  P.Value    adj.P.Val     B
#>   <chr>  <dbl>   <dbl> <dbl>    <dbl>        <dbl> <dbl>
#> 1 gene_1 -4.10    7.07 -35.5 3.38e-12 0.0000000466  18.4
#> 2 gene_2 -4.35    5.66 -32.4 8.66e-12 0.0000000466  17.4
#> 3 gene_3 -4.72    6.46 -30.8 1.45e-11 0.0000000466  17.1
#> 4 gene_4 -3.77    6.29 -30.0 1.93e-11 0.0000000466  16.9
#> 5 gene_5 -5.22    4.93 -30.9 1.42e-11 0.0000000466  16.8
#> 6 gene_6 -5.31    8.86 -29.6 2.22e-11 0.0000000466  16.7

We can -log10 transform the P.Value column

toptags |> 
  # -log10 transform the P.Value column
  mutate(neg_logP = neg_log(P.Value)) 
#> # A tibble: 16,624 × 8
#>    geneID  logFC AveExpr     t  P.Value    adj.P.Val     B neg_logP
#>    <chr>   <dbl>   <dbl> <dbl>    <dbl>        <dbl> <dbl>    <dbl>
#>  1 gene_1  -4.10    7.07 -35.5 3.38e-12 0.0000000466  18.4     11.5
#>  2 gene_2  -4.35    5.66 -32.4 8.66e-12 0.0000000466  17.4     11.1
#>  3 gene_3  -4.72    6.46 -30.8 1.45e-11 0.0000000466  17.1     10.8
#>  4 gene_4  -3.77    6.29 -30.0 1.93e-11 0.0000000466  16.9     10.7
#>  5 gene_5  -5.22    4.93 -30.9 1.42e-11 0.0000000466  16.8     10.8
#>  6 gene_6  -5.31    8.86 -29.6 2.22e-11 0.0000000466  16.7     10.7
#>  7 gene_7  -3.82    6.70 -29.4 2.34e-11 0.0000000466  16.7     10.6
#>  8 gene_8   5.19    3.75  29.0 2.72e-11 0.0000000466  16.2     10.6
#>  9 gene_9  -5.66    4.42 -29.4 2.36e-11 0.0000000466  16.1     10.6
#> 10 gene_10 -5.38    6.30 -28.0 3.90e-11 0.0000000526  16.1     10.4
#> # ℹ 16,614 more rows

…then create a volcano plot:

toptags |> 
  # -log10 transform the P.Value column
  mutate(neg_logP = neg_log(P.Value)) |> 
  # volcano plot
  ggplot(aes(x=logFC,y=neg_logP)) + 
  geom_point(size=0.5)

scale_this()

This function scales and centres a numeric vector by subtracting the mean and dividing by the standard deviation.

Unlike base::scale(), it returns a numeric vector, not a matrix, making it play well with tidy data structures and dplyr chained code. Note this function does not allow control over centering or scaling at present.

scale_this` will drop NA values before scaling, by default (as is the case for base::scale()). However the function will include original NA values in the output, and generate a warning message.

iris_dat <- c(head(iris$Sepal.Length),NA)

iris_dat |>  as_tibble() |> mutate(scaled = scale_this(value))
#> Input contains NA values. These are removed before scaling.
#> # A tibble: 7 × 2
#>   value scaled
#>   <dbl>  <dbl>
#> 1   5.1  0.521
#> 2   4.9 -0.174
#> 3   4.7 -0.868
#> 4   4.6 -1.21 
#> 5   5    0.174
#> 6   5.4  1.56 
#> 7  NA   NA

scale_this() will also work with a simple numeric vector:

scale_this(iris_dat)
#> Input contains NA values. These are removed before scaling.
#> [1]  0.5206576 -0.1735525 -0.8677627 -1.2148677  0.1735525  1.5619728         NA

lm_test()

This function provides a wrapper around do(broom::tidy(lm(... , data = .)) which is very difficult to teach in a tidyverse-focused workshop; difficult for students to understand, as well as annoying to code!

lm_test() is compatible with grouped or nested input, and compatible with the native R pipe |> or magrittr pipe %>%. It accepts a quoted or unquoted model formula.

lm_test() returns a tibble containing the tidied output of lm(), sorted by p-value, which includes the following columns:

term: The name of the model term (such as the intercept, predictors, or interaction terms).
estimate: The estimated coefficient (also known as the beta value).
std.error: The standard error associated with the estimate.
statistic: The t-statistic used in hypothesis testing.
p.value: The p-value indicating the significance of the term.

If the input data is grouped or nested, the output will retain the group identifiers. In both cases, the grouping/nesting identifiers will appear in the left-most columns of the tibble.

For ungrouped input:

mpg %>% lm_test( cty ~ hwy )
#> Results for linear model: cty ~ hwy
#> # A tibble: 2 × 5
#>   term      estimate std.error statistic   p.value
#>   <chr>        <dbl>     <dbl>     <dbl>     <dbl>
#> 1 hwy          0.683    0.0138     49.6  1.87e-125
#> 2 intercept    0.844    0.333       2.53 1.19e-  2

For grouped input:

mpg %>% group_by(class) %>%  lm_test(cty ~ hwy * cyl + 0)
#> Results for linear model: cty ~ hwy * cyl + 0
#> # A tibble: 21 × 6
#>    class      term    estimate std.error statistic  p.value
#>    <chr>      <chr>      <dbl>     <dbl>     <dbl>    <dbl>
#>  1 suv        hwy       0.878     0.0329     26.7  1.26e-34
#>  2 compact    hwy       0.818     0.0396     20.6  3.03e-24
#>  3 midsize    hwy       0.799     0.0423     18.9  6.64e-21
#>  4 pickup     hwy       0.863     0.0399     21.6  7.46e-20
#>  5 subcompact hwy       0.817     0.0548     14.9  5.87e-16
#>  6 minivan    hwy       0.826     0.0510     16.2  2.13e- 7
#>  7 suv        hwy:cyl  -0.0437    0.0120     -3.65 5.59e- 4
#>  8 suv        cyl       0.424     0.150       2.83 6.37e- 3
#>  9 midsize    hwy:cyl  -0.0579    0.0295     -1.96 5.72e- 2
#> 10 2seater    hwy       0.353     0.132       2.68 7.48e- 2
#> # ℹ 11 more rows

For nested input with a quoted model formula:

mpg |>  nest_by(class) |>  lm_test("cty ~ hwy * cyl + 0")
#> Results for linear model: cty ~ hwy * cyl + 0
#> # A tibble: 21 × 6
#>    class      term    estimate std.error statistic  p.value
#>    <chr>      <chr>      <dbl>     <dbl>     <dbl>    <dbl>
#>  1 suv        hwy       0.878     0.0329     26.7  1.26e-34
#>  2 compact    hwy       0.818     0.0396     20.6  3.03e-24
#>  3 midsize    hwy       0.799     0.0423     18.9  6.64e-21
#>  4 pickup     hwy       0.863     0.0399     21.6  7.46e-20
#>  5 subcompact hwy       0.817     0.0548     14.9  5.87e-16
#>  6 minivan    hwy       0.826     0.0510     16.2  2.13e- 7
#>  7 suv        hwy:cyl  -0.0437    0.0120     -3.65 5.59e- 4
#>  8 suv        cyl       0.424     0.150       2.83 6.37e- 3
#>  9 midsize    hwy:cyl  -0.0579    0.0295     -1.96 5.72e- 2
#> 10 2seater    hwy       0.353     0.132       2.68 7.48e- 2
#> # ℹ 11 more rows

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
R		R
README_files/figure-gfm		README_files/figure-gfm
_vignettes		_vignettes
man		man
.DS_Store		.DS_Store
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
LICENSE.md		LICENSE.md
NAMESPACE		NAMESPACE
NEWS.md		NEWS.md
README.md		README.md
tidyrstats.Rproj		tidyrstats.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Description

Install

neg_log()

scale_this()

lm_test()

About

Licenses found

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Description

Install

neg_log()

scale_this()

lm_test()

About

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages