Commit f9cc75b

histograms in vignette
1 parent 9992db4 commit f9cc75b

1 file changed, +42 -0 lines changed

vignettes/datasummary.Rmd

@@ -551,6 +551,48 @@ datasummary(flipper_length_mm + body_mass_g ~ species * (Mean + SD),
            add_columns = new_cols)
```

# Histograms

The `datasummary` family of functions allows users to display in-line, spark-style histograms that describe the distribution of each variable. For example, `datasummary_skim` produces such a histogram:

```{r}
tmp <- mtcars[, c("mpg", "hp")]
datasummary_skim(tmp)
```

Each histogram in the table above is an SVG image produced by the `kableExtra` package. For this reason, the histograms will *not* appear when a different output backend is used, such as `gt`, `flextable`, or `huxtable`.

The `datasummary` function is very flexible, but it does not include a histogram option by default. Here is a simple example of how one can customize the output of `datasummary`. We proceed in 3 steps:

1. Normalize the variables and store them in a list.
2. Create the table with `datasummary`, making sure to include 2 "empty" columns. In the example, we use a simple function called `emptycol` to fill those columns with empty strings.
3. Add the histograms and boxplots to the empty columns using `column_spec`, `spec_hist`, and `spec_boxplot` from the `kableExtra` package.

```{r}
library(kableExtra)

tmp <- mtcars[, c("mpg", "hp")]

# create a list with individual variables
# remove missing and rescale
tmp_list <- lapply(tmp, na.omit)
tmp_list <- lapply(tmp_list, scale)

# create a table with `datasummary`
# columns: 1 = variable name, 2 = Mean, 3 = SD, 4 = Histogram, 5 = Boxplot
# add a histogram with column_spec and spec_hist
# add a boxplot with column_spec and spec_boxplot
emptycol <- function(x) " "
datasummary(mpg + hp ~ Mean + SD + Heading("Histogram") * emptycol + Heading("Boxplot") * emptycol, data = tmp) %>%
    column_spec(column = 4, image = spec_hist(tmp_list)) %>%
    column_spec(column = 5, image = spec_boxplot(tmp_list))
```
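
To see what step 1 hands to `spec_hist` and `spec_boxplot`, the normalization can be inspected on its own. The following is a minimal base-R sketch, not part of the vignette's pipeline. It assumes that flattening the one-column matrix returned by `scale` with `as.numeric` is acceptable, and `tmp_vec` is a name introduced here purely for illustration.

```r
# Illustrative sketch using base R only; `tmp_vec` is a hypothetical name.
tmp <- mtcars[, c("mpg", "hp")]

# drop missing values, standardize, and flatten the 1-column matrix
# that scale() returns into a plain numeric vector
tmp_vec <- lapply(tmp, function(x) as.numeric(scale(na.omit(x))))

# after standardization each variable has mean ~0 and standard deviation 1
sapply(tmp_vec, mean)
sapply(tmp_vec, sd)
```

Putting every variable on the same scale means the small in-cell plots are drawn from comparable ranges, even though `mpg` and `hp` are measured in very different units.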

If you want a simpler solution, you can try the `Histogram` function, which comes bundled with `modelsummary` and works in `datasummary` automatically. The downside is that it builds the histogram from Unicode characters, which may not display well with certain typefaces or on some operating systems (notably Windows).

```{r}
datasummary(mpg + hp ~ Mean + SD + Histogram, data = tmp)
```
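
To make the Unicode approach concrete, here is a rough base-R sketch of how a spark-style histogram can be assembled from block characters. This is not the implementation of `Histogram` in `modelsummary`; `unicode_hist` and its `bins` argument are hypothetical names used only for illustration.

```r
# Hypothetical helper, for illustration only (not the package's implementation).
unicode_hist <- function(x, bins = 10) {
  # eight block characters of increasing height
  bars <- c("\u2581", "\u2582", "\u2583", "\u2584",
            "\u2585", "\u2586", "\u2587", "\u2588")
  x <- x[!is.na(x)]
  # count observations in `bins` equal-width intervals
  counts <- as.integer(table(cut(x, breaks = bins)))
  # rescale counts to indices 1..8 (empty bins get the shortest bar)
  idx <- 1 + round((length(bars) - 1) * counts / max(counts))
  paste(bars[idx], collapse = "")
}

unicode_hist(mtcars$mpg)
unicode_hist(mtcars$hp, bins = 5)
```

Because the bars are ordinary text, a string like this does not rely on SVG support, but its appearance depends entirely on the font used to render it.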

# Missing values

At least 3 distinct issues can arise related to missing values.
