Update tidy-data.Rmd #1558

Merged
vignettes/tidy-data.Rmd: 7 changes (5 additions & 2 deletions)
@@ -322,13 +322,16 @@ It's also common to find data values about a single type of observational unit s

3. Combine all tables into a single table.

-Purrr makes this straightforward in R. The following code generates a vector of file names in a directory (`data/`) which match a regular expression (ends in `.csv`). Next we name each element of the vector with the name of the file. We do this because will preserve the names in the following step, ensuring that each row in the final data frame is labeled with its source. Finally, `map_dfr()` loops over each path, reading in the csv file and combining the results into a single data frame.
+Purrr makes this straightforward in R. The following theoretical code generates a vector of file names from a directory (`data/`) which match a regular expression (ends in `.csv`). Next we name each element of the vector with the name of the file. We do this because we will preserve the names in the following step, ensuring that each row in the final data frame is labeled with its source. Finally, `map()` loops over each path, reading in the csv file, and `list_rbind()` combines the results into a single data frame.

```{r, eval = FALSE}
library(purrr)
library(readr)

paths <- dir("data", pattern = "\\.csv$", full.names = TRUE)
names(paths) <- basename(paths)
-map_dfr(paths, read.csv, stringsAsFactors = FALSE, .id = "filename")

+map(paths, read_csv) %>% list_rbind(names_to = "filename")
```

Once you have a single table, you can perform additional tidying as needed. An example of this type of cleaning can be found at <https://github.com/hadley/data-baby-names> which takes 129 yearly baby name tables provided by the US Social Security Administration and combines them into a single file.
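The chunk in the diff is not evaluated (`eval = FALSE`) and depends on a `data/` directory. Below is a minimal, self-contained sketch of the same `map()` + `list_rbind()` pattern that can be run as-is. It assumes purrr 1.0.0 or later, the release that added `list_rbind()`, and readr 2.0 or later, which provides the `show_col_types` argument. The `tidy-data-demo` directory, the `names-YYYY.csv` files, and their `name`/`n` columns are hypothetical stand-ins written to a temporary directory.

```r
library(purrr)  # assumes purrr >= 1.0.0, which provides list_rbind()
library(readr)  # assumes readr >= 2.0, for the show_col_types argument

# Hypothetical example data: three small yearly CSV files in a temporary directory
data_dir <- file.path(tempdir(), "tidy-data-demo")
dir.create(data_dir, showWarnings = FALSE)
for (year in 2001:2003) {
  yearly <- data.frame(name = c("Anna", "Ben"), n = c(year - 2000, year - 1990))
  write_csv(yearly, file.path(data_dir, paste0("names-", year, ".csv")))
}

# List the files ending in .csv and name each path by its file name
paths <- dir(data_dir, pattern = "\\.csv$", full.names = TRUE)
names(paths) <- basename(paths)

# Read each file, then row-bind the results; list_rbind() copies each
# element's name into a new first column called "filename"
combined <- list_rbind(
  map(paths, function(path) read_csv(path, show_col_types = FALSE)),
  names_to = "filename"
)
print(combined)

# Remove the temporary example files
unlink(data_dir, recursive = TRUE)
```

The anonymous function passes `show_col_types = FALSE` directly to `read_csv()` instead of forwarding it through `map()`'s `...`, a style purrr's documentation now generally recommends for constant arguments. Each of the six rows in `combined` is labeled with its source file, such as `names-2001.csv`.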