jhudsl · carriewright11 · Jun 10, 2025
diff --git a/modules/Data_Cleaning/Data_Cleaning.Rmd b/modules/Data_Cleaning/Data_Cleaning.Rmd
@@ -74,16 +74,16 @@ Types of "missing" data:
 ## Finding Missing data {.small}
 
 -   `is.na` - looks for `NAN` and `NA`
--   `is.nan`- looks for `NAN`
 -   `is.infinite` - looks for Inf or -Inf
 
+```{r, echo=FALSE}
+NA_vect <- c(0, NA, -1)
+NA_vect <- NA_vect/0
+```
+
 ```{r}
-test <- c(0,NA, -1)
-test/0
-test <- test/0
-is.na(test)
-is.nan(test)
-is.infinite(test)
+is.na(NA_vect)
+is.infinite(NA_vect)
 ```
 
 
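
The slide now runs only `is.na()` and `is.infinite()` on the example vector. As a minimal sketch of how the three checks from the original list differ on that same vector (the names `na_flags`, `nan_flags`, and `inf_flags` are illustrative and do not appear in the module):

```r
# 0/0 gives NaN, NA/0 stays NA, and -1/0 gives -Inf
NA_vect <- c(0, NA, -1) / 0

na_flags  <- is.na(NA_vect)        # flags both NaN and NA
nan_flags <- is.nan(NA_vect)       # flags NaN only
inf_flags <- is.infinite(NA_vect)  # flags Inf and -Inf

# Show the flags side by side with the values they describe
print(data.frame(value = NA_vect, na_flags, nan_flags, inf_flags))
```

Because `is.na()` already covers `NaN`, dropping `is.nan()` from the slide loses only the ability to tell the two apart.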
@@ -92,8 +92,8 @@ is.infinite(test)
 `any()` can help you check if there are any `NA` values in a vector
 
 ```{r}
-test
-any(is.na(test))
+NA_vect
+any(is.na(NA_vect))
 ```
 
 
@@ -378,7 +378,7 @@ B. include `& is.na()`
 
 ## Summary
 
--   `is.na()`,`any(is.na())`, `all(is.na())`,`count()`, and functions from `naniar` like `gg_miss_var()` and `miss_var_summary` can help determine if we have `NA` values
+-   `is.na()`, `any(is.na())`, `count()`, and functions from `naniar` like `gg_miss_var()` and `miss_var_summary()` can help determine if we have `NA` values
 -   `miss_var_which()` can help you drop columns that have any missing values.
 -   `filter()` automatically removes `NA` values - can't confirm or deny
     if condition is met (need `| is.na()` to keep them)
@@ -387,7 +387,7 @@ B. include `& is.na()`
 -   `NA` values can change your calculation results
 -   think about what `NA` values represent - don't drop them if you shouldn't
 -   `na_if()` will make `NA` values for a particular value
--   `replace_na()` will replace `NA values with a particular value
+-   `replace_na()` will replace `NA` values with a particular value
 
 ## Lab Part 1
 
@@ -512,15 +512,15 @@ Note that automatically values not reassigned explicitly by
 {data_input} %>%
   mutate({variable_to_fix} = case_when({Variable_fixing}   
              /some condition/ ~ {value_for_con},
-                         TRUE ~ {value_for_not_meeting_condition})
+                          .default = {value_for_not_meeting_condition})
 
 ```
 :::
 
 {value_for_not_meeting_condition} could be something new 
 or it can be the original values of the column
 
-## case_when with TRUE ~ original variable name
+## case_when with .default = original variable name
 
 ```{r}
 data_ginger_mint %>% 
@@ -529,7 +529,7 @@ data_ginger_mint %>%
                                Treatment == "Mint" ~ "Peppermint",
                                Treatment == "mint" ~ "Peppermint",
                                Treatment == "peppermint" ~ "Peppermint",
-                                TRUE ~ Treatment)) %>%
+                                .default = Treatment)) %>%
   count(Treatment, Treatment_recoded)
 ```
 
@@ -544,35 +544,23 @@ data_ginger_mint %>%
                                Treatment == "Mint" ~ "Peppermint",
                                Treatment == "mint" ~ "Peppermint",
                                Treatment == "peppermint" ~ "Peppermint",
-                               TRUE ~ Treatment)) %>%
+                               .default = Treatment)) %>%
   count(Treatment, Treatment_recoded)
 ```
 
 
-## But maybe we want NA?
-
-Perhaps we want values that are O or Other to actually be NA, then `case_when` can be helpful for this. We simply specify everything else.
 
-```{r}
-data_ginger_mint %>% 
-  mutate(Treatment_recoded = case_when(
-                        Treatment == "Ginger" ~ "Ginger", 
-                        Treatment == "Mint" ~ "Peppermint",
-                        Treatment == "mint" ~ "Peppermint",
-                        Treatment == "peppermint" ~ "Peppermint")) %>%
-  count(Treatment, Treatment_recoded)
-```
 ## case_when() can also overwrite/update a variable
 
 You need to specify what we want in the first part of `mutate`.
 
 ```{r}
 data_ginger_mint %>% 
   mutate(Treatment = case_when(
-                          Treatment == "Ginger" ~ "Ginger", 
                           Treatment == "Mint" ~ "Peppermint",
                           Treatment == "mint" ~ "Peppermint",
-                          Treatment == "peppermint" ~ "Peppermint")) %>%
+                          Treatment == "peppermint" ~ "Peppermint",
+                          .default = Treatment)) %>%
   count(Treatment)
 
 ```
@@ -584,16 +572,29 @@ data_ginger_mint %>%
 ```{r}
 data_ginger_mint %>% 
   mutate(Treatment_recoded = case_when(
-    Treatment == "Ginger" ~ "Ginger", # keep it the same!
     Treatment %in% 
 c("Mint", "mint", "Peppermint", "peppermint") ~ "Peppermint",
-    Treatment %in% c("O", "Other") ~ "Other")) %>%
+    Treatment %in% c("O", "Other") ~ "Other",
+    .default = Treatment)) %>%
 
   count(Treatment, Treatment_recoded)
 
 ```
 
+## But maybe we want NA?
 
+Perhaps we want values that are `O` or `Other` to actually be `NA`. `case_when()` can help with this: we could specify every other value explicitly and drop `.default = Treatment`, or we could assign `NA` directly with `NA_character_`.
+
+```{r}
+data_ginger_mint %>% 
+  mutate(Treatment_recoded = case_when(
+    Treatment %in% 
+      c("Mint", "mint", "Peppermint", "peppermint") ~ "Peppermint",
+    Treatment %in% c("O", "Other") ~ NA_character_,
+    .default = Treatment)) %>%
+
+  count(Treatment, Treatment_recoded)
+```
 
 ## Another reason for `case_when()`
 
@@ -619,13 +620,26 @@ data_ginger_mint %>%
   count(Group, Effect)
 ```
 
-## GUT CHECK: If we want all unspecified values to remain the same with `case_when()`, how should we complete the `TRUE ~` statement?
+## GUT CHECK: If we want all unspecified values to remain the same with `case_when()`, how should we complete the `.default =` statement?
 
 A. With the name of the variable we are modifying or using as source
 
 B. With the word "same"
 
 
+## Other Functions/Arguments you might see
+
+`.default = ` replaces the older `TRUE ~` syntax (available since `dplyr` 1.1.0). You may still see `TRUE ~` in older code:
+
+```{r}
+data_ginger_mint %>% 
+  mutate(Treatment_recoded = case_when(
+    Treatment %in% 
+     c("Mint", "mint", "Peppermint", "peppermint") ~ "Peppermint",
+    Treatment %in% c("O", "Other") ~ NA_character_,
+    TRUE ~ Treatment))
+```
+
 # Working with strings
 
 ## Strings in R
@@ -726,19 +740,18 @@ data_ginger_mint %>%
     Treatment %in% 
      c("Mint", "mint", "Peppermint", "peppermint") ~ "Peppermint",
     Treatment %in% c("O", "Other") ~ "Other",
-    TRUE ~ Treatment))
+    .default = Treatment))
 ```
 
 ## `case_when()` improved with `stringr`
-`^` indicates the beginning of a character string
-`$` indicates the end
+
 
 ```{r}
 data_ginger_mint %>% 
   mutate(Treatment_recoded = case_when(
     str_detect(string = Treatment, pattern = "int") ~ "Peppermint",
-    str_detect(string = Treatment, pattern = "^o|^O") ~ "Other",
-    TRUE ~ Treatment)) %>%
+    str_detect(string = Treatment, pattern = "o|O") ~ "Other",
+    .default = Treatment)) %>%
   count(Treatment, Treatment_recoded)
 ```
 
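
The hunk above changes the second pattern from `"^o|^O"` to `"o|O"`. As a hedged sketch of what that difference means, assuming a made-up vector of values (these are hypothetical and not taken from `data_ginger_mint`):

```r
library(stringr)

# Hypothetical values chosen to include an "o" in the middle of a word
treatments <- c("O", "Other", "other", "Mentol", "Ginger")

# Unanchored: TRUE whenever an "o" or "O" appears anywhere in the string
unanchored <- str_detect(string = treatments, pattern = "o|O")

# Anchored: TRUE only when the string starts with "o" or "O"
anchored <- str_detect(string = treatments, pattern = "^o|^O")

print(data.frame(treatments, unanchored, anchored))
```

Whether the unanchored pattern is safe depends on the values actually present in the data, so it may be worth checking with `count()` after recoding.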
@@ -759,6 +772,37 @@ B. `str_find()`
 
 C. `str_detect()`
 
+## A bit on Regular Expressions
+
+-   <http://www.regular-expressions.info/reference.html>
+-   They can be used to match a large number of strings in one statement
+-   `.` matches any single character
+-   `*` means repeat the preceding character zero or more times
+-   `?` makes the preceding item optional
+-   `^` matches the start of a string: `^a` matches strings that start with "a"
+-   `$` matches the end of a string: `b$` matches strings that end with "b"
+
+
+## Things you might see
+
+-   `^` indicates the beginning of a character string (placed before the pattern)
+-   `$` indicates the end of a character string (placed after the pattern)
+
+```{r}
+data_ginger_mint %>% 
+  mutate(Treatment_recoded = case_when(
+    str_detect(string = Treatment, pattern = "t$") ~ "Peppermint",
+    .default = Treatment)) %>%
+  count(Treatment, Treatment_recoded)
+
+data_ginger_mint %>% 
+  mutate(Treatment_recoded = case_when(
+    str_detect(string = Treatment, pattern = "^m") ~ "Mint Tea",
+    .default = Treatment)) %>%
+  count(Treatment, Treatment_recoded)
+
+```
+
 # Separating and uniting data
 
 ## Uniting columns 
@@ -790,11 +834,10 @@ data_comb <- data_comb %>%
 data_comb
 ```
 
-
 ## Summary
  -  `case_when()` requires `mutate()` when working with dataframes/tibbles
 -   `case_when()` can recode **entire values** based on **conditions** (need quotes for conditions and new values)
-    -   remember `case_when()` needs `TRUE ~ varaible` to keep values that aren't specified by conditions, otherwise will be `NA`
+    -   remember `case_when()` needs `.default = variable` to keep values that aren't specified by conditions, otherwise they will be `NA`
 
 **Note:** you might see the `recode()` function, it only does some of what `case_when()` can do, so we skipped it, but it is in the extra slides at the end.
 
@@ -825,36 +868,16 @@ knitr::include_graphics("images/case_when.png")
 
 📃 [Posit's `stringr` Cheatsheet](https://evoldyn.gitlab.io/evomics-2018/ref-sheets/R_strings.pdf)
 
-```{r, fig.alt="The End", out.width = "50%", echo = FALSE, fig.align='center'}
-knitr::include_graphics(here::here("images/the-end-g23b994289_1280.jpg"))
-```
-
-Image by <a href="https://pixabay.com/users/geralt-9301/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=812226">Gerd Altmann</a> from <a href="https://pixabay.com//?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=812226">Pixabay</a>
-
-# Extra Slides
+📃 [Posit's `dplyr` Cheatsheet](https://rstudio.github.io/cheatsheets/data-transformation.pdf)
 
-## `recode()` function
-
-This is similar to `case_when()` but it can't do as much.
-
-::: {style="color: red;"}
-(need `mutate` for data frames/tibbles!)
-:::
-::: codeexample
-```{r, eval = FALSE}
-# General Format - this is not code!
-{data_input} %>%
-  mutate({variable_to_fix_or_new} = recode({Variable_fixing}, {old_value} = {new_value},
-                                       {another_old_value} = {new_value}))
 
+```{r, fig.alt="The End", out.width = "50%", echo = FALSE, fig.align='center'}
+knitr::include_graphics(here::here("images/the-end-g23b994289_1280.jpg"))
 ```
-:::
 
-## recode() function
+## Things you might see
 
-::: {style="color: red;"}
-Need quotes for new values! Tolerates quotes for old values.
-:::
+`recode()` function - similar to `case_when()` but not as powerful
 
 ```{r, eval = FALSE}
 
@@ -909,15 +932,6 @@ y
 length(y)
 ```
 
-## A bit on Regular Expressions
-
--   <http://www.regular-expressions.info/reference.html>
--   They can use to match a large number of strings in one statement
--   `.` matches any single character
--   `*` means repeat as many (even if 0) more times the last character
--   `?` makes the last thing optional
--   `^` matches start of vector `^a` - starts with "a"
--   `$` matches end of vector `b$` - ends with "b"
 
 ## Let's look at modifiers for `stringr`
 

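
The `case_when()` changes in this file swap `TRUE ~` for `.default =`. A minimal sketch of the equivalence, assuming `dplyr` 1.1.0 or later and a small made-up data frame (`toy`, `new_style`, and `old_style` are hypothetical names standing in for `data_ginger_mint` and its results):

```r
library(dplyr)

# Hypothetical stand-in for data_ginger_mint
toy <- data.frame(
  Treatment = c("Ginger", "mint", "Peppermint", "O", "Other", NA),
  stringsAsFactors = FALSE
)

# Newer syntax: unmatched rows fall through to .default
new_style <- toy %>%
  mutate(Treatment_recoded = case_when(
    Treatment %in% c("Mint", "mint", "Peppermint", "peppermint") ~ "Peppermint",
    Treatment %in% c("O", "Other") ~ NA_character_,
    .default = Treatment))

# Older syntax: a final TRUE condition catches unmatched rows
old_style <- toy %>%
  mutate(Treatment_recoded = case_when(
    Treatment %in% c("Mint", "mint", "Peppermint", "peppermint") ~ "Peppermint",
    Treatment %in% c("O", "Other") ~ NA_character_,
    TRUE ~ Treatment))

# Compare the two results
print(identical(new_style, old_style))
```

On older `dplyr` versions the `.default` argument is not available, which is one reason learners may still meet `TRUE ~` in existing code.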
diff --git a/modules/Data_Cleaning/lab/Data_Cleaning_Lab_Key.Rmd b/modules/Data_Cleaning/lab/Data_Cleaning_Lab_Key.Rmd
@@ -148,7 +148,7 @@ NEW_TIBBLE <- OLD_TIBBLE %>%
   mutate(NEW_COLUMN = case_when(
     OLD_COLUMN %in% c( ... ) ~ ... ,
     OLD_COLUMN %in% c( ... ) ~ ... ,
-    TRUE ~ OLD_COLUMN
+    .default = OLD_COLUMN
   ))
 ```
 
@@ -158,7 +158,7 @@ BloodType <- BloodType %>%
   mutate(exposure = case_when(
     exposure %in% c("N", "n", "No", "no") ~ "No",
     exposure %in% c("Y", "y", "Yes", "yes") ~ "Yes",
-    TRUE ~ exposure # the only other value is an NA so we could include this or we don't need to (it's generally good practice unless we want to create NAs)
+    .default = exposure # the only other value is NA, so this line is optional here (it's generally good practice unless we want to create NAs)
   ))
 
 count(BloodType, exposure)
@@ -181,7 +181,7 @@ BloodType <- BloodType %>%
   mutate(type = case_when(
     type == "o.-" ~ "O.-",
     type == "o.+" ~ "O.+",
-    TRUE ~ type))
+    .default = type))
 BloodType
 ```
 

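
The comment in the `exposure` answer says `.default` is optional when the only unmatched value is `NA`. A small sketch of why, assuming a made-up column (`toy` is hypothetical and stands in for the lab's `BloodType` data):

```r
library(dplyr)

# Hypothetical stand-in for the exposure column in BloodType
toy <- data.frame(exposure = c("N", "yes", "Y", NA), stringsAsFactors = FALSE)

# NA matches neither %in% condition, so .default hands back the original NA
with_default <- toy %>%
  mutate(exposure = case_when(
    exposure %in% c("N", "n", "No", "no") ~ "No",
    exposure %in% c("Y", "y", "Yes", "yes") ~ "Yes",
    .default = exposure))

# Without .default, unmatched rows become NA, which is the same result here
without_default <- toy %>%
  mutate(exposure = case_when(
    exposure %in% c("N", "n", "No", "no") ~ "No",
    exposure %in% c("Y", "y", "Yes", "yes") ~ "Yes"))

print(identical(with_default, without_default))
```

The two versions would differ as soon as the column held any other unmatched value, which is why keeping `.default` is the safer habit.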
diff --git a/modules/Manipulating_Data_in_R/Manipulating_Data_in_R.Rmd b/modules/Manipulating_Data_in_R/Manipulating_Data_in_R.Rmd
@@ -14,7 +14,7 @@ library(tidyverse)
 
 ## Recap of Data Cleaning
 
--   `is.na()`,`any(is.na())`, `all(is.na())`,`count()`, and functions from `naniar` like `gg_miss_var()` and `miss_var_summary` can help determine if we have `NA` values
+-   `is.na()`, `any(is.na())`, `count()`, and functions from `naniar` like `gg_miss_var()` and `miss_var_summary()` can help determine if we have `NA` values
 -   `miss_var_which()` can help you drop columns that have any missing values.
 -   `filter()` automatically removes `NA` values
 -   `drop_na()` can help you remove `NA` values
@@ -28,8 +28,8 @@ library(tidyverse)
     -   remember `case_when()` needs `TRUE ~ variable` to keep values that aren't specified by conditions, otherwise will be `NA`
 -   `stringr` package has great functions for looking for specific **parts of values** especially `filter()` and `str_detect()` combined
     - also has other useful string manipulation functions like `str_replace()` and more!
-    - `separate()` can split columns into additional columns
-    - `unite()` can combine columns
+- `separate()` can split columns into additional columns
+- `unite()` can combine columns
 
 📃[Day 5 Cheatsheet](https://jhudatascience.org/intro_to_r/modules/cheatsheets/Day-5.pdf)
 

diff --git a/modules/cheatsheets/Day-5.md b/modules/cheatsheets/Day-5.md
@@ -20,7 +20,7 @@ number) by 0.
 |`naniar`| [`pct_complete(x)`](https://www.rdocumentation.org/packages/naniar/versions/0.6.1/topics/pct_complete)|`pct_complete(x)`| Reports the percentage of data that is complete in `x`. |
 |`naniar`| [`gg_miss_var(x)`](https://www.rdocumentation.org/packages/naniar/versions/0.6.1/topics/gg_miss_var)|`gg_miss_var(x)`| Reports as a plot the percentage of data that is complete in `x`. |
 |`tidyr`| [`drop_na(df)`](https://tidyr.tidyverse.org/reference/drop_na.html)|`drop_na(df)`| Drops rows of `NA` from a given data frame/tibble |
-| `dplyr`| [`case_when()`](https://dplyr.tidyverse.org/reference/case_when.html)| `df <- df %>% mutate(variable_recoded = case_when(variable > 2 ~ "large", TRUE ~ variable) `|This function allows you to recode data based on certain conditions.  If no cases match, NA is returned, unless the TRUE statement specifies otherwise.|
+| `dplyr`| [`case_when()`](https://dplyr.tidyverse.org/reference/case_when.html)| `df <- df %>% mutate(variable_recoded = case_when(variable > 2 ~ "large", .default = as.character(variable)))`|This function allows you to recode data based on certain conditions. If no cases match, `NA` is returned, unless the `.default` argument specifies otherwise.|
 | `dplyr`| [`mutate()`](https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/mutate)| `df <- mutate(df, newcol = wt/2.2)`| Adds a new column that is a function of existing columns|
 | `dplyr`| [`separate()`](https://tidyr.tidyverse.org/reference/separate.html)| `df %>% separate(x, c("A", "B"))`| Separate a character column into multiple columns with a regular expression or numeric locations|
 | `dplyr`| [`unite()`](https://tidyr.tidyverse.org/reference/unite.html)| `df %>% unite("z", x:y, remove = FALSE)`| Unite multiple columns together into one column|
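
The `case_when()` row recodes a numeric column into a text label, and `case_when()` expects every output, including `.default`, to share a compatible type. One way to satisfy that, sketched under the assumption that `variable` is numeric (the data frame below is hypothetical):

```r
library(dplyr)

# Hypothetical numeric column
df <- data.frame(variable = c(1, 2, 3, 4))

# Convert the fallback to character so it matches the "large" label
df <- df %>%
  mutate(variable_recoded = case_when(
    variable > 2 ~ "large",
    .default = as.character(variable)))

print(df)
```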