In short, rcahelpr is a tool to support the Resident Corrections Analyst program through bespoke functions written in R and data sets for learning purposes. Interested? You can install it from GitHub using R:

```r
# install.packages("remotes")  # run first if remotes is not installed
remotes::install_github("cllghn/rcahelpr")
```

Codebooks provide a comprehensive guide to the variables and coding schemes in a data set, which helps analysts interpret the data consistently and accurately. They serve as a reference that makes complex data structures easier for researchers and analysts to understand and analyze. To support this, the package includes a codebook function, `make_codebook()`:
```r
# Load the library
library(rcahelpr)
# Create a test data set
test <- data.frame(
  "person" = c("chris", "maeve", "joseph", "brooks"),
  "org" = c("csg", "wdoc", "ccjbh", "asu"),
  "years_in_org" = c(1, 0.3, 0.2, NA),
  "role" = as.factor(c("mentor", "rca", "rca", "mentor")),
  "date" = as.Date(c("2020-01-01", "2020-01-01", NA, "2020-01-02"))
)
# Create codebook
make_codebook(input_df = test, return_df = FALSE, escape = FALSE)
```

| Variable Name | Data Class | Valid Values | Statistics | Unique Values | Missing Values |
|---|---|---|---|---|---|
| person | Character | Unique strings (n=4): chris, maeve, joseph, and more. | 4 unique strings, top three: brooks (n=1) chris (n=1) joseph (n=1) | 4 | 0 (0%) |
| org | Character | Unique strings (n=4): csg, wdoc, ccjbh, and more. | 4 unique strings, top three: asu (n=1) ccjbh (n=1) csg (n=1) | 4 | 0 (0%) |
| years_in_org | Numeric | Numeric range from 0.2 to 1. | Min: 0.2 Avg: 0.5 Median: 0.3 Max: 1 SD: 0.44 | 4 | 1 (25%) |
| role | Factor | Categorical variable with 2 levels: mentor, rca | 2 Unique factors: mentor, rca | 2 | 0 (0%) |
| date | Date | Date rage from 2020-01-01 to 2020-01-02. | Min: 2020-01-01 Mode: 2020-01-01 Max: 2020-01-02 Time difference: 1 days | 3 | 1 (25%) |

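The Statistics, Unique Values, and Missing Values columns above can be approximated with a few lines of base R. This is a minimal sketch for illustration, not the actual implementation of `make_codebook()`. It assumes that missing values are dropped (`na.rm = TRUE`) when computing the summary statistics, while `NA` counts as its own value in the unique count; both assumptions match the figures shown for `years_in_org` and `date`. The names `n_missing`, `pct_missing`, `n_unique`, and `num_stats` are illustrative only.

```r
# Rebuild the test data set used above
test <- data.frame(
  person = c("chris", "maeve", "joseph", "brooks"),
  org = c("csg", "wdoc", "ccjbh", "asu"),
  years_in_org = c(1, 0.3, 0.2, NA),
  role = as.factor(c("mentor", "rca", "rca", "mentor")),
  date = as.Date(c("2020-01-01", "2020-01-01", NA, "2020-01-02"))
)

# Missing values per column, also as a whole-number percentage of rows
n_missing <- sapply(test, function(x) sum(is.na(x)))
pct_missing <- round(100 * n_missing / nrow(test))

# Distinct values per column; unique() keeps NA as a separate value
n_unique <- sapply(test, function(x) length(unique(x)))

# Summary statistics for a numeric column, ignoring missing values
x <- test$years_in_org
num_stats <- c(
  min = min(x, na.rm = TRUE),
  avg = round(mean(x, na.rm = TRUE), 2),
  median = median(x, na.rm = TRUE),
  max = max(x, na.rm = TRUE),
  sd = round(sd(x, na.rm = TRUE), 2)
)

print(data.frame(n_missing, pct_missing, n_unique))
print(num_stats)
```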
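To add descriptive information about the variables in the codebook, pass an additional data set through the `extra_vars` argument and name its key column with `extra_key`; the additional data set is left-joined to the codebook by variable name: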
```r
# Set a secondary data.frame describing the variables in your original data set
more <- data.frame(
  "vars" = c("person", "org", "years_in_org", "role"),
  "description" = rep("Interesting details about my variable.", 4),
  "origin" = rep("Detailed notes on where the data came from.", 4),
  "notes" = rep("Yet more useful information", 4)
)
# Create codebook, joining `more` on its "vars" column
make_codebook(input_df = test, return_df = FALSE, escape = FALSE,
              extra_vars = more, extra_key = "vars")
```

| Variable Name | Data Class | Valid Values | Statistics | Unique Values | Missing Values | Description | Origin | Notes |
|---|---|---|---|---|---|---|---|---|
| date | Date | Date rage from 2020-01-01 to 2020-01-02. | Min: 2020-01-01 Mode: 2020-01-01 Max: 2020-01-02 Time difference: 1 days | 3 | 1 (25%) | NA | NA | NA |
| org | Character | Unique strings (n=4): csg, wdoc, ccjbh, and more. | 4 unique strings, top three: asu (n=1) ccjbh (n=1) csg (n=1) | 4 | 0 (0%) | Interesting details about my variable. | Detailed notes on where the data came from. | Yet more useful information |
| person | Character | Unique strings (n=4): chris, maeve, joseph, and more. | 4 unique strings, top three: brooks (n=1) chris (n=1) joseph (n=1) | 4 | 0 (0%) | Interesting details about my variable. | Detailed notes on where the data came from. | Yet more useful information |
| role | Factor | Categorical variable with 2 levels: mentor, rca | 2 Unique factors: mentor, rca | 2 | 0 (0%) | Interesting details about my variable. | Detailed notes on where the data came from. | Yet more useful information |
| years_in_org | Numeric | Numeric range from 0.2 to 1. | Min: 0.2 Avg: 0.5 Median: 0.3 Max: 1 SD: 0.44 | 4 | 1 (25%) | Interesting details about my variable. | Detailed notes on where the data came from. | Yet more useful information |

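Conceptually, the `extra_vars`/`extra_key` step behaves like a left join from the codebook onto the descriptive table: every variable is kept, and a variable with no match (here `date`) gets `NA`. The rows in the output above also appear sorted by variable name. The sketch below reproduces that behaviour with base R's `merge()`; it is an assumption about how the join works, not the package's code, and `cb` is a hypothetical, simplified stand-in for a codebook.

```r
# Data set from the examples above
test <- data.frame(
  person = c("chris", "maeve", "joseph", "brooks"),
  org = c("csg", "wdoc", "ccjbh", "asu"),
  years_in_org = c(1, 0.3, 0.2, NA),
  role = as.factor(c("mentor", "rca", "rca", "mentor")),
  date = as.Date(c("2020-01-01", "2020-01-01", NA, "2020-01-02"))
)

# Hypothetical simplified codebook: one row per variable with its class
cb <- data.frame(
  variable = names(test),
  data_class = vapply(test, function(x) class(x)[1], character(1)),
  row.names = NULL
)

# Secondary table describing some of the variables (no entry for "date")
more <- data.frame(
  vars = c("person", "org", "years_in_org", "role"),
  description = rep("Interesting details about my variable.", 4)
)

# Left join: all.x = TRUE keeps every codebook row; merge() sorts by the key
joined <- merge(cb, more, by.x = "variable", by.y = "vars", all.x = TRUE)
print(joined)
```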
The package also includes several key data sets for learning or general use. Access them with the `::` operator:
```r
str(rcahelpr::hpsa_primarycare)
## 'data.frame': 230 obs. of 17 variables:
## $ HPSA_Discipline_Class : chr "Primary Care" "Primary Care" "Primary Care" "Primary Care" ...
## $ HPSA_Name : chr "Low Income - MSSA 78.2ddd/Bell SW/Cudahy/Maywood/V" "MSSA 6/Pioneer" "MSSA 78.2uuu/Athens" "MSSA 137/Isleton" ...
## $ HPSA_ID : chr "1061017434" "1061018308" "1061038158" "1061081242" ...
## $ County_Equivalent_Name : chr "Los Angeles" "Amador" "Los Angeles" "Sacramento" ...
## $ Designation_Type : chr "HPSA Population" "Geographic HPSA" "High Needs Geographic HPSA" "Geographic HPSA" ...
## $ HPSA_Population_Type : chr "Low Income Population HPSA" "Geographic Population" "Geographic Population" "Geographic Population" ...
## $ HPSA_Score : int 13 16 18 9 12 15 10 9 19 11 ...
## $ PC_MCTA_Score : int NA NA NA NA NA 18 NA 13 NA NA ...
## $ HPSA_Provider_Ratio_Goal : chr "3000:1" "3500:1" "3000:1" "3500:1" ...
## $ HPSA_FTE : num 0.16 0.1 3.75 0.95 9.26 3.2 1.75 27.5 4 0 ...
## $ HPSA_Designation_Population : int 53040 5848 84994 5597 39476 17795 13687 101329 54088 7045 ...
## $ HPSA_Formal_Ratio : chr "331500:1" "58480:1" "22665:1" "5892:1" ...
## $ HPSA_Shortage : num 17.52 1.57 24.58 0.65 3.9 ...
## $ HPSA_Status : chr "Proposed For Withdrawal" "Proposed For Withdrawal" "Proposed For Withdrawal" "Proposed For Withdrawal" ...
## $ HPSA_Designation_Date : chr "9/12/2011" "7/11/2008" "10/9/2012" "5/13/2008" ...
## $ HPSA_Designation_Last_Update_Dat: chr "9/10/2021" "9/10/2021" "5/20/2022" "9/10/2021" ...
## $ Data_Warehouse_Record_Create_Dat: chr "1/17/2023" "1/17/2023" "1/17/2023" "1/17/2023" ...
```
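Before plotting, the per-county aggregation performed by the dplyr pipeline in the next example can be checked by hand on a small slice. This sketch uses only the first four rows visible in the `str()` output above (`County_Equivalent_Name`, `HPSA_Score`, and `HPSA_Designation_Population`) and swaps dplyr for base R's `tapply()`. It is an illustration of the grouping logic, not a result for the full data set; the names `first4` and `by_county` are hypothetical.

```r
# First four rows as shown by str(rcahelpr::hpsa_primarycare)
first4 <- data.frame(
  County_Equivalent_Name = c("Los Angeles", "Amador", "Los Angeles", "Sacramento"),
  HPSA_Score = c(13L, 16L, 18L, 9L),
  HPSA_Designation_Population = c(53040L, 5848L, 84994L, 5597L)
)

# Group by county: average HPSA score and total designated population
mean_score <- tapply(first4$HPSA_Score, first4$County_Equivalent_Name, mean)
total_pop <- tapply(first4$HPSA_Designation_Population,
                    first4$County_Equivalent_Name, sum)

# Collect the per-county results in one data frame (counties sorted A to Z)
by_county <- data.frame(
  County_Equivalent_Name = names(mean_score),
  mean_hpsa_score = as.vector(mean_score),
  total_hpsa_population = as.vector(total_pop[names(mean_score)])
)
print(by_county)
```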
These data can be used for a range of purposes, such as combining them with other packages for analysis. The code below computes the average HPSA score and the total designated population for each county and plots them:
```r
# Load dplyr (data management), forcats (factor handling), and ggplot2 (graphing)
library(dplyr)
library(forcats)
library(ggplot2)
# Wrangle some data to identify the average Health Professional Shortage Area
# (HPSA) score and the total HPSA population in each county:
demo <- rcahelpr::hpsa_primarycare %>%
  group_by(County_Equivalent_Name) %>%
  summarize(mean_hpsa_score = mean(HPSA_Score),
            total_hpsa_population = sum(HPSA_Designation_Population))
# Make a beautiful graph; both layers use the same (reversed) county order
ggplot(data = demo) +
  geom_point(aes(x = total_hpsa_population, y = fct_rev(County_Equivalent_Name),
                 color = mean_hpsa_score)) +
  geom_segment(aes(x = 0, xend = total_hpsa_population,
                   y = fct_rev(County_Equivalent_Name),
                   yend = fct_rev(County_Equivalent_Name),
                   color = mean_hpsa_score)) +
  theme_minimal() +
  labs(title = "Affected Populations in HPSA by County",
       subtitle = "HPSA scores determine priorities for the assignment of clinicians",
       caption = "Data from CCHS") +
  xlab("Total Population in all County HPSAs") +
  ylab("") +
  scale_color_gradient2(low = "#F5F5DC", mid = "#FFA500", high = "#8B0000",
                        midpoint = mean(demo$mean_hpsa_score),
                        name = "Average County HPSA Score") +
  theme(legend.position = "bottom")
```

The package's data and functions can also be used together, for example to build a codebook for `hpsa_primarycare`:
```r
make_codebook(input_df = rcahelpr::hpsa_primarycare, return_df = FALSE,
              escape = FALSE)
```

| Variable Name | Data Class | Valid Values | Statistics | Unique Values | Missing Values |
|---|---|---|---|---|---|
| HPSA_Discipline_Class | Character | Unique strings: Primary Care. | 1 Unique strings: Primary Care | 1 | 0 (0%) |
| HPSA_Name | Character | Unique strings (n=230): Low Income - MSSA 78.2ddd/Bell SW/Cudahy/Maywood/V, MSSA 6/Pioneer, MSSA 78.2uuu/Athens, and more. | 230 unique strings, top three: Colusa County (n=1) LI-MFW-MSSA 176b/East Palo Alto (n=1) LI-MFW/MSSA 186 Anderson (n=1) | 230 | 0 (0%) |
| HPSA_ID | Character | Unique strings (n=230): 1061017434, 1061018308, 1061038158, and more. | 230 unique strings, top three: 1061017434 (n=1) 1061018308 (n=1) 1061038158 (n=1) | 230 | 0 (0%) |
| County_Equivalent_Name | Character | Unique strings (n=52): Los Angeles, Amador, Sacramento, and more. | 52 unique strings, top three: Los Angeles (n=42) San Bernardino (n=15) Kern (n=14) | 52 | 0 (0%) |
| Designation_Type | Character | Unique strings: HPSA Population, Geographic HPSA, High Needs Geographic HPSA. | 3 Unique strings: HPSA Population, Geographic HPSA, High Needs Geographic HPSA | 3 | 0 (0%) |
| HPSA_Population_Type | Character | Unique strings (n=6): Low Income Population HPSA, Geographic Population, Low Income Migrant Farmworker Population HPSA, and more. | 6 unique strings, top three: Geographic Population (n=125) Low Income Population HPSA (n=46) Medicaid Eligible Population HPSA (n=31) | 6 | 0 (0%) |
| HPSA_Score | Integer | Numeric range from 4 to 20. | Min: 4 Avg: 13.03 Median: 13 Max: 20 SD: 3.32 | 17 | 0 (0%) |
| PC_MCTA_Score | Integer | Numeric range from 1 to 22. | Min: 1 Avg: 13.47 Median: 14 Max: 22 SD: 4.7 | 22 | 97 (42%) |
| HPSA_Provider_Ratio_Goal | Character | Unique strings: 3000:1, 3500:1. | 2 Unique strings: 3000:1, 3500:1 | 2 | 0 (0%) |
| HPSA_FTE | Numeric | Numeric range from 0 to 43.14. | Min: 0 Avg: 5.71 Median: 2.04 Max: 43.14 SD: 7.93 | 162 | 0 (0%) |
| HPSA_Designation_Population | Integer | Numeric range from 748 to 173639. | Min: 748 Avg: 35656.43 Median: 25296 Max: 173639 SD: 33617.59 | 230 | 0 (0%) |
| HPSA_Formal_Ratio | Character | Unique strings (n=173): 331500:1, 58480:1, 22665:1, and more. | 173 unique strings, top three: (n=52) 3553:1 (n=2) 3556:1 (n=2) | 173 | 0 (0%) |
| HPSA_Shortage | Numeric | Numeric range from 0.01 to 30.15. | Min: 0.01 Avg: 5.86 Median: 3.18 Max: 30.15 SD: 6.6 | 216 | 0 (0%) |
| HPSA_Status | Character | Unique strings: Proposed For Withdrawal, Designated. | 2 Unique strings: Proposed For Withdrawal, Designated | 2 | 0 (0%) |
| HPSA_Designation_Date | Character | Unique strings (n=159): 9/12/2011, 7/11/2008, 10/9/2012, and more. | 159 unique strings, top three: 6/22/2022 (n=9) 1/31/2022 (n=6) 3/14/2022 (n=6) | 159 | 0 (0%) |
| HPSA_Designation_Last_Update_Dat | Character | Unique strings (n=50): 9/10/2021, 5/20/2022, 8/27/2021, and more. | 50 unique strings, top three: 9/10/2021 (n=118) 3/30/2022 (n=8) 6/22/2022 (n=8) | 50 | 0 (0%) |
| Data_Warehouse_Record_Create_Dat | Character | Unique strings: 1/17/2023. | 1 Unique strings: 1/17/2023 | 1 | 0 (0%) |

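This codebook flags two things worth handling before analysis. The date columns are stored as character strings in month/day/year form, and `HPSA_Formal_Ratio` has 52 entries that appear to be blank, which the Missing Values column does not count as missing. The base R sketch below shows one way to clean both, using a few values taken from the output above; the blank entry stands in for those 52 values (an assumption about their content), and the `hpsa` data frame and `formal_ratio_num` column are hypothetical names for illustration.

```r
# A few values taken from the codebook and str() output above;
# "" stands in for the blank HPSA_Formal_Ratio entries
hpsa <- data.frame(
  HPSA_Designation_Date = c("9/12/2011", "7/11/2008", "10/9/2012"),
  HPSA_Formal_Ratio = c("331500:1", "", "22665:1")
)

# Treat empty or whitespace-only strings as missing in character columns
hpsa[] <- lapply(hpsa, function(x) {
  if (is.character(x)) x[trimws(x) == ""] <- NA
  x
})

# Parse month/day/year strings into Date objects
hpsa$HPSA_Designation_Date <- as.Date(hpsa$HPSA_Designation_Date,
                                      format = "%m/%d/%Y")

# Hypothetical numeric column: the population side of each "N:1" ratio
hpsa$formal_ratio_num <- as.numeric(sub(":1$", "", hpsa$HPSA_Formal_Ratio))

str(hpsa)
```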
