{rcahelpr}

❓ What and Why is This?

In short, a tool to support the Resident Corrections Analyst program through bespoke functions written in R and data sets for learning purposes. Interested? You can install it from GitHub using the remotes package in R:

remotes::install_github("cllghn/rcahelpr")

🔎 Examples

📚 Codebooks

Codebooks are often used to provide a comprehensive guide to the variables and coding schemes in a data set, ensuring consistent and accurate interpretation of the data. They serve as a reference tool that helps researchers and analysts understand and analyze complex data structures. To support this crucial functionality, we include a codebook function in this package:

# Load the library
library(rcahelpr)

# Create a test data set
test <- data.frame(
  "person" = c("chris", "maeve", "joseph", "brooks"),
  "org" = c("csg", "wdoc", "ccjbh", "asu"),
  "years_in_org" = c(1, 0.3, 0.2, NA),
  "role" = as.factor(c("mentor", "rca", "rca", "mentor")),
  "date" = as.Date(c("2020-01-01", "2020-01-01", NA, "2020-01-02"))
)

# Create codebook
make_codebook(input_df = test, return_df = FALSE, escape = FALSE)

Variable Name	Data Class	Valid Values	Statistics	Unique Values	Missing Values
person	Character	Unique strings (n=4): chris, maeve, joseph, and more.	4 unique strings, top three: brooks (n=1) chris (n=1) joseph (n=1)	4	0 (0%)
org	Character	Unique strings (n=4): csg, wdoc, ccjbh, and more.	4 unique strings, top three: asu (n=1) ccjbh (n=1) csg (n=1)	4	0 (0%)
years_in_org	Numeric	Numeric range from 0.2 to 1.	Min: 0.2 Avg: 0.5 Median: 0.3 Max: 1 SD: 0.44	4	1 (25%)
role	Factor	Categorical variable with 2 levels: mentor, rca	2 Unique factors: mentor, rca	2	0 (0%)
date	Date	Date rage from 2020-01-01 to 2020-01-02.	Min: 2020-01-01 Mode: 2020-01-01 Max: 2020-01-02 Time difference: 1 days	3	1 (25%)
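
To make the Unique Values and Missing Values columns easier to read, here is a minimal base R sketch that recomputes them for the same test data set. It is not the package's implementation, only one way to arrive at the same counts; the names `unique_values`, `missing_n`, `missing_pct`, and `check` are made up for this illustration.

```r
# Same test data set as in the example above
test <- data.frame(
  "person" = c("chris", "maeve", "joseph", "brooks"),
  "org" = c("csg", "wdoc", "ccjbh", "asu"),
  "years_in_org" = c(1, 0.3, 0.2, NA),
  "role" = as.factor(c("mentor", "rca", "rca", "mentor")),
  "date" = as.Date(c("2020-01-01", "2020-01-01", NA, "2020-01-02"))
)

# Count distinct values per column; unique() keeps NA as its own value
unique_values <- vapply(test, function(x) length(unique(x)), integer(1))

# Count missing values per column and express them as a share of all rows
missing_n <- vapply(test, function(x) sum(is.na(x)), integer(1))
missing_pct <- round(100 * missing_n / nrow(test))

# Hypothetical summary table, formatted like the codebook columns
check <- data.frame(
  variable = names(test),
  unique_values = unname(unique_values),
  missing = unname(sprintf("%d (%d%%)", missing_n, as.integer(missing_pct)))
)
print(check)
```

Note that counting `NA` as a distinct value is what makes `years_in_org` report 4 unique values even though only three numbers are present.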

Should you want to add information about the variables listed in the codebook, you can do so by left joining an additional data set. Variables without a match in that data set (date in the example below) get NA in the added columns:

# Set a secondary data.frame describing the variables in your original data set
more <- data.frame(
  "vars" = c("person", "org", "years_in_org", "role"),
  "description" = rep("Interesting details about my variable.", 4),
  "origin" = rep("Detailed notes on where the data came from.", 4),
  "notes" = rep("Yet more useful information", 4)
)

# Create codebook
make_codebook(input_df = test, return_df = FALSE, escape = FALSE,
              extra_vars = more, extra_key = "vars")

Variable Name	Data Class	Valid Values	Statistics	Unique Values	Missing Values	Description	Origin	Notes
date	Date	Date rage from 2020-01-01 to 2020-01-02.	Min: 2020-01-01 Mode: 2020-01-01 Max: 2020-01-02 Time difference: 1 days	3	1 (25%)	NA	NA	NA
org	Character	Unique strings (n=4): csg, wdoc, ccjbh, and more.	4 unique strings, top three: asu (n=1) ccjbh (n=1) csg (n=1)	4	0 (0%)	Interesting details about my variable.	Detailed notes on where the data came from.	Yet more useful information
person	Character	Unique strings (n=4): chris, maeve, joseph, and more.	4 unique strings, top three: brooks (n=1) chris (n=1) joseph (n=1)	4	0 (0%)	Interesting details about my variable.	Detailed notes on where the data came from.	Yet more useful information
role	Factor	Categorical variable with 2 levels: mentor, rca	2 Unique factors: mentor, rca	2	0 (0%)	Interesting details about my variable.	Detailed notes on where the data came from.	Yet more useful information
years_in_org	Numeric	Numeric range from 0.2 to 1.	Min: 0.2 Avg: 0.5 Median: 0.3 Max: 1 SD: 0.44	4	1 (25%)	Interesting details about my variable.	Detailed notes on where the data came from.	Yet more useful information
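
The left join behavior shown above can be sketched with base R's `merge()`. This is an illustration under assumptions, not a description of how `make_codebook()` works internally: the `codebook` data frame and its `variable` column are hypothetical stand-ins for the generated codebook.

```r
# Hypothetical stand-in for the codebook: one row per variable
codebook <- data.frame(
  variable = c("person", "org", "years_in_org", "role", "date")
)

# Extra descriptions; note that "date" is deliberately absent
more <- data.frame(
  vars = c("person", "org", "years_in_org", "role"),
  description = rep("Interesting details about my variable.", 4)
)

# all.x = TRUE keeps every codebook row (a left join);
# rows without a match in `more` get NA in the added columns
joined <- merge(codebook, more, by.x = "variable", by.y = "vars", all.x = TRUE)
print(joined)
```

By default `merge()` also sorts the result by the key column, so the rows come back in alphabetical order by variable name rather than in the original column order.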

📊 Data

Some key data sets, for learning or general use, have been included in the library. Access them using the :: accessor:

str(rcahelpr::hpsa_primarycare)

## 'data.frame':    230 obs. of  17 variables:
##  $ HPSA_Discipline_Class           : chr  "Primary Care" "Primary Care" "Primary Care" "Primary Care" ...
##  $ HPSA_Name                       : chr  "Low Income - MSSA 78.2ddd/Bell SW/Cudahy/Maywood/V" "MSSA 6/Pioneer" "MSSA 78.2uuu/Athens" "MSSA 137/Isleton" ...
##  $ HPSA_ID                         : chr  "1061017434" "1061018308" "1061038158" "1061081242" ...
##  $ County_Equivalent_Name          : chr  "Los Angeles" "Amador" "Los Angeles" "Sacramento" ...
##  $ Designation_Type                : chr  "HPSA Population" "Geographic HPSA" "High Needs Geographic HPSA" "Geographic HPSA" ...
##  $ HPSA_Population_Type            : chr  "Low Income Population HPSA" "Geographic Population" "Geographic Population" "Geographic Population" ...
##  $ HPSA_Score                      : int  13 16 18 9 12 15 10 9 19 11 ...
##  $ PC_MCTA_Score                   : int  NA NA NA NA NA 18 NA 13 NA NA ...
##  $ HPSA_Provider_Ratio_Goal        : chr  "3000:1" "3500:1" "3000:1" "3500:1" ...
##  $ HPSA_FTE                        : num  0.16 0.1 3.75 0.95 9.26 3.2 1.75 27.5 4 0 ...
##  $ HPSA_Designation_Population     : int  53040 5848 84994 5597 39476 17795 13687 101329 54088 7045 ...
##  $ HPSA_Formal_Ratio               : chr  "331500:1" "58480:1" "22665:1" "5892:1" ...
##  $ HPSA_Shortage                   : num  17.52 1.57 24.58 0.65 3.9 ...
##  $ HPSA_Status                     : chr  "Proposed For Withdrawal" "Proposed For Withdrawal" "Proposed For Withdrawal" "Proposed For Withdrawal" ...
##  $ HPSA_Designation_Date           : chr  "9/12/2011" "7/11/2008" "10/9/2012" "5/13/2008" ...
##  $ HPSA_Designation_Last_Update_Dat: chr  "9/10/2021" "9/10/2021" "5/20/2022" "9/10/2021" ...
##  $ Data_Warehouse_Record_Create_Dat: chr  "1/17/2023" "1/17/2023" "1/17/2023" "1/17/2023" ...
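
The `str()` output shows that the date columns are stored as character strings. A small sketch of converting them with base R follows, using the first four `HPSA_Designation_Date` values printed above. The month/day/year format is an assumption inferred from values such as "5/13/2008", where 13 cannot be a month.

```r
# First four values of HPSA_Designation_Date, copied from the str() output
designation_date <- c("9/12/2011", "7/11/2008", "10/9/2012", "5/13/2008")

# Parse as month/day/four-digit year (assumed format)
parsed <- as.Date(designation_date, format = "%m/%d/%Y")

print(parsed)
print(range(parsed))  # earliest and latest designation among these four
```

Once parsed, the column would be treated as a date rather than as free text in summaries and plots.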

This means that we can use these data for a range of purposes, such as pairing them with other libraries and analyzing them:

# Load the data management library dplyr, the factor helper library forcats,
# and the graphing library ggplot2
library(dplyr)
library(forcats)
library(ggplot2)

# Wrangle some data to identify the average Health Professional Shortage Area 
# (HPSA) score in a given county:
demo <- rcahelpr::hpsa_primarycare %>%
  group_by(County_Equivalent_Name) %>%
  summarize(mean_hpsa_score = mean(HPSA_Score),
            total_hpsa_population = sum(HPSA_Designation_Population))

# Make a beautiful graph
ggplot(data = demo) +
  geom_point(aes(x = total_hpsa_population , y = fct_rev(County_Equivalent_Name),
                 color = mean_hpsa_score)) +
  geom_segment(aes(x = 0 , xend = total_hpsa_population ,
                   y = County_Equivalent_Name, yend = County_Equivalent_Name,
                   color = mean_hpsa_score)) +
  theme_minimal() +
  labs(title = "Affected Populations in HPSA by County",
       subtitle = "HPSA scores determine priorities for the assignment of clinicians",
       caption = "Data from CCHS") +
  xlab("Total Population in all County HPSAs") +
  ylab("") +
  scale_color_gradient2(low="#F5F5DC", mid = "#FFA500", high="#8B0000", 
                        midpoint = mean(demo$mean_hpsa_score),
                        name = "Average County HPSA Score") +
  theme(legend.position = "bottom")
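
To see what the grouping and summarizing step above produces, the same calculation can be checked by hand on the first four rows shown in the `str()` output. This sketch uses base R's `tapply()` instead of dplyr so that it runs without extra packages; `first4` and `demo_small` are names invented for this example.

```r
# First four rows of the relevant columns, copied from the str() output
first4 <- data.frame(
  County_Equivalent_Name = c("Los Angeles", "Amador", "Los Angeles", "Sacramento"),
  HPSA_Score = c(13L, 16L, 18L, 9L),
  HPSA_Designation_Population = c(53040L, 5848L, 84994L, 5597L)
)

# Per-county mean score and total population (groups are sorted by name)
mean_score <- tapply(first4$HPSA_Score, first4$County_Equivalent_Name, mean)
total_pop <- tapply(first4$HPSA_Designation_Population,
                    first4$County_Equivalent_Name, sum)

# One row per county, mirroring the columns of `demo`
demo_small <- data.frame(
  County_Equivalent_Name = names(mean_score),
  mean_hpsa_score = as.vector(mean_score),
  total_hpsa_population = as.vector(total_pop)
)
print(demo_small)
```

Los Angeles appears twice in these rows, so its score is averaged and its populations are added, while the other two counties pass through unchanged.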

Also, the data and functions in this package can be combined:

make_codebook(input_df = rcahelpr::hpsa_primarycare, return_df = FALSE, 
              escape = FALSE)

Variable Name	Data Class	Valid Values	Statistics	Unique Values	Missing Values
HPSA_Discipline_Class	Character	Unique strings: Primary Care.	1 Unique strings: Primary Care	1	0 (0%)
HPSA_Name	Character	Unique strings (n=230): Low Income - MSSA 78.2ddd/Bell SW/Cudahy/Maywood/V, MSSA 6/Pioneer, MSSA 78.2uuu/Athens, and more.	230 unique strings, top three: Colusa County (n=1) LI-MFW-MSSA 176b/East Palo Alto (n=1) LI-MFW/MSSA 186 Anderson (n=1)	230	0 (0%)
HPSA_ID	Character	Unique strings (n=230): 1061017434, 1061018308, 1061038158, and more.	230 unique strings, top three: 1061017434 (n=1) 1061018308 (n=1) 1061038158 (n=1)	230	0 (0%)
County_Equivalent_Name	Character	Unique strings (n=52): Los Angeles, Amador, Sacramento, and more.	52 unique strings, top three: Los Angeles (n=42) San Bernardino (n=15) Kern (n=14)	52	0 (0%)
Designation_Type	Character	Unique strings: HPSA Population, Geographic HPSA, High Needs Geographic HPSA.	3 Unique strings: HPSA Population, Geographic HPSA, High Needs Geographic HPSA	3	0 (0%)
HPSA_Population_Type	Character	Unique strings (n=6): Low Income Population HPSA, Geographic Population, Low Income Migrant Farmworker Population HPSA, and more.	6 unique strings, top three: Geographic Population (n=125) Low Income Population HPSA (n=46) Medicaid Eligible Population HPSA (n=31)	6	0 (0%)
HPSA_Score	Integer	Numeric range from 4 to 20.	Min: 4 Avg: 13.03 Median: 13 Max: 20 SD: 3.32	17	0 (0%)
PC_MCTA_Score	Integer	Numeric range from 1 to 22.	Min: 1 Avg: 13.47 Median: 14 Max: 22 SD: 4.7	22	97 (42%)
HPSA_Provider_Ratio_Goal	Character	Unique strings: 3000:1, 3500:1.	2 Unique strings: 3000:1, 3500:1	2	0 (0%)
HPSA_FTE	Numeric	Numeric range from 0 to 43.14.	Min: 0 Avg: 5.71 Median: 2.04 Max: 43.14 SD: 7.93	162	0 (0%)
HPSA_Designation_Population	Integer	Numeric range from 748 to 173639.	Min: 748 Avg: 35656.43 Median: 25296 Max: 173639 SD: 33617.59	230	0 (0%)
HPSA_Formal_Ratio	Character	Unique strings (n=173): 331500:1, 58480:1, 22665:1, and more.	173 unique strings, top three: (n=52) 3553:1 (n=2) 3556:1 (n=2)	173	0 (0%)
HPSA_Shortage	Numeric	Numeric range from 0.01 to 30.15.	Min: 0.01 Avg: 5.86 Median: 3.18 Max: 30.15 SD: 6.6	216	0 (0%)
HPSA_Status	Character	Unique strings: Proposed For Withdrawal, Designated.	2 Unique strings: Proposed For Withdrawal, Designated	2	0 (0%)
HPSA_Designation_Date	Character	Unique strings (n=159): 9/12/2011, 7/11/2008, 10/9/2012, and more.	159 unique strings, top three: 6/22/2022 (n=9) 1/31/2022 (n=6) 3/14/2022 (n=6)	159	0 (0%)
HPSA_Designation_Last_Update_Dat	Character	Unique strings (n=50): 9/10/2021, 5/20/2022, 8/27/2021, and more.	50 unique strings, top three: 9/10/2021 (n=118) 3/30/2022 (n=8) 6/22/2022 (n=8)	50	0 (0%)
Data_Warehouse_Record_Create_Dat	Character	Unique strings: 1/17/2023.	1 Unique strings: 1/17/2023	1	0 (0%)
