Skip to content

cllghn/rcahelpr

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

38 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

{rcahelpr}

❓ What and Why is This?

In short, a tool to support the Resident Corrections Analyst program through bespoke functions written in R and data sets for learning purposes. Interested? You can install is using R:

remotes::install_github("cllghn/rcahelpr")

πŸ”Ž Examples

πŸ“š Codebooks

Codebooks are often to provide a comprehensive guide to the variables and coding schemes in a data set, ensuring consistent and accurate interpretation of data. They serve as a reference tool to facilitate understanding and analysis of complex data structures by researchers and analysts. To support this crucial functionality, we include a codebook function into this package:

# Load the library
library(rcahelpr)

# Create a test data set
test <- data.frame(
  "person" = c("chris", "maeve", "joseph", "brooks"),
  "org" = c("csg", "wdoc", "ccjbh", "asu"),
  "years_in_org" = c(1, 0.3, 0.2, NA),
  "role" = as.factor(c("mentor", "rca", "rca", "mentor")),
  "date" = as.Date(c("2020-01-01", "2020-01-01", NA, "2020-01-02"))
)

# Create codebook
make_codebook(input_df = test, return_df = FALSE, escape = FALSE)
Variable Name Data Class Valid Values Statistics Unique Values Missing Values
person Character Unique strings (n=4): chris, maeve, joseph, and more. 4 unique strings, top three:
brooks (n=1)
chris (n=1)
joseph (n=1)
4 0 (0%)
org Character Unique strings (n=4): csg, wdoc, ccjbh, and more. 4 unique strings, top three:
asu (n=1)
ccjbh (n=1)
csg (n=1)
4 0 (0%)
years_in_org Numeric Numeric range from 0.2 to 1. Min: 0.2
Avg: 0.5
Median: 0.3
Max: 1
SD: 0.44
4 1 (25%)
role Factor Categorical variable with 2 levels: mentor, rca 2 Unique factors: mentor, rca 2 0 (0%)
date Date Date rage from 2020-01-01 to 2020-01-02. Min: 2020-01-01
Mode: 2020-01-01
Max: 2020-01-02
Time difference: 1 days
3 1 (25%)

Should you want to add information to describe the variables described in the codebook, you can do so by left joining and additional data set:

# Set a secondary data.frame describing the variables in your original data set
more <- data.frame(
  "vars" = c("person", "org", "years_in_org", "role"),
  "description" = rep("Interesting details about my variable.", 4),
  "origin" = rep("Detailed notes on where the data came from.", 4),
  "notes" = rep("Yet more useful information", 4)
)

# Create codebook
make_codebook(input_df = test, return_df = FALSE, escape = FALSE,
              extra_vars = more, extra_key = "vars")
Variable Name Data Class Valid Values Statistics Unique Values Missing Values Description Origin Notes
date Date Date rage from 2020-01-01 to 2020-01-02. Min: 2020-01-01
Mode: 2020-01-01
Max: 2020-01-02
Time difference: 1 days
3 1 (25%) NA NA NA
org Character Unique strings (n=4): csg, wdoc, ccjbh, and more. 4 unique strings, top three:
asu (n=1)
ccjbh (n=1)
csg (n=1)
4 0 (0%) Interesting details about my variable. Detailed notes on where the data came from. Yet more useful information
person Character Unique strings (n=4): chris, maeve, joseph, and more. 4 unique strings, top three:
brooks (n=1)
chris (n=1)
joseph (n=1)
4 0 (0%) Interesting details about my variable. Detailed notes on where the data came from. Yet more useful information
role Factor Categorical variable with 2 levels: mentor, rca 2 Unique factors: mentor, rca 2 0 (0%) Interesting details about my variable. Detailed notes on where the data came from. Yet more useful information
years_in_org Numeric Numeric range from 0.2 to 1. Min: 0.2
Avg: 0.5
Median: 0.3
Max: 1
SD: 0.44
4 1 (25%) Interesting details about my variable. Detailed notes on where the data came from. Yet more useful information

πŸ“Š Data

Some key data sets, for learning or general use have been included in the library. Access them using the :: accessor:

str(rcahelpr::hpsa_primarycare)
## 'data.frame':    230 obs. of  17 variables:
##  $ HPSA_Discipline_Class           : chr  "Primary Care" "Primary Care" "Primary Care" "Primary Care" ...
##  $ HPSA_Name                       : chr  "Low Income - MSSA 78.2ddd/Bell SW/Cudahy/Maywood/V" "MSSA 6/Pioneer" "MSSA 78.2uuu/Athens" "MSSA 137/Isleton" ...
##  $ HPSA_ID                         : chr  "1061017434" "1061018308" "1061038158" "1061081242" ...
##  $ County_Equivalent_Name          : chr  "Los Angeles" "Amador" "Los Angeles" "Sacramento" ...
##  $ Designation_Type                : chr  "HPSA Population" "Geographic HPSA" "High Needs Geographic HPSA" "Geographic HPSA" ...
##  $ HPSA_Population_Type            : chr  "Low Income Population HPSA" "Geographic Population" "Geographic Population" "Geographic Population" ...
##  $ HPSA_Score                      : int  13 16 18 9 12 15 10 9 19 11 ...
##  $ PC_MCTA_Score                   : int  NA NA NA NA NA 18 NA 13 NA NA ...
##  $ HPSA_Provider_Ratio_Goal        : chr  "3000:1" "3500:1" "3000:1" "3500:1" ...
##  $ HPSA_FTE                        : num  0.16 0.1 3.75 0.95 9.26 3.2 1.75 27.5 4 0 ...
##  $ HPSA_Designation_Population     : int  53040 5848 84994 5597 39476 17795 13687 101329 54088 7045 ...
##  $ HPSA_Formal_Ratio               : chr  "331500:1" "58480:1" "22665:1" "5892:1" ...
##  $ HPSA_Shortage                   : num  17.52 1.57 24.58 0.65 3.9 ...
##  $ HPSA_Status                     : chr  "Proposed For Withdrawal" "Proposed For Withdrawal" "Proposed For Withdrawal" "Proposed For Withdrawal" ...
##  $ HPSA_Designation_Date           : chr  "9/12/2011" "7/11/2008" "10/9/2012" "5/13/2008" ...
##  $ HPSA_Designation_Last_Update_Dat: chr  "9/10/2021" "9/10/2021" "5/20/2022" "9/10/2021" ...
##  $ Data_Warehouse_Record_Create_Dat: chr  "1/17/2023" "1/17/2023" "1/17/2023" "1/17/2023" ...

This means that we can use these data for a range of purposes, such as pairing them with other libraries and analyzing it:

# Load the graphing library ggplot2 and data management library dplyr
library(dplyr)
library(forcats)
library(ggplot2)

# Wrangle some data to identify the average Health Professional Shortage Area 
# (HPSA) score in a given county:
demo <- rcahelpr::hpsa_primarycare %>%
  group_by(County_Equivalent_Name) %>%
  summarize(mean_hpsa_score = mean(HPSA_Score),
            total_hpsa_population = sum(HPSA_Designation_Population))

# Make a beautiful graph
ggplot(data = demo) +
  geom_point(aes(x = total_hpsa_population , y = fct_rev(County_Equivalent_Name),
                 color = mean_hpsa_score)) +
  geom_segment(aes(x = 0 , xend = total_hpsa_population ,
                   y = County_Equivalent_Name, yend = County_Equivalent_Name,
                   color = mean_hpsa_score)) +
  theme_minimal() +
  labs(title = "Affected Populations in HPSA by County",
       subtitle = "HPSA scores determine priorities for the assignment of clinitians",
       caption = "Data from CCHS") +
  xlab("Total Population in all County HPSAs") +
  ylab("") +
  scale_color_gradient2(low="#F5F5DC", mid = "#FFA500", high="#8B0000", 
                        midpoint = mean(demo$mean_hpsa_score),
                        name = "Average County HPSA Score") +
  theme(legend.position = "bottom")

Also, the data and functions in this package can be combined:

make_codebook(input_df = rcahelpr::hpsa_primarycare, return_df = FALSE, 
              escape = FALSE)
Variable Name Data Class Valid Values Statistics Unique Values Missing Values
HPSA_Discipline_Class Character Unique strings: Primary Care. 1 Unique strings: Primary Care 1 0 (0%)
HPSA_Name Character Unique strings (n=230): Low Income - MSSA 78.2ddd/Bell SW/Cudahy/Maywood/V, MSSA 6/Pioneer, MSSA 78.2uuu/Athens, and more. 230 unique strings, top three:
Colusa County (n=1)
LI-MFW-MSSA 176b/East Palo Alto (n=1)
LI-MFW/MSSA 186 Anderson (n=1)
230 0 (0%)
HPSA_ID Character Unique strings (n=230): 1061017434, 1061018308, 1061038158, and more. 230 unique strings, top three:
1061017434 (n=1)
1061018308 (n=1)
1061038158 (n=1)
230 0 (0%)
County_Equivalent_Name Character Unique strings (n=52): Los Angeles, Amador, Sacramento, and more. 52 unique strings, top three:
Los Angeles (n=42)
San Bernardino (n=15)
Kern (n=14)
52 0 (0%)
Designation_Type Character Unique strings: HPSA Population, Geographic HPSA, High Needs Geographic HPSA. 3 Unique strings: HPSA Population, Geographic HPSA, High Needs Geographic HPSA 3 0 (0%)
HPSA_Population_Type Character Unique strings (n=6): Low Income Population HPSA, Geographic Population, Low Income Migrant Farmworker Population HPSA, and more. 6 unique strings, top three:
Geographic Population (n=125)
Low Income Population HPSA (n=46)
Medicaid Eligible Population HPSA (n=31)
6 0 (0%)
HPSA_Score Integer Numeric range from 4 to 20. Min: 4
Avg: 13.03
Median: 13
Max: 20
SD: 3.32
17 0 (0%)
PC_MCTA_Score Integer Numeric range from 1 to 22. Min: 1
Avg: 13.47
Median: 14
Max: 22
SD: 4.7
22 97 (42%)
HPSA_Provider_Ratio_Goal Character Unique strings: 3000:1, 3500:1. 2 Unique strings: 3000:1, 3500:1 2 0 (0%)
HPSA_FTE Numeric Numeric range from 0 to 43.14. Min: 0
Avg: 5.71
Median: 2.04
Max: 43.14
SD: 7.93
162 0 (0%)
HPSA_Designation_Population Integer Numeric range from 748 to 173639. Min: 748
Avg: 35656.43
Median: 25296
Max: 173639
SD: 33617.59
230 0 (0%)
HPSA_Formal_Ratio Character Unique strings (n=173): 331500:1, 58480:1, 22665:1, and more. 173 unique strings, top three:
(n=52)
3553:1 (n=2)
3556:1 (n=2)
173 0 (0%)
HPSA_Shortage Numeric Numeric range from 0.01 to 30.15. Min: 0.01
Avg: 5.86
Median: 3.18
Max: 30.15
SD: 6.6
216 0 (0%)
HPSA_Status Character Unique strings: Proposed For Withdrawal, Designated. 2 Unique strings: Proposed For Withdrawal, Designated 2 0 (0%)
HPSA_Designation_Date Character Unique strings (n=159): 9/12/2011, 7/11/2008, 10/9/2012, and more. 159 unique strings, top three:
6/22/2022 (n=9)
1/31/2022 (n=6)
3/14/2022 (n=6)
159 0 (0%)
HPSA_Designation_Last_Update_Dat Character Unique strings (n=50): 9/10/2021, 5/20/2022, 8/27/2021, and more. 50 unique strings, top three:
9/10/2021 (n=118)
3/30/2022 (n=8)
6/22/2022 (n=8)
50 0 (0%)
Data_Warehouse_Record_Create_Dat Character Unique strings: 1/17/2023. 1 Unique strings: 1/17/2023 1 0 (0%)

About

A tool to support the Resident Corrections Analyst program through bespoke functions written in R and data sets for learning purposes.

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors