Refactor simd locality#1334
Conversation
…functions to get_lookup_paths.R
… with get_locality_data in add_keep_population_flag
…th in fill_geographies arguments
…process_lookup_postcode
| ), | ||
| BYOC_MODE) { | ||
| # TODO: Check arguments - do get_datazone_pop_data and get_locality_data just need BYOC_MODE? | ||
|
|
There was a problem hiding this comment.
Do get_datazone_pop_data and get_locality_data need all arguments to be written or is BYOC_MODE sufficient? I.e.
add_keep_population_flag <- function(individual_file,
year,
pop_estimates = get_datazone_pop_data(BYOC_MODE = BYOC_MODE),
locality_data = get_locality_data(BYOC_MODE = BYOC_MODE),
BYOC_MODE) {
There was a problem hiding this comment.
process_lookup_sc_demographics() in #1313 just uses the BYOC_MODE argument for spd_data, uk_postcode_data and ch_name_lookup.
|
|
||
| cli::cli_alert_info("Add keep population function finished at {Sys.time()}") | ||
| cli::cli_alert_info("Add keep population function finished at {Sys.time()}") # TODO: Is this being kept or changed with a logger_utils function? | ||
|
|
There was a problem hiding this comment.
Is log message being kept or changed with a logger_utils function?
There was a problem hiding this comment.
Happy for us to change it to a logger message. When I made the change of our logs to logger, I changed some of the cli:: alerts to logger since some of them sometimes don't even print in the console so I'm happy for us to change it to logger::log_info instead.
|
|
||
| cli::cli_alert_info("Fill geographies function finished at {Sys.time()}") | ||
| cli::cli_alert_info("Fill geographies function finished at {Sys.time()}") # TODO: Is this being kept or changed with a logger_utils function? | ||
|
|
There was a problem hiding this comment.
Same comment as above.
There was a problem hiding this comment.
Happy for us to change it to a logger message. When I made the change of our logs to logger, I changed some of the cli:: alerts to logger since some of them sometimes don't even print in the console so I'm happy for us to change it to logger::log_info instead.
| BYOC_MODE) { | ||
| if (isTRUE(BYOC_MODE)) { | ||
| log_slf_event(stage = "read", status = "start", type = "HSCP Localities Lookup", year = "all") # TODO: Check whether to add hscp_locality to log_slf_event mapping list | ||
|
|
There was a problem hiding this comment.
Should we add add hscp_locality / simd / datazone_pop to log_slf_event mapping list?
"hscp_locality" ~ "HSCP Localities Lookup",
"simd" ~ "SIMD Lookup",
"datazone_pop" ~ "DataZone Population Lookup"
| dplyr::select( | ||
| locality = "hscp_locality", | ||
| tidyselect::matches("datazone\\d{4}$") | ||
| ) %>% # TODO: Check whether we need to select columns |
There was a problem hiding this comment.
Do we need to select columns when data is read in (either from Denodo or disk)?Columns could be selected when the data is read in or when it is used.
NOTE: For get_simd_data() the column names within the file will change when a new version of the SIMD is released.
There was a problem hiding this comment.
Comment also applies to get_simd_data() and get_datazone_pop_data().
|
|
||
| log_slf_event(stage = "read", status = "complete", type = "HSCP Localities Lookup", year = "all") # TODO: Check whether to add hscp_locality to log_slf_event mapping list | ||
| } else { # TODO: Check logic - are we reading the local file when BYOC_MODE = FALSE or are we still reading from Denodo? | ||
|
|
There was a problem hiding this comment.
Are we reading the local file when BYOC_MODE = FALSE or are we still reading from Denodo? read_extract_XXX reads from denodo when BYOC_MODE = FALSE but get_uk_postcode_data and get_spd_data read from disk when BYOC_MODE = FALSE.
There was a problem hiding this comment.
Comment also applies to get_simd_data() and get_datazone_pop_data().
| #' @family lookup files | ||
| get_locality_data <- function(denodo_connect = get_denodo_connection(BYOC_MODE = BYOC_MODE), | ||
| file_path = get_locality_path(), | ||
| BYOC_MODE) { |
There was a problem hiding this comment.
Do we need file_path = get_locality_path() and denodo_connect = get_denodo_connection(BYOC_MODE = BYOC_MODE) as arguments? denodo_connect is only used if BYOC_MODE = TRUE and file_path is only used if BYOC_MODE = FALSE so we could just have get_locality_data <- function(BYOC_MODE) then define file_path and denodo_connect within the ifelse statement? They always take the same value so is there a benefit to having them as arguments? There may be other reasons to keep them that I have not thought about.
There was a problem hiding this comment.
Comment also applies to get_simd_data() and get_datazone_pop_data().
| ... | ||
| ) | ||
| get_slf_postcode_path <- function(update = latest_update(), BYOC_MODE, ...) { # TODO: Check whether to keep the update argument | ||
|
|
There was a problem hiding this comment.
Should the update = latest_update() argument be kept? I seem to remember it being removed from somewhere else.
| write_to_disk = TRUE) { | ||
| # TODO: Check arguments - do get_spd_data, simd_data and get_locality_data just need BYOC_MODE? | ||
| # Alternatively we could have no default and just call data in targets (i.e. same as process_extract_XXX). | ||
|
|
There was a problem hiding this comment.
Like add_keep_population_flag, do get_spd_data, get_simd_data and get_locality_data need all arguments to be written or is BYOC_MODE sufficient? I.e.
process_lookup_postcode <- function(spd_data = get_spd_data(BYOC_MODE = BYOC_MODE),
simd_data = get_simd_data(BYOC_MODE = BYOC_MODE),
locality_data = get_locality_data(BYOC_MODE = BYOC_MODE),
BYOC_MODE = FALSE,
run_id = NA,
run_date_time = NA,
write_to_disk = TRUE) {
There was a problem hiding this comment.
Alternatively we could have no default and just call data in targets (i.e. same as process_extract_XXX). I.e.
process_lookup_postcode <- function(spd_data,
simd_data,
locality_data,,
BYOC_MODE = FALSE,
run_id = NA,
run_date_time = NA,
write_to_disk = TRUE) {
| dplyr::mutate( | ||
| run_id = run_id, | ||
| run_date_time = run_date_time | ||
| ) |
There was a problem hiding this comment.
get_spd_data() needs to be added as a target.
There was a problem hiding this comment.
get_spd_data() needs to be added as a target.
| ), | ||
| slf_gpprac_lookup = read_file( | ||
| get_slf_gpprac_path(), | ||
| get_slf_gpprac_path(BYOC_MODE = BYOC_MODE), |
There was a problem hiding this comment.
Thanks Lucy! No need for me to do this on my branch then.
There was a problem hiding this comment.
Merge order: spd -> simd_locality > gpprac_lookup
| ) | ||
| ) | ||
| fill_postcode_geogs(slf_pc_lookup) %>% | ||
| fill_gpprac_geographies(slf_gpprac_lookup) |
There was a problem hiding this comment.
Thanks Lucy! No need for me to do this on my branch then.
NOTES:
get_locality_data(),get_simd_data()andget_datazone_pop_data()are new functions based on get_spd_data() (see BYOC_ID: 45966 Refactor sc demographics #1313)process_lookup_postcode()usesget_spd_data()which is refactored in another branch (refactor-sc-demog).fill_geographies()usesget_slf_gpprac_path()which is refactored in another branch (refactor_gpprac_lookup).TO-DO:
get_spd_data()andget_slf_gpprac_path()