Skip to content

Refactor simd locality#1334

Draft
LucyEmma22 wants to merge 10 commits into
developmentfrom
refactor-simd-locality
Draft

Refactor simd locality#1334
LucyEmma22 wants to merge 10 commits into
developmentfrom
refactor-simd-locality

Conversation

@LucyEmma22

@LucyEmma22 LucyEmma22 commented Jun 2, 2026

Copy link
Copy Markdown
Collaborator

NOTES:

get_locality_data(), get_simd_data() and get_datazone_pop_data() are new functions based on get_spd_data() (see BYOC_ID: 45966 Refactor sc demographics #1313)

process_lookup_postcode() uses get_spd_data() which is refactored in another branch (refactor-sc-demog).

fill_geographies() uses get_slf_gpprac_path() which is refactored in another branch (refactor_gpprac_lookup).

TO-DO:

  • Resolve comments
  • Merge with branches containing refactored get_spd_data() and get_slf_gpprac_path()
  • Check SDL table names
  • Run with data

),
BYOC_MODE) {
# TODO: Check arguments - do get_datazone_pop_data and get_locality_data just need BYOC_MODE?

@LucyEmma22 LucyEmma22 Jun 2, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do get_datazone_pop_data and get_locality_data need all arguments to be written or is BYOC_MODE sufficient? I.e.

add_keep_population_flag <- function(individual_file,
                                     year,
                                     pop_estimates = get_datazone_pop_data(BYOC_MODE = BYOC_MODE),
                                     locality_data = get_locality_data(BYOC_MODE = BYOC_MODE),
                                     BYOC_MODE) {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

process_lookup_sc_demographics() in #1313 just uses the BYOC_MODE argument for spd_data, uk_postcode_data and ch_name_lookup.


cli::cli_alert_info("Add keep population function finished at {Sys.time()}")
cli::cli_alert_info("Add keep population function finished at {Sys.time()}") # TODO: Is this being kept or changed with a logger_utils function?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is log message being kept or changed with a logger_utils function?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy for us to change it to a logger message. When I made the change of our logs to logger, I changed some of the cli:: alerts to logger since some of them sometimes don't even print in the console so I'm happy for us to change it to logger::log_info instead.

Comment thread R/fill_geographies.R

cli::cli_alert_info("Fill geographies function finished at {Sys.time()}")
cli::cli_alert_info("Fill geographies function finished at {Sys.time()}") # TODO: Is this being kept or changed with a logger_utils function?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy for us to change it to a logger message. When I made the change of our logs to logger, I changed some of the cli:: alerts to logger since some of them sometimes don't even print in the console so I'm happy for us to change it to logger::log_info instead.

Comment thread R/get_lookup_paths.R
BYOC_MODE) {
if (isTRUE(BYOC_MODE)) {
log_slf_event(stage = "read", status = "start", type = "HSCP Localities Lookup", year = "all") # TODO: Check whether to add hscp_locality to log_slf_event mapping list

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add add hscp_locality / simd / datazone_pop to log_slf_event mapping list?

"hscp_locality" ~ "HSCP Localities Lookup",
"simd" ~ "SIMD Lookup",
"datazone_pop" ~ "DataZone Population Lookup"

Comment thread R/get_lookup_paths.R
dplyr::select(
locality = "hscp_locality",
tidyselect::matches("datazone\\d{4}$")
) %>% # TODO: Check whether we need to select columns

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to select columns when data is read in (either from Denodo or disk)?Columns could be selected when the data is read in or when it is used.

NOTE: For get_simd_data() the column names within the file will change when a new version of the SIMD is released.

@LucyEmma22 LucyEmma22 Jun 2, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment also applies to get_simd_data() and get_datazone_pop_data().

Comment thread R/get_lookup_paths.R

log_slf_event(stage = "read", status = "complete", type = "HSCP Localities Lookup", year = "all") # TODO: Check whether to add hscp_locality to log_slf_event mapping list
} else { # TODO: Check logic - are we reading the local file when BYOC_MODE = FALSE or are we still reading from Denodo?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we reading the local file when BYOC_MODE = FALSE or are we still reading from Denodo? read_extract_XXX reads from denodo when BYOC_MODE = FALSE but get_uk_postcode_data and get_spd_data read from disk when BYOC_MODE = FALSE.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment also applies to get_simd_data() and get_datazone_pop_data().

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread R/get_lookup_paths.R
#' @family lookup files
get_locality_data <- function(denodo_connect = get_denodo_connection(BYOC_MODE = BYOC_MODE),
file_path = get_locality_path(),
BYOC_MODE) {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need file_path = get_locality_path() and denodo_connect = get_denodo_connection(BYOC_MODE = BYOC_MODE) as arguments? denodo_connect is only used if BYOC_MODE = TRUE and file_path is only used if BYOC_MODE = FALSE so we could just have get_locality_data <- function(BYOC_MODE) then define file_path and denodo_connect within the ifelse statement? They always take the same value so is there a benefit to having them as arguments? There may be other reasons to keep them that I have not thought about.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment also applies to get_simd_data() and get_datazone_pop_data().

Comment thread R/get_slf_lookup_paths.R
...
)
get_slf_postcode_path <- function(update = latest_update(), BYOC_MODE, ...) { # TODO: Check whether to keep the update argument

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the update = latest_update() argument be kept? I seem to remember it being removed from somewhere else.

write_to_disk = TRUE) {
# TODO: Check arguments - do get_spd_data, simd_data and get_locality_data just need BYOC_MODE?
# Alternatively we could have no default and just call data in targets (i.e. same as process_extract_XXX).

@LucyEmma22 LucyEmma22 Jun 2, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like add_keep_population_flag, do get_spd_data, get_simd_data and get_locality_data need all arguments to be written or is BYOC_MODE sufficient? I.e.

process_lookup_postcode <- function(spd_data = get_spd_data(BYOC_MODE = BYOC_MODE),
                                    simd_data = get_simd_data(BYOC_MODE = BYOC_MODE),
                                    locality_data = get_locality_data(BYOC_MODE = BYOC_MODE),
                                    BYOC_MODE = FALSE,
                                    run_id = NA,
                                    run_date_time = NA,
                                    write_to_disk = TRUE) {

@LucyEmma22 LucyEmma22 Jun 2, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively we could have no default and just call data in targets (i.e. same as process_extract_XXX). I.e.

process_lookup_postcode <- function(spd_data,
                                    simd_data,
                                    locality_data,,
                                    BYOC_MODE = FALSE,
                                    run_id = NA,
                                    run_date_time = NA,
                                    write_to_disk = TRUE) {

dplyr::mutate(
run_id = run_id,
run_date_time = run_date_time
)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_spd_data() needs to be added as a target.

Comment thread _targets.R

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_spd_data() needs to be added as a target.

Comment thread R/fill_geographies.R
),
slf_gpprac_lookup = read_file(
get_slf_gpprac_path(),
get_slf_gpprac_path(BYOC_MODE = BYOC_MODE),

@OluwatobiOni OluwatobiOni Jun 5, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Lucy! No need for me to do this on my branch then.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge order: spd -> simd_locality > gpprac_lookup

Comment thread R/fill_geographies.R
)
)
fill_postcode_geogs(slf_pc_lookup) %>%
fill_gpprac_geographies(slf_gpprac_lookup)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Lucy! No need for me to do this on my branch then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants