BYOC_ID: 46056 Refactor District Nursing#1327
Conversation
…e-linkage-files into refactor-dn
| names_to = "year", | ||
| names_pattern = "(\\d{4})_cost", | ||
| values_to = "cost" | ||
| ) |
There was a problem hiding this comment.
This should be removed if we are using pivoted costs.
There was a problem hiding this comment.
Agreed! costs are now pivoted so we can remove this code. Will need testing again once the data is available
| # dn_raw_costs_path = get_dn_raw_costs_path(), # TODO: Check if needed. If it is function will need to be refactored to include the BYOC_MODE argument. | ||
| # dn_raw_contacts_path = fs::path(get_slf_dir(), "Costs", "DN-Contacts-Numbers-for-Costs.csv"), # TODO: Check if needed. If it is function will need to be refactored to include the BYOC_MODE argument. | ||
| # pop_path = get_pop_path(type = "hscp"), # TODO: Check if needed. If it is function will need to be refactored to include the BYOC_MODE argument. |
There was a problem hiding this comment.
Yes - these will be required and include the BYOC_MODE argument. We agreed that going forward any dependencies should be displayed in the function arguments to make them more visible
| BYOC_MODE = FALSE, | ||
| run_id = NA, | ||
| run_date_time = NA) { | ||
| log_slf_event(stage = "process", status = "start", type = "dn_costs", year = year) # TODO: Check this is necessary. |
There was a problem hiding this comment.
Good spot! - yes i think we need this!
| BYOC_MODE = BYOC_MODE | ||
| ) | ||
|
|
||
| log_slf_event(stage = "process", status = "complete", type = "dn_costs", year = "all") # TODO: Check this is necessary. |
There was a problem hiding this comment.
Yes - good spot! I added a comment to CH costs as a reminder to do this also!
| denodo_connect, | ||
| dbplyr::in_schema("sdl", "sdl_district_nursing_source") | ||
| ) %>% # TODO: Check table name. | ||
| # TODO: Check whether to filter by year. |
There was a problem hiding this comment.
Yes - i think filter by year is missing here
There was a problem hiding this comment.
Do you know what the year column is/will be?
There was a problem hiding this comment.
Oh this one is interesting actually - This works similar to SPARRA how the extracts are saved in hscdiip separately, the main difference is that DN is an archived dataset but this will also be a file ingest by NSS. Currently, each extract is split into FY with the name, however once this is in Denodo, i think NSS will create a FY column for us and the data would be stacked as one view.
For now, i think you could call the column year and add placeholder code with a TODO so that we remember to check this when we get the data to test
| process_costs_dn( | ||
| denodo_connect = get_denodo_connection(BYOC_MODE = BYOC_MODE), | ||
| BYOC_MODE = FALSE, | ||
| run_id = NA, | ||
| run_date_time = NA | ||
| ), |
There was a problem hiding this comment.
Ive answered the TODO comments above, they would also need added to the targets as parameters here too
There was a problem hiding this comment.
Yes good point!
| process_costs_dn( | ||
| denodo_connect = get_denodo_connection(BYOC_MODE = BYOC_MODE), | ||
| BYOC_MODE = FALSE, | ||
| run_id = NA, | ||
| run_date_time = NA |
There was a problem hiding this comment.
See above: also need to add extra parameters here for addtitional files
Jennit07
left a comment
There was a problem hiding this comment.
Hi @LucyEmma22 Thanks for doing this PR - great work. Everything looks good, ive added some comments answering some of your code questions. A few changes required here. But overall looks good and i can test again when the data is ready! Thanks
| if (isTRUE(BYOC_MODE)) { | ||
| dn_costs_path <- file.path( | ||
| denodo_output_path(), | ||
| stringr::str_glue("Cost_DN_Lookup.parquet") |
There was a problem hiding this comment.
we are now using cost_ch_lookup.parquet, changing everything to small letters for consistency. NSS's copy_to_s3 will take cost_dn_lookup.parquet as it has been effected on their end as small letters.
| dn_costs_path <- get_file_path( | ||
| directory = fs::path(get_slf_dir(), "Costs"), | ||
| file_name = stringr::str_glue( | ||
| "Cost_DN_Lookup.parquet" |
There was a problem hiding this comment.
same comment as above
| BYOC_MODE = FALSE, | ||
| run_id = NA, | ||
| run_date_time = NA) { | ||
| log_slf_event(stage = "process", status = "start", type = "dn_costs", year = year) # TODO: Check this is necessary. |
There was a problem hiding this comment.
You are correct, the line of code needs to be added. type should be dn_cost_lookup for consistency.
| BYOC_MODE = BYOC_MODE | ||
| ) | ||
|
|
||
| log_slf_event(stage = "process", status = "complete", type = "dn_costs", year = "all") # TODO: Check this is necessary. |
There was a problem hiding this comment.
Type should be dn_cost_lookup for consistency.
Data Required:
extract_district_nursing-->get_boxi_extract_path(year = year, type = "dn")dn_costs-->get_dn_raw_costs_path()dn_contacts-->fs::path(get_slf_dir(), "Costs", "DN-Contacts-Numbers-for-Costs.csv")population_lookup-->get_pop_path(type = "hscp")To Do/Check:
read_extract_district_nursing:extract_district_nursing.Patient Data Zone 2011 (Contact).get_dn_costs_path:get_ch_costs_path.process_costs_dn:dn_coststolog_slf_event.dn_costs,dn_contactsandpopulation_lookup.dn_costs,dn_contactsandpopulation_lookup.log_slf_event:dn_costto type list.