-
Notifications
You must be signed in to change notification settings - Fork 95
Latest daily #830
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Latest daily #830
Conversation
Field measurements
Merge branch 'main' of github.com:DOI-USGS/dataRetrieval # Conflicts: # tutorials/basic_slides_deck.qmd # tutorials/changes_slides_deck.qmd
Fixing docs
Update docs
Update dependency requirement: `curl` >=7.0.0
|
I can take a look this afternoon into tomorrow. |
| bbox = NA, | ||
| limit = NA, | ||
| max_results = NA, | ||
| convertType = TRUE){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for the order of the input parameters in the documentation and the function itself? The orders are different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order in the documentation doesn't matter. The order in the function matters in that a user doesn't necessarily need to name each argument - but if they don't then they need to input them in the correct order. I'll double check the order is similar to the other functions, because what we don't want to do is change it later.
| return_list <- data.frame(matrix(nrow = 0, ncol = length(properties))) | ||
| return_list <- data.frame(matrix(nrow = 0, | ||
| ncol = length(properties))) | ||
| return_list <- lapply(return_list, as.character) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed? Ensures that each empty column's format is character?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, otherwise it comes back as logical.
| # no service specified: | ||
| availableData <- read_waterdata_ts_meta(monitoring_location_id = "USGS-05114000") | ||
| expect_equal(ncol(availableData), 18) | ||
| expect_gte(ncol(availableData), 18) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well that's snazzy.
tutorials/basic_slides_deck.qmd
Outdated
| ``` | ||
|
|
||
| That's a LOT of columns that come back. We won't look at them here, but let's jump over to RStudio to look through the results. | ||
| That's a LOT of columns that come back. We won't look at them here, but you can use `View` in RStudio to explore on your own. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| That's a LOT of columns that come back. We won't look at them here, but you can use `View` in RStudio to explore on your own. | |
| That's a LOT of columns that came back. We won't look at them here, but you can use `View` in RStudio to explore on your own. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm now re-reading I'm not sure which would be better. Could say "That's a LOT of columns returned."
| pal <- colorNumeric("viridis", latest_dane_county_daily$value) | ||
| leaflet(data = latest_dane_county_daily |> | ||
| sf::st_transform(crs = leaflet_crs)) |> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest individually defining leaflet_crs in case someone is skipping to a particular section and doesn't know where it is originally defined.
ehinman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass looks good, just want to test out the new function briefly tomorrow.
| #' @param monitoring_location_id `r get_params("latest-daily")$monitoring_location_id` | ||
| #' @param parameter_code `r get_params("latest-daily")$parameter_code` | ||
| #' @param statistic_id `r get_params("latest-daily")$statistic_id` | ||
| #' @param time `r get_params("latest-daily")$time` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
time seems like a weird input here. I tried the following:
test = read_waterdata_latest_daily(monitoring_location_id = "USGS-12456500", time = "2018-02-12T23:20:50Z")
And it returns zero results. Is there ever a time this input would be helpful? Perhaps I used it incorrectly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you did an exact time, probably not. But you could maybe have a scenario where you have a list of sites or bounding box and say "give me the latest results for for the last 30 days", and that way you'd only get sites back that have new data (instead of maybe the latest data point was 30 years ago or something).
| #' @param time `r get_params("latest-daily")$time` | ||
| #' @param value `r get_params("latest-daily")$value` | ||
| #' @param unit_of_measure `r get_params("latest-daily")$unit_of_measure` | ||
| #' @param approval_status `r get_params("latest-daily")$approval_status` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest adding somewhere (whether in function documentation or in examples) that the "latest" will often be "Provisional" unless the gage is currently non-functional and the latest is from a while ago.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer we petition the WaterData API folks to do that so our documentations don't start to deviate.
| #' | ||
| #' \donttest{ | ||
| #' site <- "USGS-02238500" | ||
| #' pcode <- "00060" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You set the pcode here but do not use it below in the examples.
| #' @param skipGeometry This option can be used to skip response geometries for | ||
| #' each feature. The returning object will be a data frame with no spatial | ||
| #' information. | ||
| #' @param convertType logical, defaults to `TRUE`. If `TRUE`, the function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last_modified column is POSIXct regardless of whether convertType is TRUE or FALSE.
ehinman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, Laura. Thanks for the opportunity to review. I went through the diffs, tested the latest daily function, and took a look at the built documentation page for the new function.
Left some minor comments peppered throughout. My only additional thought is about the time input. Your response makes total sense for filtering to sites that have a latest daily measurement within a certain time range. One thing I noticed when looking at this example:
test = read_waterdata_latest_daily(bbox = c(-112.249246,40.516572,-111.293435,41.247515), parameter_code = '00060', convertType = TRUE, time = "2025-11-18T00:00:00Z/..")
As expected, it returns all latest daily measurements taken yesterday, but the last_modified column is from ~6:30AM UTC (midnight-ish central time, I think?) on 11-19-2025. Presumably, this is when the "daily" calculations are made for this group of gages. I'm wondering if somewhere we should alert people that (depending on their time zone) they shouldn't expect the latest daily from the previous day to be available at midnight, but maybe one or two hours after midnight: for automated pulls and the like.
jzemmels
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updates look good. Tested out the new function a few ways and didn't run into issues, other than a minor comment. Approved.
Also tested out building the documentation/pkgdown sites. Some findings, which you're free to ignore:
pkgdown.ymlneeds to haveread_waterdta_latest_dailyadded to the reference section.- These lines threw an error when building vignettes. The fix was to remove the
list()wrapping around function arguments. - Suggest adding
zooandmapspackages to DESCRIPTION Suggest section, as they're required for running the vignettes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to remember: was there a reason why we didn't want to add parameter-codes as a possible endpoint for this function? I tested out a few CQL statements and it seemed to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The zoo and maps packages are only in vignettes that we added to .Rbuildignore. We want the CRAN version of dataRetrieval to be nice and streamlined with as few imports and suggested packages as possible. So that is why they are not on the Suggest list.
And whoo-boy...that movingAverage vignette is old-school! If you can't tell from the graphs it is the precursor to HASP. I feel like we should delete all the text and say if you are interested in doing those calculations, see:
https://doi-usgs.github.io/HASP/
(which I am very very close to pushing a big update that will use all the waterdata functions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The read_waterdata function was first developed before the parameter-code was an endpoint. I'll update it - it also needs field measurements and now latest-daily.
| time = NA_character_, | ||
| bbox = NA, | ||
| limit = NA, | ||
| max_results = NA, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps I'm still misunderstanding the usage of these two arguments, but max_results doesn't seem to change the size of the output. Examples:
read_waterdata_latest_daily(monitoring_location_id = "USGS-01491000")
read_waterdata_latest_daily(monitoring_location_id = "USGS-01491000", max_results = 10)
read_waterdata_latest_daily(monitoring_location_id = "USGS-01491000", limit = 10)
read_waterdata_latest_daily(monitoring_location_id = "USGS-01491000", max_results = 5, limit = 10)
The same happens for other read_waterdata_ functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed! args got moved around and max_results became meaningless. Should be good to go now
| return_list$value <- as.numeric() | ||
| } | ||
|
|
||
| if(convertType && "contributing_drainage_area" %in% names(return_list)){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just ran into a new, rather specific error.
Reprex:
dplyr::bind_rows(
read_waterdata_monitoring_location(monitoring_location_id = "USGS-092403167282001"),
read_waterdata_monitoring_location(monitoring_location_id = "USGS-01435000")
)
class(read_waterdata_monitoring_location(monitoring_location_id = "USGS-092403167282001")$drainage_area)
class(read_waterdata_monitoring_location(monitoring_location_id = "USGS-01435000")$drainage_area)Only dplyr::bind_rows complains, rbind just casts the numeric drainage area as a character.
I may be wrong, but I think adding an if statement for "drainage_area" here should fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't realize there was contributing_drainage_area and drainage_area. We have the contributing_drainage_area in there, just need to add drainage_area to the list in cleanup_cols
| convertType = TRUE){ | ||
|
|
||
| service <- "latest-daily" | ||
| output_id <- "latest_daily_id" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Random thought: should this just be "daily_id"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup! nope
I'm going to keep this in mind for the continuous endpoint (coming next!), where we have to deal a lot more with timezones. |
No description provided.