
Add argument to prevent padding data with NAs #90

@klwilson23

Description


What I want to do:

I'm trying to download daily weather data for two stations that share the station name "Port Hardy A". Their date ranges don't overlap: station 202 runs from 1944 until 2013, and station 51319 picks up from 2013 until today. I would like a single time series that switches from one station to the other where the first leaves off.

Issue?

The download creates a single data frame, but it contains two complete time series, one per station ID. I do get the real data from each station (which is what I asked for), but I also get rows of missing data for each station outside that station's range: it appears to add an NA-filled row for every requested date, for each station.

I'm not sure whether this behaviour when combining data across stations is intended. I could remove the NA-padded rows manually, but that would need some quality control. Suggestions?
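One way to do the manual clean-up without discarding genuine gaps is to trim each station separately, dropping only the leading and trailing rows that have no data. A minimal base-R sketch, assuming the combined data frame has `station_id` and `date` columns and that the caller names the measurement columns to check; `trim_by_station()` is a hypothetical helper (not part of weathercan), and the mock values and hand-over date are invented:

```r
# Hypothetical helper, not part of weathercan: per station, drop the leading
# and trailing rows where every measurement column is NA, but keep interior
# gaps (genuinely missing days inside a station's record).
trim_by_station <- function(df, value_cols) {
  pieces <- lapply(split(df, df$station_id), function(stn) {
    stn <- stn[order(stn$date), , drop = FALSE]
    # TRUE where at least one measurement column has a value
    has_data <- rowSums(!is.na(stn[, value_cols, drop = FALSE])) > 0
    if (!any(has_data)) return(stn[0, , drop = FALSE])
    stn[seq(min(which(has_data)), max(which(has_data))), , drop = FALSE]
  })
  out <- do.call(rbind, pieces)
  out <- out[order(out$date), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Mock data shaped like the issue (values and hand-over date are made up):
# station 202 reports until 2013-06-12, station 51319 from 2013-06-13,
# and each station is NA-padded over the other's dates.
dates <- seq(as.Date("2013-06-10"), as.Date("2013-06-15"), by = "day")
mock <- data.frame(
  station_id = rep(c(202, 51319), each = length(dates)),
  date = rep(dates, 2),
  max_temp = c(15.1, NA, 15.5, NA, NA, NA,
               NA, NA, NA, 16.0, 17.2, 15.9)
)

combined <- trim_by_station(mock, value_cols = "max_temp")
combined
```

Dropping every all-NA row instead would also delete real missing days inside a station's record (the NA on 2013-06-11 above), which is where the quality-control concern comes in; trimming only the ends keeps those gaps visible.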

Example:

Here are the stations for Port Hardy. Notice that Port Hardy A has two station IDs with two different, non-overlapping ranges.

library(weathercan)
stations_search("Port Hardy", interval = "day")

Then I download those two stations:

portHardy_pg <- weather_dl(station_ids = c(202, 51319), start = "1975-01-01", end = "2018-12-31", interval = "day", trim = TRUE, format = TRUE)

The problem shows up when we look at the temperature columns for station 202 at the start and end of the requested range:

head(portHardy_pg[portHardy_pg$station_id==202,c(1,2,11,22:24)])
# A tibble: 6 x 6
  station_name station_id date       max_temp max_temp_flag mean_temp
  <chr>             <dbl> <date>        <dbl> <chr>             <dbl>
1 PORT HARDY A        202 1975-01-01      3.9 ""                  2  
2 PORT HARDY A        202 1975-01-02      6.1 ""                  3.1
3 PORT HARDY A        202 1975-01-03      3.9 ""                  2  
4 PORT HARDY A        202 1975-01-04      3.9 ""                  2.3
5 PORT HARDY A        202 1975-01-05      5   ""                  3.6
6 PORT HARDY A        202 1975-01-06      2.8 ""                  0.9

Here we see the NA padding for station 202 at the end of the range:

tail(portHardy_pg[portHardy_pg$station_id==202,c(1,2,11,22:24)])
# A tibble: 6 x 6
  station_name station_id date       max_temp max_temp_flag mean_temp
  <chr>             <dbl> <date>        <dbl> <chr>             <dbl>
1 PORT HARDY A        202 2018-12-26       NA ""                   NA
2 PORT HARDY A        202 2018-12-27       NA ""                   NA
3 PORT HARDY A        202 2018-12-28       NA ""                   NA
4 PORT HARDY A        202 2018-12-29       NA ""                   NA
5 PORT HARDY A        202 2018-12-30       NA ""                   NA
6 PORT HARDY A        202 2018-12-31       NA ""                   NA

I get a similar issue for station 51319, this time at the start of the range:

head(portHardy_pg[portHardy_pg$station_id==51319,c(1,2,11,22:24)]) # NA padding for station 51319 at the beginning of the range
# A tibble: 6 x 6
  station_name station_id date       max_temp max_temp_flag mean_temp
  <chr>             <dbl> <date>        <dbl> <chr>             <dbl>
1 PORT HARDY A      51319 1975-01-01       NA ""                   NA
2 PORT HARDY A      51319 1975-01-02       NA ""                   NA
3 PORT HARDY A      51319 1975-01-03       NA ""                   NA
4 PORT HARDY A      51319 1975-01-04       NA ""                   NA
5 PORT HARDY A      51319 1975-01-05       NA ""                   NA
6 PORT HARDY A      51319 1975-01-06       NA ""                   NA
tail(portHardy_pg[portHardy_pg$station_id==51319,c(1,2,11,22:24)])
# A tibble: 6 x 6
  station_name station_id date       max_temp max_temp_flag mean_temp
  <chr>             <dbl> <date>        <dbl> <chr>             <dbl>
1 PORT HARDY A      51319 2018-12-26      5.5 ""                  2.8
2 PORT HARDY A      51319 2018-12-27      5.1 ""                  2.3
3 PORT HARDY A      51319 2018-12-28      4.4 ""                  4  
4 PORT HARDY A      51319 2018-12-29     10.8 ""                  7.4
5 PORT HARDY A      51319 2018-12-30      7   ""                  3.2
6 PORT HARDY A      51319 2018-12-31      5.2 ""                  2.1
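Before splicing the two records together, it may help to confirm where each station's data actually starts and ends. A minimal base-R sketch, assuming the `station_id` and `date` columns shown above and a caller-chosen set of measurement columns; `observed_range()` is a hypothetical helper, and the mock values are invented:

```r
# Hypothetical helper, not part of weathercan: first and last date with any
# non-missing measurement, per station, ordered by start date.
observed_range <- function(df, value_cols) {
  has_data <- rowSums(!is.na(df[, value_cols, drop = FALSE])) > 0
  if (!any(has_data)) stop("no non-missing values in value_cols")
  obs <- df[has_data, , drop = FALSE]
  rows <- lapply(split(obs$date, obs$station_id), function(d) {
    data.frame(first_date = min(d), last_date = max(d))
  })
  out <- do.call(rbind, rows)
  out$station_id <- as.numeric(names(rows))
  out <- out[order(out$first_date), c("station_id", "first_date", "last_date")]
  rownames(out) <- NULL
  out
}

# Mock data with made-up values: 202 ends 2013-06-12, 51319 starts 2013-06-13
dates <- seq(as.Date("2013-06-10"), as.Date("2013-06-15"), by = "day")
mock <- data.frame(
  station_id = rep(c(202, 51319), each = length(dates)),
  date = rep(dates, 2),
  max_temp = c(15.1, NA, 15.5, NA, NA, NA,
               NA, NA, NA, 16.0, 17.2, 15.9)
)

ranges <- observed_range(mock, value_cols = "max_temp")
ranges
# Records overlap if a station starts on or before the previous one ends
overlaps <- any(ranges$first_date[-1] <= ranges$last_date[-nrow(ranges)])
overlaps
```

If `overlaps` came back `TRUE`, a rule for which station wins on shared dates would be needed before combining.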

Interestingly, if I download only one station but request a range that extends beyond its observation period, the download is trimmed to that period.

For example:

new_dl <- weather_dl(station_ids = 202, start = "1975-01-01", end = "2018-12-31", interval = "day", trim = TRUE, format = TRUE)
tail(new_dl[new_dl$station_id==202,c(1,2,11,22:24)])
# A tibble: 6 x 6
  station_name station_id date       max_temp max_temp_flag mean_temp
  <chr>             <dbl> <date>        <dbl> <chr>             <dbl>
1 PORT HARDY A        202 2013-06-07     16.4 ""                 13.1
2 PORT HARDY A        202 2013-06-08     13.1 ""                 11.4
3 PORT HARDY A        202 2013-06-09     13.8 ""                 10.1
4 PORT HARDY A        202 2013-06-10     15.1 ""                 10.5
5 PORT HARDY A        202 2013-06-11     14.8 ""                 12.3
6 PORT HARDY A        202 2013-06-12     15.5 ""                 12.5
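Since the single-station download comes back trimmed, one possible workaround is to call `weather_dl()` once per station and stack the results. A minimal sketch, using only the `weather_dl()` arguments already shown in this issue and assuming each call returns the same columns; `dl_each_station()` and `fake_dl()` are hypothetical names, and the stand-in download lets the sketch run offline with partly made-up values:

```r
# Hypothetical workaround, not a weathercan feature: download each station on
# its own (the single-station case comes back trimmed) and stack the results.
# Assumes every download returns the same set of columns.
dl_each_station <- function(station_ids, start, end, interval = "day",
                            dl_fun = weathercan::weather_dl) {
  pieces <- lapply(station_ids, function(id) {
    dl_fun(station_ids = id, start = start, end = end,
           interval = interval, trim = TRUE, format = TRUE)
  })
  out <- do.call(rbind, pieces)
  out <- out[order(out$date), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Offline stand-in for weather_dl (station 51319 values are made up), so the
# sketch runs without network access or weathercan installed.
fake_dl <- function(station_ids, start, end, interval, trim, format) {
  if (station_ids == 202) {
    data.frame(station_id = 202,
               date = as.Date(c("2013-06-11", "2013-06-12")),
               max_temp = c(14.8, 15.5))
  } else {
    data.frame(station_id = 51319,
               date = as.Date(c("2013-06-13", "2013-06-14")),
               max_temp = c(16.0, 17.2))
  }
}
res <- dl_each_station(c(51319, 202), "1975-01-01", "2018-12-31",
                       dl_fun = fake_dl)
res

# Real use (needs weathercan and network access):
# portHardy <- dl_each_station(c(202, 51319), "1975-01-01", "2018-12-31")
```

The download function is a parameter only so the stacking logic can be checked without hitting the network; by default it uses `weathercan::weather_dl`.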

My Environment

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252    LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C                    LC_TIME=English_Canada.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] weathercan_0.3.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3       rstudioapi_0.10  magrittr_1.5     tidyselect_0.2.5 R6_2.4.1         rlang_0.4.1     
 [7] fansi_0.4.0      stringr_1.4.0    httr_1.4.1       dplyr_0.8.3      tools_3.6.1      packrat_0.5.0   
[13] utf8_1.1.4       cli_1.1.0        ellipsis_0.3.0   assertthat_0.2.1 lifecycle_0.1.0  tibble_2.1.3    
[19] crayon_1.3.4     tidyr_1.0.0      purrr_0.3.3      vctrs_0.2.0      curl_4.2         zeallot_0.1.0   
[25] glue_1.3.1       stringi_1.4.3    compiler_3.6.1   pillar_1.4.2     backports_1.1.5  lubridate_1.7.4 
[31] pkgconfig_2.0.3 
