Description
Data for example
I could likely have made simpler example data sets, but these are what I was using at the time (they are public and open):
62163.csv.gz
62163_data.csv.gz
62163_data_separated.csv.gz
Description of Issue
Fractional seconds are truncated with the col_time defaults, but not with the col_datetime defaults. The ask may be a warning when fractional seconds are truncated because col_time(format = "") is used for hms/time objects.
Reprex
This option needs to be set to see fractional seconds when printing
options(digits.secs = 3)
Load hms so we can see things as times versus difftime objects
library(hms)
Example with Fractional Seconds
url = "https://github.com/r-lib/vroom/files/8353807/62163.csv.gz"
file = file.path(tempdir(), basename(url))
if (!file.exists(file)) {
curl::curl_download(url, file)
}
Data Header
We see the fractional seconds in this file, which are needed for our analysis.
readLines(file, 5)
#> [1] "DAY_OF_DATA,START_TIME,END_TIME,DATA_QUALITY_FLAG_CODE,DATA_QUALITY_FLAG_VALUE"
#> [2] "2,14:11:35.463000,14:11:35.463000,COUNT_SPIKES_Z,1"
#> [3] "2,14:10:00.000000,14:10:59.988000,ADJACENT_INVALID,1"
#> [4] "2,14:12:00.000000,14:12:59.988000,ADJACENT_INVALID,1"
#> [5] "4,11:17:25.938000,11:17:25.938000,COUNT_SPIKES_Z,1"
Reading data with vroom
Here we read the data and see that the time columns are guessed as col_time
data = vroom::vroom(file, progress = FALSE)
#> Rows: 15 Columns: 5
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): DATA_QUALITY_FLAG_CODE
#> dbl (2): DAY_OF_DATA, DATA_QUALITY_FLAG_VALUE
#> time (2): START_TIME, END_TIME
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
vroom::spec(data)
#> cols(
#> DAY_OF_DATA = col_double(),
#> START_TIME = col_time(format = ""),
#> END_TIME = col_time(format = ""),
#> DATA_QUALITY_FLAG_CODE = col_character(),
#> DATA_QUALITY_FLAG_VALUE = col_double(),
#> .delim = ","
#> )
No fractional seconds are printed:
head(data)
#> # A tibble: 6 × 5
#> DAY_OF_DATA START_TIME END_TIME DATA_QUALITY_FLAG_CODE DATA_QUALITY_FLAG_VALUE
#> <dbl> <time> <time> <chr> <dbl>
#> 1 2 14:11:35 14:11:35 COUNT_SPIKES_Z 1
#> 2 2 14:10:00 14:10:59 ADJACENT_INVALID 1
#> 3 2 14:12:00 14:12:59 ADJACENT_INVALID 1
#> 4 4 11:17:25 11:17:25 COUNT_SPIKES_Z 1
#> 5 4 11:35:21 11:35:21 COUNT_SPIKES_X 1
#> 6 4 11:16:00 11:16:59 ADJACENT_INVALID 1
We can confirm that they are truncated.
as.numeric(lubridate::seconds(data$START_TIME[1:5])) %% 1
#> [1] 0 0 0 0 0
as.numeric(data$START_TIME[1:5]) %% 1
#> [1] 0 0 0 0 0
Different col_time format
I don’t think the default %AT takes fractional seconds into account, so we need to pass in our own format.
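To make the difference between a whole-second format and a fractional-second format concrete, here is a minimal base R sketch. It calls `strptime()` directly rather than vroom, so it only shows the behavior documented in `?strptime`. That vroom's own parser treats `%S` and `%OS` the same way is my assumption, not something this snippet proves.

```r
# Base R only: parse the same time string with %S and with %OS.
x <- "14:11:35.463"  # first START_TIME value from the data header above

# strptime() returns a POSIXlt object; $sec holds the parsed seconds.
sec_S  <- strptime(x, format = "%H:%M:%S",  tz = "UTC")$sec
sec_OS <- strptime(x, format = "%H:%M:%OS", tz = "UTC")$sec

# Fractional part of the parsed seconds under each format.
frac_S  <- sec_S  %% 1
frac_OS <- sec_OS %% 1

print(c(frac_S = frac_S, frac_OS = frac_OS))
```

With `%S` only the whole seconds are read, while `%OS` also reads the fractional part on input.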
Here we specify the col_time format so that it uses %OS, which I think is R-specific as per ?strptime
col_time_with_frac_secs = function(...) {
vroom::col_time(format = "%H:%M:%OS", ...)
}
Read in the data
data = vroom::vroom(file,
col_types =
vroom::cols(
START_TIME = col_time_with_frac_secs(),
END_TIME = col_time_with_frac_secs(),
))
Fractional seconds are preserved
as.numeric(lubridate::seconds(data$START_TIME[1:5])) %% 1
#> [1] 0.463 0.000 0.000 0.938 0.800
as.numeric(data$START_TIME[1:5]) %% 1
#> [1] 0.463 0.000 0.000 0.938 0.800
Discussion of Issue/Resolution
Overall, this may simply serve as an issue to point people to in the future. I’m not sure whether there should be a warning that values may be truncated, or whether the default should/could guess fractional seconds. Below I show that the default for a datetime does preserve fractional seconds, so the behavior is mildly inconsistent: whether fractional seconds survive depends on how the dates and times are stored (one datetime column versus separate date and time columns).
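As a rough idea of what such a warning could look like on the user side, here is a sketch of a check that compares the raw text of a time column with its parsed values. The helper `warn_if_frac_secs_dropped()` is hypothetical and not part of vroom; the numeric vectors below are hand-computed seconds since midnight standing in for parsed hms values.

```r
# Hypothetical helper (not a vroom function): warn when the raw text carries
# non-zero fractional seconds but the parsed value is a whole number of seconds.
warn_if_frac_secs_dropped <- function(raw, parsed) {
  # TRUE where the raw string has a non-zero digit after the decimal point,
  # e.g. "14:11:35.463000" but not "14:10:00.000000".
  raw_has_frac <- grepl("\\.[0-9]*[1-9]", raw)
  # TRUE where the parsed value (seconds since midnight) has no fractional part.
  parsed_is_whole <- (as.numeric(parsed) %% 1) == 0
  # %in% TRUE turns any NA (e.g. a failed parse) into FALSE.
  dropped <- (raw_has_frac & parsed_is_whole) %in% TRUE
  if (any(dropped)) {
    warning(sum(dropped), " value(s) lost fractional seconds during parsing",
            call. = FALSE)
  }
  invisible(dropped)
}

# Raw strings as they appear in the first file's header.
raw <- c("14:11:35.463000", "14:10:00.000000", "14:12:00.000000")

# Stand-ins for parsed values, in seconds since midnight.
truncated <- c(51095, 51000, 51120)      # fractional part lost
preserved <- c(51095.463, 51000, 51120)  # fractional part kept

dropped_truncated <- suppressWarnings(warn_if_frac_secs_dropped(raw, truncated))
dropped_preserved <- warn_if_frac_secs_dropped(raw, preserved)
```

In practice the raw strings would have to come from somewhere, for example a second read of the same column as character; that doubles the reading work, so this is only meant as a diagnostic.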
Example File with Datetime versus time object
If we have a file with a datetime, then this doesn’t seem to be an issue:
url = "https://github.com/r-lib/vroom/files/8353874/62163_data.csv.gz"
file = file.path(tempdir(), basename(url))
if (!file.exists(file)) {
curl::curl_download(url, file)
}
Data Header
We see the fractional seconds in this file.
readLines(file, 5)
#> [1] "HEADER_TIMESTAMP,X,Y,Z"
#> [2] "2000-01-08 17:30:00.000,0.208,0.079,-0.751"
#> [3] "2000-01-08 17:30:00.013,0.17,0.094,-0.751"
#> [4] "2000-01-08 17:30:00.025,0.22,0.109,-0.727"
#> [5] "2000-01-08 17:30:00.038,0.258,0.047,-0.78"
Reading data with vroom
Here we read the data and see that the timestamp column is guessed as col_datetime
data = vroom::vroom(file, progress = FALSE)
#> Rows: 9 Columns: 4
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl (3): X, Y, Z
#> dttm (1): HEADER_TIMESTAMP
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
vroom::spec(data)
#> cols(
#> HEADER_TIMESTAMP = col_datetime(format = ""),
#> X = col_double(),
#> Y = col_double(),
#> Z = col_double(),
#> .delim = ","
#> )
We see fractional seconds are preserved:
head(data)
#> # A tibble: 6 × 4
#> HEADER_TIMESTAMP X Y Z
#> <dttm> <dbl> <dbl> <dbl>
#> 1 2000-01-08 17:30:00.000 0.208 0.079 -0.751
#> 2 2000-01-08 17:30:00.013 0.17 0.094 -0.751
#> 3 2000-01-08 17:30:00.024 0.22 0.109 -0.727
#> 4 2000-01-08 17:30:00.037 0.258 0.047 -0.78
#> 5 2000-01-08 17:30:00.049 0.276 0.029 -0.762
#> 6 2000-01-08 17:30:00.062 0.258 0.032 -0.777
We can confirm that they are preserved
as.numeric(lubridate::seconds(data$HEADER_TIMESTAMP[1:5])) %% 1
#> [1] 0.00000000 0.01300001 0.02499998 0.03799999 0.04999995
as.numeric(data$HEADER_TIMESTAMP[1:5]) %% 1
#> [1] 0.00000000 0.01300001 0.02499998 0.03799999 0.04999995
Example with separated Date and Time - similar issue
If we have a file with two columns, one for the date and one for the time, where the time has fractional seconds, then we hit the same issue as in the first example.
url = "https://github.com/r-lib/vroom/files/8353880/62163_data_separated.csv.gz"
file = file.path(tempdir(), basename(url))
if (!file.exists(file)) {
curl::curl_download(url, file)
}
Data Header
We see the fractional seconds in this file.
readLines(file, 5)
#> [1] "DATE,TIMESTAMP,X,Y,Z"
#> [2] "2000-01-08,17:30:00.000,0.208,0.079,-0.751"
#> [3] "2000-01-08,17:30:00.013,0.17,0.094,-0.751"
#> [4] "2000-01-08,17:30:00.025,0.22,0.109,-0.727"
#> [5] "2000-01-08,17:30:00.038,0.258,0.047,-0.78"
Reading data with vroom
Here we read the data and see that the TIMESTAMP column is guessed as col_time
data = vroom::vroom(file, progress = FALSE)
#> Rows: 9 Columns: 5
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl (3): X, Y, Z
#> date (1): DATE
#> time (1): TIMESTAMP
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
vroom::spec(data)
#> cols(
#> DATE = col_date(format = ""),
#> TIMESTAMP = col_time(format = ""),
#> X = col_double(),
#> Y = col_double(),
#> Z = col_double(),
#> .delim = ","
#> )
No fractional seconds are printed:
head(data)
#> # A tibble: 6 × 5
#> DATE TIMESTAMP X Y Z
#> <date> <time> <dbl> <dbl> <dbl>
#> 1 2000-01-08 17:30 0.208 0.079 -0.751
#> 2 2000-01-08 17:30 0.17 0.094 -0.751
#> 3 2000-01-08 17:30 0.22 0.109 -0.727
#> 4 2000-01-08 17:30 0.258 0.047 -0.78
#> 5 2000-01-08 17:30 0.276 0.029 -0.762
#> 6 2000-01-08 17:30 0.258 0.032 -0.777
We can confirm that they are truncated.
as.numeric(lubridate::seconds(data$TIMESTAMP[1:5])) %% 1
#> [1] 0 0 0 0 0
as.numeric(data$TIMESTAMP[1:5]) %% 1
#> [1] 0 0 0 0 0
Created on 2022-03-25 by the reprex package (v2.0.1)
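For the separated date/time layout, one possible workaround is to read the time column with the same explicit `%H:%M:%OS` format used earlier and then rebuild a datetime by adding the seconds since midnight to the date. This is only a sketch: it writes a small stand-in file from the header rows shown above instead of downloading the real one, the column name `HEADER_TIMESTAMP` is borrowed from the datetime file, and it assumes the times are in UTC.

```r
library(vroom)

# Small stand-in for 62163_data_separated.csv.gz, built from the header rows above.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "DATE,TIMESTAMP,X,Y,Z",
  "2000-01-08,17:30:00.000,0.208,0.079,-0.751",
  "2000-01-08,17:30:00.013,0.17,0.094,-0.751",
  "2000-01-08,17:30:00.025,0.22,0.109,-0.727"
), tmp)

# Read TIMESTAMP with an explicit format so fractional seconds are kept;
# the remaining columns are still guessed.
data <- vroom(
  tmp,
  delim = ",",
  col_types = cols(
    DATE = col_date(),
    TIMESTAMP = col_time(format = "%H:%M:%OS")
  ),
  progress = FALSE
)

# Midnight of DATE (assumed UTC) plus seconds since midnight from TIMESTAMP.
data$HEADER_TIMESTAMP <- as.POSIXct(data$DATE, tz = "UTC") + as.numeric(data$TIMESTAMP)

# Fractional part of the rebuilt datetime, as in the checks above.
frac <- as.numeric(data$HEADER_TIMESTAMP) %% 1
print(frac)
```

The fractional parts will show the usual floating-point noise of POSIXct values, as in the datetime example above.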
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.1.2 (2021-11-01)
#> os Debian GNU/Linux 10 (buster)
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate C.UTF-8
#> ctype C.UTF-8
#> tz Etc/UTC
#> date 2022-03-25
#> pandoc 2.14.0.3 @ /usr/lib/rstudio-server/bin/pandoc/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> bit 4.0.4 2020-08-04 [2] CRAN (R 4.1.0)
#> bit64 4.0.5 2020-08-30 [2] CRAN (R 4.1.0)
#> cli 3.2.0.9000 2022-03-16 [1] Github (r-lib/cli@51463d2)
#> crayon 1.5.0 2022-02-14 [1] CRAN (R 4.1.2)
#> curl 4.3.2 2021-06-23 [1] CRAN (R 4.1.0)
#> digest 0.6.29 2021-12-01 [1] CRAN (R 4.1.2)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.2)
#> evaluate 0.15 2022-02-18 [1] CRAN (R 4.1.2)
#> fansi 1.0.2 2022-01-14 [1] CRAN (R 4.1.2)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.2)
#> fs 1.5.2 2021-12-08 [1] CRAN (R 4.1.2)
#> generics 0.1.2 2022-01-31 [1] CRAN (R 4.1.2)
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.1.2)
#> highr 0.9 2021-04-16 [1] CRAN (R 4.1.2)
#> hms * 1.1.1 2021-09-26 [1] CRAN (R 4.1.0)
#> htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.0)
#> knitr 1.37 2021-12-16 [1] CRAN (R 4.1.2)
#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.0)
#> lubridate 1.8.0 2021-10-07 [1] CRAN (R 4.1.0)
#> magrittr 2.0.2 2022-01-26 [1] CRAN (R 4.1.2)
#> pillar 1.7.0 2022-02-01 [1] CRAN (R 4.1.2)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.2)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.2)
#> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.1.0)
#> rlang 1.0.2 2022-03-04 [1] CRAN (R 4.1.2)
#> rmarkdown 2.11 2021-09-14 [1] CRAN (R 4.1.0)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.2)
#> sessioninfo 1.2.2.9000 2022-03-16 [1] Github (r-lib/sessioninfo@27965c2)
#> stringi 1.7.6 2021-11-29 [1] CRAN (R 4.1.0)
#> stringr 1.4.0.9000 2021-12-14 [1] xgit (git@github.com:tidyverse/stringr.git@dd909b7)
#> tibble 3.1.6 2021-11-07 [1] CRAN (R 4.1.0)
#> tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.1.2)
#> tzdb 0.2.0 2021-10-27 [1] CRAN (R 4.1.0)
#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0)
#> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.2)
#> vroom 1.5.7 2021-11-30 [1] CRAN (R 4.1.0)
#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.1.2)
#> xfun 0.30 2022-03-02 [1] CRAN (R 4.1.2)
#> yaml 2.3.5 2022-02-21 [1] CRAN (R 4.1.2)
#>
#> [1] /home/jupyter/.R/library
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────