
Fractional seconds, col_datetime/col_time differences and %AT #422

@muschellij2

Description


Data for example

I probably could have made simpler example data sets, but these are the files I was using at the time (they are public and open):

62163.csv.gz
62163_data.csv.gz
62163_data_separated.csv.gz

Description of Issue

Fractional seconds are truncated with the default col_time(), but not with the default col_datetime(). The ask may be a warning when fractional seconds are truncated because col_time(format = "") is used for hms/time objects.


Reprex

This option must be set to see fractional seconds when printing:

options(digits.secs = 3)
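The digits.secs option only controls printing; the stored values keep their fractional seconds regardless. A small base R sketch using one timestamp from the example data further below:

```r
# One timestamp from the example data, parsed in UTC so the result does not
# depend on the local time zone
x <- as.POSIXct("2000-01-08 17:30:00.013", tz = "UTC")

# With digits.secs = 0, printing drops the fractional seconds ...
old <- options(digits.secs = 0)
print(x)

# ... and with digits.secs = 3 they are shown
options(digits.secs = 3)
print(x)
options(old)  # restore the previous setting

# Either way, the stored value carries the 13 ms
frac_part <- round(as.numeric(x) %% 1, 3)
frac_part
```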

Load hms so values print as times rather than as difftime objects:

library(hms)

Example with Fractional Seconds

url = "https://github.com/r-lib/vroom/files/8353807/62163.csv.gz"
file = file.path(tempdir(), basename(url))
if (!file.exists(file)) {
  curl::curl_download(url, file)
}

Data Header

We see the fractional seconds in this file, which are needed for our analysis.

readLines(file, 5)
#> [1] "DAY_OF_DATA,START_TIME,END_TIME,DATA_QUALITY_FLAG_CODE,DATA_QUALITY_FLAG_VALUE"
#> [2] "2,14:11:35.463000,14:11:35.463000,COUNT_SPIKES_Z,1"                            
#> [3] "2,14:10:00.000000,14:10:59.988000,ADJACENT_INVALID,1"                          
#> [4] "2,14:12:00.000000,14:12:59.988000,ADJACENT_INVALID,1"                          
#> [5] "4,11:17:25.938000,11:17:25.938000,COUNT_SPIKES_Z,1"

Reading data with vroom

Here we read the data and see that both time columns are guessed as col_time():

data = vroom::vroom(file, progress = FALSE)
#> Rows: 15 Columns: 5
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (1): DATA_QUALITY_FLAG_CODE
#> dbl  (2): DAY_OF_DATA, DATA_QUALITY_FLAG_VALUE
#> time (2): START_TIME, END_TIME
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
vroom::spec(data)
#> cols(
#>   DAY_OF_DATA = col_double(),
#>   START_TIME = col_time(format = ""),
#>   END_TIME = col_time(format = ""),
#>   DATA_QUALITY_FLAG_CODE = col_character(),
#>   DATA_QUALITY_FLAG_VALUE = col_double(),
#>   .delim = ","
#> )

No fractional seconds are printed:

head(data)
#> # A tibble: 6 × 5
#>   DAY_OF_DATA START_TIME END_TIME DATA_QUALITY_FLAG_CODE DATA_QUALITY_FLAG_VALUE
#>         <dbl> <time>     <time>   <chr>                                    <dbl>
#> 1           2 14:11:35   14:11:35 COUNT_SPIKES_Z                               1
#> 2           2 14:10:00   14:10:59 ADJACENT_INVALID                             1
#> 3           2 14:12:00   14:12:59 ADJACENT_INVALID                             1
#> 4           4 11:17:25   11:17:25 COUNT_SPIKES_Z                               1
#> 5           4 11:35:21   11:35:21 COUNT_SPIKES_X                               1
#> 6           4 11:16:00   11:16:59 ADJACENT_INVALID                             1

We can confirm that they are truncated.

as.numeric(lubridate::seconds(data$START_TIME[1:5])) %% 1
#> [1] 0 0 0 0 0
as.numeric(data$START_TIME[1:5]) %% 1
#> [1] 0 0 0 0 0

Different col_time format

I don't think the default %AT format takes fractional seconds into account, so we need to pass our own format.
Here we specify col_time() with %OS, which I believe is R-specific (see ?strptime):

col_time_with_frac_secs = function(...) {
  vroom::col_time(format = "%H:%M:%OS", ...)
}
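The same distinction exists in base R's strptime(), which makes it easy to see why the format matters: %S reads whole seconds and silently ignores the trailing ".463", while %OS reads the fractional part. A base R sketch, independent of vroom:

```r
# Time of day with a fractional-seconds part, taken from the first data row
raw <- "14:11:35.463"

# %S reads integer seconds; strptime() ignores trailing characters,
# so ".463" is dropped without any warning
whole <- strptime(raw, format = "%H:%M:%S", tz = "UTC")

# %OS (R-specific) reads the seconds including the fractional part
frac <- strptime(raw, format = "%H:%M:%OS", tz = "UTC")

whole$sec  # 35
frac$sec   # 35.463
```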

Read in the data:

data = vroom::vroom(file,
                   progress = FALSE,
                   col_types =
                     vroom::cols(
                       START_TIME = col_time_with_frac_secs(),
                       END_TIME = col_time_with_frac_secs()
                     ))

Fractional seconds are preserved

as.numeric(lubridate::seconds(data$START_TIME[1:5])) %% 1
#> [1] 0.463 0.000 0.000 0.938 0.800
as.numeric(data$START_TIME[1:5]) %% 1
#> [1] 0.463 0.000 0.000 0.938 0.800
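A smaller, self-contained version of the same comparison, using the first rows of the file inline instead of a download. This sketch assumes a vroom version that accepts literal data wrapped in I(); since the default col_time() result may differ in vroom versions other than 1.5.7, only the %OS result is checked.

```r
library(vroom)

# First rows of 62163.csv, inlined so the example needs no download
txt <- paste(
  "DAY_OF_DATA,START_TIME,END_TIME",
  "2,14:11:35.463000,14:11:35.463000",
  "4,11:17:25.938000,11:17:25.938000",
  sep = "\n"
)

# Default time parsing: format = "" falls back to the locale's time format (%AT)
default_times <- vroom(I(txt),
  col_types = cols(START_TIME = col_time(), END_TIME = col_time()),
  progress = FALSE)

# Explicit %OS format keeps the fractional seconds
frac_times <- vroom(I(txt),
  col_types = cols(START_TIME = col_time(format = "%H:%M:%OS"),
                   END_TIME = col_time(format = "%H:%M:%OS")),
  progress = FALSE)

# Fractional part of START_TIME under both specifications
data.frame(
  default = as.numeric(default_times$START_TIME) %% 1,
  with_OS = as.numeric(frac_times$START_TIME) %% 1
)
```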

Discussion of Issue/Resolution

Overall, this may simply be an issue to point people to in the future. I'm not sure whether there should be a warning that values may be truncated, or whether the guesser should (or could) detect fractional seconds. Below I show that the default for a datetime does preserve fractional seconds, so the behavior is mildly inconsistent: how dates and times are stored in the file determines whether fractional seconds survive.
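The warning I have in mind could look roughly like this. It is a hypothetical helper (warn_if_frac_truncated() is not part of vroom), written in base R: it flags values whose raw text has a non-zero fractional-seconds part but whose parsed value has none.

```r
# Hypothetical helper, not part of vroom: warn when parsing dropped fractional seconds.
# raw:    character vector of the original time strings
# parsed: parsed times in seconds (e.g. hms objects or plain numbers)
warn_if_frac_truncated <- function(raw, parsed) {
  # Raw text has a decimal point followed by at least one non-zero digit
  raw_has_frac <- grepl("\\.[0-9]*[1-9]", raw)
  # Parsed value has no fractional part left
  parsed_whole <- as.numeric(parsed) %% 1 == 0
  lost <- raw_has_frac & parsed_whole & !is.na(parsed)
  if (any(lost)) {
    warning(sum(lost), " value(s) lost fractional seconds during parsing, ",
            "e.g. '", raw[lost][1], "'", call. = FALSE)
  }
  invisible(lost)
}

# Usage: whole seconds, as the default col_time() returned in the reprex above
raw <- c("14:11:35.463000", "14:10:00.000000")
truncated <- c(51095, 51000)
warn_if_frac_truncated(raw, truncated)  # warns about the first value
```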


Example File with Datetime versus time object

If we have a file with a datetime, then this doesn’t seem to be an issue:

url = "https://github.com/r-lib/vroom/files/8353874/62163_data.csv.gz"
file = file.path(tempdir(), basename(url))
if (!file.exists(file)) {
  curl::curl_download(url, file)
}

Data Header

We see the fractional seconds in this file.

readLines(file, 5)
#> [1] "HEADER_TIMESTAMP,X,Y,Z"                    
#> [2] "2000-01-08 17:30:00.000,0.208,0.079,-0.751"
#> [3] "2000-01-08 17:30:00.013,0.17,0.094,-0.751" 
#> [4] "2000-01-08 17:30:00.025,0.22,0.109,-0.727" 
#> [5] "2000-01-08 17:30:00.038,0.258,0.047,-0.78"

Reading data with vroom

Here we read the data and see that HEADER_TIMESTAMP is guessed as col_datetime():

data = vroom::vroom(file, progress = FALSE)
#> Rows: 9 Columns: 4
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl  (3): X, Y, Z
#> dttm (1): HEADER_TIMESTAMP
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
vroom::spec(data)
#> cols(
#>   HEADER_TIMESTAMP = col_datetime(format = ""),
#>   X = col_double(),
#>   Y = col_double(),
#>   Z = col_double(),
#>   .delim = ","
#> )

We see fractional seconds are preserved:

head(data)
#> # A tibble: 6 × 4
#>   HEADER_TIMESTAMP            X     Y      Z
#>   <dttm>                  <dbl> <dbl>  <dbl>
#> 1 2000-01-08 17:30:00.000 0.208 0.079 -0.751
#> 2 2000-01-08 17:30:00.013 0.17  0.094 -0.751
#> 3 2000-01-08 17:30:00.024 0.22  0.109 -0.727
#> 4 2000-01-08 17:30:00.037 0.258 0.047 -0.78 
#> 5 2000-01-08 17:30:00.049 0.276 0.029 -0.762
#> 6 2000-01-08 17:30:00.062 0.258 0.032 -0.777

We can confirm that they are preserved

as.numeric(lubridate::seconds(data$HEADER_TIMESTAMP[1:5])) %% 1
#> [1] 0.00000000 0.01300001 0.02499998 0.03799999 0.04999995
as.numeric(data$HEADER_TIMESTAMP[1:5]) %% 1
#> [1] 0.00000000 0.01300001 0.02499998 0.03799999 0.04999995
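Values such as 0.01300001 and 0.02499998 are not parsing errors: POSIXct stores seconds since 1970 as a double, and at this magnitude (about 9.5e8 seconds) a double resolves steps of only roughly 1e-7 seconds. Because %OSn output truncates rather than rounds (per ?strptime), this is presumably also why the printout above shows .024 and .037 where the file has .025 and .038. A base R sketch of recovering the millisecond values by rounding:

```r
# Timestamps from the first rows of 62163_data.csv, parsed in UTC
x <- as.POSIXct(c("2000-01-08 17:30:00.000",
                  "2000-01-08 17:30:00.013",
                  "2000-01-08 17:30:00.025",
                  "2000-01-08 17:30:00.038"),
                format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# Raw fractional parts carry double-precision noise (around 1e-7 s)
raw_frac <- as.numeric(x) %% 1

# Rounding to whole milliseconds recovers the values written in the file
ms <- round(raw_frac * 1000)
ms  # 0 13 25 38
```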

Example with Separate Date and Time Columns: Similar Issue

If the date and time are stored in two separate columns and the time has fractional seconds, we again hit the same issue as in the first example.

url = "https://github.com/r-lib/vroom/files/8353880/62163_data_separated.csv.gz"
file = file.path(tempdir(), basename(url))
if (!file.exists(file)) {
  curl::curl_download(url, file)
}

Data Header

We see the fractional seconds in this file.

readLines(file, 5)
#> [1] "DATE,TIMESTAMP,X,Y,Z"                      
#> [2] "2000-01-08,17:30:00.000,0.208,0.079,-0.751"
#> [3] "2000-01-08,17:30:00.013,0.17,0.094,-0.751" 
#> [4] "2000-01-08,17:30:00.025,0.22,0.109,-0.727" 
#> [5] "2000-01-08,17:30:00.038,0.258,0.047,-0.78"

Reading data with vroom

Here we read the data and see that TIMESTAMP is guessed as col_time():

data = vroom::vroom(file, progress = FALSE)
#> Rows: 9 Columns: 5
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl  (3): X, Y, Z
#> date (1): DATE
#> time (1): TIMESTAMP
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
vroom::spec(data)
#> cols(
#>   DATE = col_date(format = ""),
#>   TIMESTAMP = col_time(format = ""),
#>   X = col_double(),
#>   Y = col_double(),
#>   Z = col_double(),
#>   .delim = ","
#> )

No fractional seconds are printed:

head(data)
#> # A tibble: 6 × 5
#>   DATE       TIMESTAMP     X     Y      Z
#>   <date>     <time>    <dbl> <dbl>  <dbl>
#> 1 2000-01-08 17:30     0.208 0.079 -0.751
#> 2 2000-01-08 17:30     0.17  0.094 -0.751
#> 3 2000-01-08 17:30     0.22  0.109 -0.727
#> 4 2000-01-08 17:30     0.258 0.047 -0.78 
#> 5 2000-01-08 17:30     0.276 0.029 -0.762
#> 6 2000-01-08 17:30     0.258 0.032 -0.777

We can confirm that they are truncated.

as.numeric(lubridate::seconds(data$TIMESTAMP[1:5])) %% 1
#> [1] 0 0 0 0 0
as.numeric(data$TIMESTAMP[1:5]) %% 1
#> [1] 0 0 0 0 0
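For the separated file, one workaround is to read TIMESTAMP with the %OS format and then add it to the date to get a single datetime. This is a sketch, assuming a vroom version that accepts literal data via I(); the inline rows are the first rows of 62163_data_separated.csv, and the DATETIME column name is illustrative.

```r
library(vroom)

# First rows of 62163_data_separated.csv, inlined so the example needs no download
txt <- paste(
  "DATE,TIMESTAMP,X",
  "2000-01-08,17:30:00.000,0.208",
  "2000-01-08,17:30:00.013,0.17",
  "2000-01-08,17:30:00.025,0.22",
  sep = "\n"
)

# Read the time column with %OS so the milliseconds are kept
sep_data <- vroom(I(txt),
  col_types = cols(DATE = col_date(),
                   TIMESTAMP = col_time(format = "%H:%M:%OS")),
  progress = FALSE)

# Combine: midnight of DATE in UTC plus the time of day in seconds
sep_data$DATETIME <- as.POSIXct(format(sep_data$DATE), tz = "UTC") +
  as.numeric(sep_data$TIMESTAMP)

# Milliseconds recovered from the combined datetime
round((as.numeric(sep_data$DATETIME) %% 1) * 1000)
```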

Created on 2022-03-25 by the reprex package (v2.0.1)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       Debian GNU/Linux 10 (buster)
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  C.UTF-8
#>  ctype    C.UTF-8
#>  tz       Etc/UTC
#>  date     2022-03-25
#>  pandoc   2.14.0.3 @ /usr/lib/rstudio-server/bin/pandoc/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  bit           4.0.4      2020-08-04 [2] CRAN (R 4.1.0)
#>  bit64         4.0.5      2020-08-30 [2] CRAN (R 4.1.0)
#>  cli           3.2.0.9000 2022-03-16 [1] Github (r-lib/cli@51463d2)
#>  crayon        1.5.0      2022-02-14 [1] CRAN (R 4.1.2)
#>  curl          4.3.2      2021-06-23 [1] CRAN (R 4.1.0)
#>  digest        0.6.29     2021-12-01 [1] CRAN (R 4.1.2)
#>  ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.1.2)
#>  evaluate      0.15       2022-02-18 [1] CRAN (R 4.1.2)
#>  fansi         1.0.2      2022-01-14 [1] CRAN (R 4.1.2)
#>  fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.1.2)
#>  fs            1.5.2      2021-12-08 [1] CRAN (R 4.1.2)
#>  generics      0.1.2      2022-01-31 [1] CRAN (R 4.1.2)
#>  glue          1.6.2      2022-02-24 [1] CRAN (R 4.1.2)
#>  highr         0.9        2021-04-16 [1] CRAN (R 4.1.2)
#>  hms         * 1.1.1      2021-09-26 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.2      2021-08-25 [1] CRAN (R 4.1.0)
#>  knitr         1.37       2021-12-16 [1] CRAN (R 4.1.2)
#>  lifecycle     1.0.1      2021-09-24 [1] CRAN (R 4.1.0)
#>  lubridate     1.8.0      2021-10-07 [1] CRAN (R 4.1.0)
#>  magrittr      2.0.2      2022-01-26 [1] CRAN (R 4.1.2)
#>  pillar        1.7.0      2022-02-01 [1] CRAN (R 4.1.2)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.1.2)
#>  purrr         0.3.4      2020-04-17 [1] CRAN (R 4.1.2)
#>  reprex        2.0.1      2021-08-05 [1] CRAN (R 4.1.0)
#>  rlang         1.0.2      2022-03-04 [1] CRAN (R 4.1.2)
#>  rmarkdown     2.11       2021-09-14 [1] CRAN (R 4.1.0)
#>  rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.1.2)
#>  sessioninfo   1.2.2.9000 2022-03-16 [1] Github (r-lib/sessioninfo@27965c2)
#>  stringi       1.7.6      2021-11-29 [1] CRAN (R 4.1.0)
#>  stringr       1.4.0.9000 2021-12-14 [1] xgit ([email protected]:tidyverse/stringr.git@dd909b7)
#>  tibble        3.1.6      2021-11-07 [1] CRAN (R 4.1.0)
#>  tidyselect    1.1.2      2022-02-21 [1] CRAN (R 4.1.2)
#>  tzdb          0.2.0      2021-10-27 [1] CRAN (R 4.1.0)
#>  utf8          1.2.2      2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs         0.3.8      2021-04-29 [1] CRAN (R 4.1.2)
#>  vroom         1.5.7      2021-11-30 [1] CRAN (R 4.1.0)
#>  withr         2.5.0      2022-03-03 [1] CRAN (R 4.1.2)
#>  xfun          0.30       2022-03-02 [1] CRAN (R 4.1.2)
#>  yaml          2.3.5      2022-02-21 [1] CRAN (R 4.1.2)
#> 
#>  [1] /home/jupyter/.R/library
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
