Merged
Changes from 9 commits
9 changes: 8 additions & 1 deletion MAINTENANCE.md
@@ -28,7 +28,6 @@ The only code path that is multi-threaded is normal files, connections are read
To compile with logging enabled you need to set `-DVROOM_LOG` in your `~/R/Makevars` and if you want to control the logging level you can set `-DSPDLOG_ACTIVE_LEVEL=SPDLOG_LEVEL_DEBUG`.
You also need to create the `logs` directory for the logs to be written to. They will write a `logs/index.idx` and a `logs/index_connection.idx` file respectively.
The file is appended to, not rewritten, so you would need to delete it if you want a new file after a run.
There is also `-DVROOM_USE_CONNECTIONS_API` to use the CRAN forbidden connections API directly, but the performance difference is generally the same, so it isn't really needed.

## Known outstanding issues

@@ -94,6 +93,14 @@ https://github.com/r-lib/vroom/issues/357 tracks this issue
The following projects have merit, but would require more maintainer bandwidth than is currently available.
It is helpful to us to (a) record them explicitly for some possible future and (b) get them out of our open issues.

### Using the Connections API directly

Early in vroom's development, there was the notion that vroom might use the connections API directly.
But instead of that API becoming more official/public, the opposite happened and it's considered forbidden for a CRAN package to use it.
In January 2026 in #595, the remaining scaffolding around this was removed, as it had gone unexercised for years.

### Other ideas

* https://github.com/tidyverse/vroom/issues/186
* https://github.com/tidyverse/vroom/issues/151
* https://github.com/tidyverse/vroom/issues/250
4 changes: 0 additions & 4 deletions R/col_types.R
@@ -624,15 +624,11 @@ collector_value.collector_factor <- function(x, ...) {
factor()
}

# the more obvious as.POSIXct(double()) doesn't work on R < 4.0
# https://github.com/tidyverse/vroom/issues/453
#' @export
collector_value.collector_datetime <- function(x, ...) {
vctrs::vec_ptype(Sys.time())
}

# the more obvious as.Date(double()) doesn't work on R < 4.0
# and again: https://github.com/tidyverse/vroom/issues/453
#' @export
collector_value.collector_date <- function(x, ...) {
vctrs::vec_ptype(Sys.Date())
24 changes: 11 additions & 13 deletions R/vroom.R
@@ -479,9 +479,7 @@ vroom_tempfile <- function() {
#'
#' Alternatively there is also a family of environment variables to control use of
#' the Altrep framework. These can then be set in your `.Renviron` file, e.g.
#' with `usethis::edit_r_environ()`. For versions of R where the Altrep
#' framework is unavailable (R < 3.5.0) they are automatically turned off and
#' the variables have no effect. The variables can take one of `true`, `false`,
#' with `usethis::edit_r_environ()`. The variables can take one of `true`, `false`,
#' `TRUE`, `FALSE`, `1`, or `0`.
#'
#' - `VROOM_USE_ALTREP_NUMERICS` - If set use Altrep for _all_ numeric types
@@ -526,16 +524,16 @@ vroom_altrep <- function(which = NULL) {
}

args <- list(
getRversion() >= "3.5.0" && which$chr %||% vroom_use_altrep_chr(),
getRversion() >= "3.5.0" && which$fct %||% vroom_use_altrep_fct(),
getRversion() >= "3.5.0" && which$int %||% vroom_use_altrep_int(),
getRversion() >= "3.5.0" && which$dbl %||% vroom_use_altrep_dbl(),
getRversion() >= "3.5.0" && which$num %||% vroom_use_altrep_num(),
getRversion() >= "3.6.0" && which$lgl %||% vroom_use_altrep_lgl(), # logicals only supported in R 3.6.0+
getRversion() >= "3.5.0" && which$dttm %||% vroom_use_altrep_dttm(),
getRversion() >= "3.5.0" && which$date %||% vroom_use_altrep_date(),
getRversion() >= "3.5.0" && which$time %||% vroom_use_altrep_time(),
getRversion() >= "3.5.0" && which$big_int %||% vroom_use_altrep_big_int()
which$chr %||% vroom_use_altrep_chr(),
which$fct %||% vroom_use_altrep_fct(),
which$int %||% vroom_use_altrep_int(),
which$dbl %||% vroom_use_altrep_dbl(),
which$num %||% vroom_use_altrep_num(),
which$lgl %||% vroom_use_altrep_lgl(),
Member Author

vroom actually does not ever use altrep for logical columns, but I gather there must have been some thought that it might one day? Anyway, I'm not getting into that here.

Member

Probably the fact that it didn't exist in 3.5 delayed its implementation, and then it was never added later on?

Member Author

Yeah? But I also think the performance argument is so different (less compelling) for logical. I'd at least want to look into that quantitatively before ever considering adding that feature.

which$dttm %||% vroom_use_altrep_dttm(),
which$date %||% vroom_use_altrep_date(),
which$time %||% vroom_use_altrep_time(),
which$big_int %||% vroom_use_altrep_big_int()
)

out <- 0L
16 changes: 3 additions & 13 deletions README.Rmd
@@ -218,12 +218,10 @@ need to be set by most users.
- `VROOM_WRITE_BUFFER_LINES` - The number of lines to use for each buffer when
writing files (default: 1000).

There are also a family of variables to control use of the Altrep framework.
For versions of R where the Altrep framework is unavailable (R < 3.5.0) they
are automatically turned off and the variables have no effect. The variables
can take one of `true`, `false`, `TRUE`, `FALSE`, `1`, or `0`.
There is also a family of variables to control use of the Altrep framework.
These variables can take one of these values: `true`, `false`, `TRUE`, `FALSE`, `1`, or `0`.

- `VROOM_USE_ALTREP_NUMERICS` - If set use Altrep for _all_ numeric types
- `VROOM_USE_ALTREP_NUMERICS` - If true, use Altrep for _all_ numeric types
(default `false`).

There are also individual variables for each type. Currently only
@@ -240,14 +238,6 @@ There are also individual variables for each type. Currently only
- `VROOM_USE_ALTREP_DATE`
- `VROOM_USE_ALTREP_TIME`
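
Two rules from this diff can be sketched together: the six accepted spellings for the environment variables, and the way an explicit argument takes precedence over the environment (the `which$chr %||% vroom_use_altrep_chr()` pattern in `R/vroom.R`). This is an illustration in C++ only. vroom resolves these settings in R, and `env_flag()` and `resolve_altrep()` are hypothetical names, not vroom functions. Falling back to the default for unset or unrecognized values is an assumption of the sketch, not documented vroom behavior.

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper (not part of vroom): interpret an environment variable
// using the six spellings listed in the README: true/false/TRUE/FALSE/1/0.
// Assumption: unset or unrecognized values fall back to `fallback`.
inline bool env_flag(const char* name, bool fallback) {
  const char* raw = std::getenv(name);
  if (raw == nullptr) {
    return fallback;
  }
  const std::string value(raw);
  if (value == "true" || value == "TRUE" || value == "1") {
    return true;
  }
  if (value == "false" || value == "FALSE" || value == "0") {
    return false;
  }
  return fallback;
}

// Hypothetical helper mirroring the shape of `which$x %||% vroom_use_altrep_x()`:
// an explicit per-call choice wins; otherwise the environment decides.
inline bool resolve_altrep(std::optional<bool> explicit_choice,
                           const char* name,
                           bool fallback) {
  return explicit_choice.has_value() ? *explicit_choice
                                     : env_flag(name, fallback);
}
```

With `VROOM_USE_ALTREP_NUMERICS=true` in the environment, `resolve_altrep(std::nullopt, "VROOM_USE_ALTREP_NUMERICS", false)` yields `true`, while passing an explicit `false` as the first argument overrides it.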

## RStudio caveats

RStudio's environment pane calls `object.size()` when it refreshes the pane, which
for Altrep objects can be extremely slow. RStudio 1.2.1335+ includes the fixes
([RStudio#4210](https://github.com/rstudio/rstudio/pull/4210),
[RStudio#4292](https://github.com/rstudio/rstudio/pull/4292)) for this issue,
so it is recommended you use at least that version.

## Thanks

- [Gabe Becker](https://github.com/gmbecker), [Luke
19 changes: 4 additions & 15 deletions README.md
@@ -213,13 +213,11 @@ will not need to be set by most users.
- `VROOM_WRITE_BUFFER_LINES` - The number of lines to use for each
buffer when writing files (default: 1000).

There are also a family of variables to control use of the Altrep
framework. For versions of R where the Altrep framework is unavailable
(R \< 3.5.0) they are automatically turned off and the variables have no
effect. The variables can take one of `true`, `false`, `TRUE`, `FALSE`,
`1`, or `0`.
There is also a family of variables to control use of the Altrep
framework. These variables can take one of these values: `true`,
`false`, `TRUE`, `FALSE`, `1`, or `0`.

- `VROOM_USE_ALTREP_NUMERICS` - If set use Altrep for *all* numeric
- `VROOM_USE_ALTREP_NUMERICS` - If true, use Altrep for *all* numeric
types (default `false`).

There are also individual variables for each type. Currently only
@@ -236,15 +234,6 @@ There are also individual variables for each type. Currently only
- `VROOM_USE_ALTREP_DATE`
- `VROOM_USE_ALTREP_TIME`

## RStudio caveats

RStudio’s environment pane calls `object.size()` when it refreshes the
pane, which for Altrep objects can be extremely slow. RStudio 1.2.1335+
includes the fixes
([RStudio#4210](https://github.com/rstudio/rstudio/pull/4210),
[RStudio#4292](https://github.com/rstudio/rstudio/pull/4292)) for this
issue, so it is recommended you use at least that version.

Comment on lines -239 to -247
Member Author

This now feels like ancient history that need not be mentioned.

## Thanks

- [Gabe Becker](https://github.com/gmbecker), [Luke
4 changes: 1 addition & 3 deletions man/vroom_altrep.Rd (generated file; diff not rendered)

21 changes: 1 addition & 20 deletions src/altrep.cc
@@ -15,7 +15,6 @@
#include <thread>

[[cpp11::register]] void force_materialization(SEXP x) {
#ifdef HAS_ALTREP
// Note: vroom_lgl has no ALTREP implementation, so not included
if (R_altrep_inherits(x, vroom_chr::class_t)) {
vroom_chr::Materialize(x);
@@ -36,11 +35,9 @@
} else if (R_altrep_inherits(x, vroom_big_int::class_t)) {
vroom_big_int::Materialize(x);
}
#endif
}

bool vroom_altrep(SEXP x) {
#ifdef HAS_ALTREP
return R_altrep_inherits(x, vroom_chr::class_t) ||
R_altrep_inherits(x, vroom_date::class_t) ||
R_altrep_inherits(x, vroom_dbl::class_t) ||
@@ -51,13 +48,9 @@ bool vroom_altrep(SEXP x) {
R_altrep_inherits(x, vroom_num::class_t) ||
R_altrep_inherits(x, vroom_time::class_t) ||
R_altrep_inherits(x, vroom_big_int::class_t);
#else
return false;
#endif
}

[[cpp11::register]] SEXP vroom_materialize(SEXP x, bool replace) {
#ifdef HAS_ALTREP
for (R_xlen_t col = 0; col < Rf_xlength(x); ++col) {
SEXP elt = VECTOR_ELT(x, col);
if (vroom_altrep(elt)) {
@@ -78,13 +71,10 @@ bool vroom_altrep(SEXP x) {
}
}

#endif

return x;
}

[[cpp11::register]] SEXP vroom_convert(SEXP x) {
#ifdef HAS_ALTREP
SEXP out = PROTECT(Rf_allocVector(VECSXP, Rf_xlength(x)));
SHALLOW_DUPLICATE_ATTRIB(out, x);

@@ -137,15 +127,11 @@ bool vroom_altrep(SEXP x) {
}
UNPROTECT(1);
return out;
#else
return x;
#endif
}

[[cpp11::register]] std::string vroom_str_(const cpp11::sexp& x) {
std::stringstream ss;

#ifdef HAS_ALTREP
if (ALTREP(x)) {

auto csym = CAR(ATTRIB(ALTREP_CLASS(x)));
@@ -160,12 +146,7 @@ bool vroom_altrep(SEXP x) {
ss << '\t' << "length:" << LENGTH(x);
}
ss << '\t' << "materialized:" << materialzied << '\n';
}
#else
if (false) {
}
#endif
else {
} else {
ss << std::boolalpha << "altrep:" << false << '\t'
<< "type: " << Rf_type2char(TYPEOF(x));
if (!Rf_isObject(x)) {
36 changes: 0 additions & 36 deletions src/altrep.h
@@ -2,47 +2,11 @@

#include <R_ext/Rdynload.h>

#if R_VERSION >= R_Version(3, 5, 0)
#define HAS_ALTREP
#endif

#ifdef HAS_ALTREP
#if R_VERSION < R_Version(3, 6, 0)

// workaround because R's <R_ext/Altrep.h> not so conveniently uses `class`
// as a variable name, and C++ is not happy about that
//
// SEXP R_new_altrep(R_altrep_class_t class, SEXP data1, SEXP data2);
//

// clang-format off
#ifdef __clang__
# pragma clang diagnostic push
# pragma clang diagnostic ignored "-Wkeyword-macro"
#define class klass
# pragma clang diagnostic pop
#else
#define class klass
#endif
// clang-format on

// Because functions declared in <R_ext/Altrep.h> have C linkage
extern "C" {
#include <R_ext/Altrep.h>
}

// undo the workaround
#undef class

#else
extern "C" {
#include <R_ext/Altrep.h>
}
#endif

// Backport DATAPTR_RW for R < 4.6.0 (as recommended in Writing R Extensions)
#if R_VERSION < R_Version(4, 6, 0)
#define DATAPTR_RW(x) DATAPTR(x)
#endif

#endif