1 change: 1 addition & 0 deletions NAMESPACE
@@ -646,6 +646,7 @@ export(vec_unique)
export(vec_unique_count)
export(vec_unique_loc)
export(vec_unrep)
+export(vec_unstructure)
import(rlang)
importFrom(stats,median)
importFrom(stats,na.exclude)
8 changes: 8 additions & 0 deletions R/faq-developer.R
@@ -5,6 +5,14 @@
#' @name reference-faq-compatibility
NULL

+#' FAQ - Native storage types
+#'
+#' @includeRmd man/faq/developer/theory-native-storage.Rmd description
+#'
+#' @name theory-faq-native-storage
+#' @aliases theory_faq_native_storage
+NULL
+
#' FAQ - How does coercion work in vctrs?
#'
#' @includeRmd man/faq/developer/theory-coercion.Rmd description
103 changes: 103 additions & 0 deletions R/unstructure.R
@@ -0,0 +1,103 @@
#' Unstructure a vector
#'
#' @description
#' `vec_unstructure()` takes a vector that meets the [native storage
#' requirements][theory_faq_native_storage] of vctrs and removes all extraneous
#' attributes, retaining only those that are natively supported by vctrs:
#'
#' - For atomic vectors, `names`
#' - For arrays, `dim` and `dimnames[[1]]`, i.e. the row names
#' - For data frames, `names`, `row.names`, and a `class` of `"data.frame"`
#'
#' @details
#' Removing extraneous attributes is useful for avoiding unexpected side
#' effects, for example:
#'
#' - [vec_proxy()] calls `vec_unstructure()` on the proxy before returning it.
#' This ensures that internal manipulation of the proxy avoids any unexpected
#' S3 dispatch. Additionally, it means that a future call to
#' [vec_restore()] receives a minimal `x` object to build upon.
#'
#' - When implementing S3 methods for generics like `+` or `is.finite()`, it is
#' often useful to `vec_unstructure()` your custom object to remove its class,
#' call the generic again on the native type to use base R's implementation,
#' and then optionally rebuild your custom type with a `new_<my_type>()`
#' constructor. You would likely rebuild for `+`, but not for `is.finite()`,
#' which returns a plain logical vector.
#'
#' `vec_unstructure()` is roughly the inverse of [base::structure()].
#'
#' @param x An object that meets vctrs's [native storage
#' requirements][theory_faq_native_storage].
#'
#' @export
#' @examples
#' # Atomic vectors without attributes are returned unmodified
#' vec_unstructure(1)
#'
#' # Atomic vectors with attributes are unstructured back to their natively
#' # supported form. Only `names` is retained here:
#' x <- structure(1, names = "a", foo = "bar", class = "myclass")
#' x
#' vec_unstructure(x)
#'
#' # Arrays retain `dim` and `dimnames[[1]]` but all other attributes are lost
#' x <- array(1:4, c(2, 2))
#' rownames(x) <- c("a", "b")
#' colnames(x) <- c("c", "d")
#' attr(x, "foo") <- "bar"
#' x
#' vec_unstructure(x)
#'
#' # Data frames count as a native storage type in vctrs, so bare data frames
#' # are returned unmodified
#' x <- data_frame(x = 1:5, y = 6:10)
#' vec_unstructure(x)
#'
#' if (require("tibble")) {
#' # Tibbles meet the native storage requirements, but have extraneous
#' # attributes that are stripped away
#' x <- tibble(x = 1:5, y = 6:10)
#' x
#' vec_unstructure(x)
#' }
#'
#' # Note that native storage types are orthogonal to proxies.
#' # Calling `vec_unstructure()` on a rcrd returns the underlying list storage,
#' # while the proxy of this type (meant for C manipulation) is a data frame.
#' x <- new_rcrd(list(a = 1:5, b = 6:10))
#' vec_unstructure(x)
#' vec_proxy(x)
#'
#' # Types that don't meet the native storage requirements result in an error
#' try(vec_unstructure(NULL))
#' try(vec_unstructure(environment()))
vec_unstructure <- function(x) {
.Call(ffi_vec_unstructure, x)
}

# Thrown from C
stop_unsupported_storage_type <- function(x) {
# It currently doesn't feel worth it to add `x_arg` and `error_call` arguments
# to `vec_unstructure()`. This is a very low level function and the error is
# likely only going to be seen by developers implementing their packages.
x_arg <- glue::backtick("x")
error_call <- call("vec_unstructure")

message <- c(
cli::format_inline(
"{x_arg} must have a supported storage type, not <{typeof(x)}>."
),
i = cli::format_inline(paste(
"Read our FAQ about {.topic [native storage types](vctrs::theory_faq_native_storage)}",
"to learn more."
))
)

stop_vctrs(
message,
"vctrs_error_unsupported_storage_type",
call = error_call
)
}
1 change: 1 addition & 0 deletions _pkgdown.yml
@@ -41,6 +41,7 @@ reference:
- vec_data
- vec_ptype
- vec_size
+- vec_unstructure
- obj_is_vector
- obj_is_list

31 changes: 31 additions & 0 deletions man/faq/developer/theory-native-storage.Rmd
@@ -0,0 +1,31 @@

```{r, child = "../setup.Rmd", include = FALSE}
```

This page describes the native storage types that vctrs supports at the C level. Every vctrs algorithm, like the one underlying `vec_match()`, is guaranteed to work with these native storage types.

Supported native storage types:

- Logical (`LGLSXP`)
- Integer (`INTSXP`)
- Double (`REALSXP`)
- Complex (`CPLSXP`)
- Raw (`RAWSXP`)
- Character (`STRSXP`)
- List (`VECSXP`)
- Arrays, denoted by the presence of a `dim` attribute and one of the above storage types.
- Data frames, denoted by the presence of `names` and `row.names` attributes, a `class` attribute of `"data.frame"`, and a storage type of list (`VECSXP`). Data frame columns may contain any storage type mentioned here, including arrays and additional data frames.
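As a quick illustration using only base R (no vctrs functions), `typeof()` reports the storage type an object is built on. This sketch shows how each storage type in the list above surfaces in R, and that common classed vectors (dates, factors, data frames, matrices) sit on top of one of them while carrying extra attributes.

```r
# typeof() reports the underlying storage type of an R object
storage <- c(
  logical   = typeof(TRUE),
  integer   = typeof(1L),
  double    = typeof(1.5),
  complex   = typeof(1i),
  raw       = typeof(as.raw(1)),
  character = typeof("a"),
  list      = typeof(list(1))
)
storage

# Classed vectors are built on one of these storage types
typeof(as.Date("2024-01-01"))  # "double": a Date is a double with a class
typeof(factor(c("a", "b")))    # "integer": a factor is an integer with levels
typeof(data.frame(x = 1:2))    # "list": a data frame is a list of columns
typeof(matrix(1:4, nrow = 2))  # "integer": an array keeps its element type
```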

In addition to the required attributes mentioned above, the only other attributes that vctrs natively supports on these storage types are the ones that store names:

- For atomics, the `names` attribute
- For arrays, the `dimnames` attribute, but only row names are natively supported
- For data frames, the `row.names` attribute
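The sketch below uses base R only to list the attributes carried by a bare data frame and by a matrix that has both row and column names. It illustrates which of those attributes the lists above treat as native; it does not call any vctrs function.

```r
df <- data.frame(x = 1:2, y = c("a", "b"))
# A bare data frame carries exactly names, row.names, and class
sort(names(attributes(df)))

m <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("c1", "c2")))
# A matrix carries dim and dimnames; only the row names are native to vctrs
names(attributes(m))
dimnames(m)[[1]]  # the row names, which vctrs keeps
dimnames(m)[[2]]  # the column names, which vctrs does not treat as native
```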

## `vec_unstructure()`

`vec_unstructure()` takes an object that meets the native storage requirements and strips away all extraneous attributes, leaving behind only the attributes that are natively supported, as described above. This is often useful for avoiding unexpected S3 dispatch on classed objects from within vctrs algorithms.
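For plain atomic vectors, the effect can be approximated in base R. The helper below, `unstructure_atomic_sketch()`, is hypothetical and not part of vctrs: it only covers atomic vectors without a `dim` attribute, ignores arrays and data frames, and fails through `stopifnot()` rather than with the vctrs storage type error.

```r
# Hypothetical helper (not part of vctrs): a base R approximation of what
# vec_unstructure() does for plain atomic vectors, which keep only `names`.
# It does not handle arrays or data frames.
unstructure_atomic_sketch <- function(x) {
  stopifnot(is.atomic(x), is.null(dim(x)))
  nms <- names(x)
  attributes(x) <- NULL  # drop class, names, and any custom attributes
  names(x) <- nms        # restore the only natively supported attribute
  x
}

x <- structure(c(a = 1, b = 2), foo = "bar", class = "myclass")
unstructure_atomic_sketch(x)                      # a bare named double vector
unstructure_atomic_sketch(as.Date("2024-01-01"))  # the underlying day count
```

Because the class is gone, base R generics called on the result dispatch to their default methods rather than to any `myclass` method.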

## `vec_proxy()`

`vec_proxy()` takes an arbitrary vector supported by vctrs (as defined by `obj_is_vector()`) and returns an object that meets vctrs's native storage requirements. It is the job of the class author to define the mapping from their custom vector class to a vctrs native storage type. If the custom vector already builds on top of a native storage type, then a `vec_proxy()` method can return its input unmodified. The result of `vec_proxy()` is then passed through `vec_unstructure()` to strip away extraneous attributes, and this unstructured object is what the C level algorithms of vctrs manipulate. Those algorithms support all native storage types.
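The flow described above (proxy, then unstructure, then a native algorithm, then restore) can be sketched in base R for a class built on a double vector. Every name here (`new_percent()`, `proxy_sketch()`, `restore_sketch()`, the `"my_percent"` class) is a hypothetical stand-in, not vctrs API; in vctrs these roles are played by `vec_proxy()`, `vec_unstructure()`, the C level algorithms, and `vec_restore()`.

```r
# Hypothetical constructor for a custom class built on a double vector
new_percent <- function(x = double()) {
  structure(x, class = "my_percent")
}

# Stand-in for a vec_proxy() method: the class already builds on a native
# storage type (double), so the input is returned unmodified
proxy_sketch <- function(x) x

# Stand-in for vec_restore(): rebuild the custom class on top of the result
restore_sketch <- function(x, to) {
  structure(x, class = class(to))
}

x <- new_percent(c(0.1, 0.5, 0.1))

# Stand-in for vec_unstructure(): this class carries no attribute besides
# `class`, so unclass() is enough to reach the bare native storage
data <- unclass(proxy_sketch(x))

# A native algorithm (here base R's duplicated()) only ever sees a bare double
unique_data <- data[!duplicated(data)]
restore_sketch(unique_data, x)
```

Keeping the algorithm step free of classed input is what lets a single implementation serve every class that maps onto the same native storage type.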
59 changes: 59 additions & 0 deletions man/theory-faq-native-storage.Rd

81 changes: 81 additions & 0 deletions man/vec_unstructure.Rd

33 changes: 33 additions & 0 deletions src/decl/unstructure-decl.h
@@ -0,0 +1,33 @@
static inline
r_obj* atomic_unstructure(r_obj* x);

static inline
r_obj* array_unstructure(r_obj* x);

static inline
r_obj* df_unstructure(r_obj* x);

static inline
bool has_unstructured_atomic_attributes(r_obj* x);

static inline
bool has_unstructured_array_attributes(r_obj* x);

static inline
bool has_unstructured_array_dim_names(r_obj* x);

static inline
r_obj* dim_names_unstructure(r_obj* dim_names);

static inline
bool dim_names_are_unstructured(r_obj* dim_names);

static inline
bool has_unstructured_data_frame_attributes(r_obj* x);

static inline
bool has_unstructured_data_frame_class(r_obj* x);

static inline
r_no_return
void stop_unsupported_storage_type(r_obj* x);
2 changes: 2 additions & 0 deletions src/init.c
@@ -173,6 +173,7 @@ extern r_obj* ffi_vec_replace_values(r_obj*, r_obj*, r_obj*, r_obj*, r_obj*, r_o
extern r_obj* ffi_vec_if_else(r_obj*, r_obj*, r_obj*, r_obj*, r_obj*, r_obj*);
extern r_obj* ffi_vec_pany(r_obj*, r_obj*, r_obj*, r_obj*);
extern r_obj* ffi_vec_pall(r_obj*, r_obj*, r_obj*, r_obj*);
+extern r_obj* ffi_vec_unstructure(r_obj*);


// Maturing
@@ -383,6 +384,7 @@ static const R_CallMethodDef CallEntries[] = {
{"ffi_vec_if_else", (DL_FUNC) &ffi_vec_if_else, 6},
{"ffi_vec_pany", (DL_FUNC) &ffi_vec_pany, 4},
{"ffi_vec_pall", (DL_FUNC) &ffi_vec_pall, 4},
+{"ffi_vec_unstructure", (DL_FUNC) &ffi_vec_unstructure, 1},
{"ffi_exp_vec_cast", (DL_FUNC) &exp_vec_cast, 2},
{NULL, NULL, 0}
};
3 changes: 2 additions & 1 deletion src/size.c
@@ -107,7 +107,8 @@ r_ssize df_rownames_size(r_obj* x) {
continue;
}

-return rownames_size(r_node_car(attr));
+r_obj* rn = r_node_car(attr);
+return rownames_size(rn, rownames_type(rn));
}

return -1;
4 changes: 2 additions & 2 deletions src/type-data-frame.c
@@ -420,8 +420,8 @@ r_ssize compact_rownames_length(r_obj* x) {
}

// [[ include("type-data-frame.h") ]]
-r_ssize rownames_size(r_obj* rn) {
-switch (rownames_type(rn)) {
+r_ssize rownames_size(r_obj* rn, enum rownames_type type) {
+switch (type) {
case ROWNAMES_TYPE_identifiers:
case ROWNAMES_TYPE_automatic:
return r_length(rn);
2 changes: 1 addition & 1 deletion src/type-data-frame.h
@@ -46,7 +46,7 @@ enum rownames_type {
ROWNAMES_TYPE_identifiers
};
enum rownames_type rownames_type(r_obj* rn);
-r_ssize rownames_size(r_obj* rn);
+r_ssize rownames_size(r_obj* rn, enum rownames_type type);

r_obj* df_ptype2(
r_obj* x,