Skip to content

vec_ptype() should not retain attributes on atomics #2025

@DavisVaughan

Description

@DavisVaughan

Consider the following:

library(vctrs)

x <- structure(1, foo = "bar")
y <- 2

# This should not retain `foo`
vec_ptype(x)
#> numeric(0)
#> attr(,"foo")
#> [1] "bar"

# Note how the attributes are dropped as soon as you introduce
# a `ptype2` combination
vec_ptype2(x, y)
#> numeric(0)

# This currently results in the following inconsistency.
# The "common type" is just `numeric()` so no attribute should be retained.
vec_c(x)
#> [1] 1
#> attr(,"foo")
#> [1] "bar"
vec_c(x, y)
#> [1] 1 2

We preserve attributes in vec_ptype() here in vec_ptype_slice():

vctrs/src/ptype.c

Lines 13 to 44 in b63a233

r_obj* vec_ptype(r_obj* x, struct vctrs_arg* x_arg, struct r_lazy call) {
switch (vec_typeof(x)) {
case VCTRS_TYPE_null: return r_null;
case VCTRS_TYPE_unspecified: return vctrs_shared_empty_uns;
case VCTRS_TYPE_logical: return vec_ptype_slice(x, r_globals.empty_lgl);
case VCTRS_TYPE_integer: return vec_ptype_slice(x, r_globals.empty_int);
case VCTRS_TYPE_double: return vec_ptype_slice(x, r_globals.empty_dbl);
case VCTRS_TYPE_complex: return vec_ptype_slice(x, r_globals.empty_cpl);
case VCTRS_TYPE_character: return vec_ptype_slice(x, r_globals.empty_chr);
case VCTRS_TYPE_raw: return vec_ptype_slice(x, r_globals.empty_raw);
case VCTRS_TYPE_list: return vec_ptype_slice(x, r_globals.empty_list);
case VCTRS_TYPE_dataframe: return df_ptype(x, true);
case VCTRS_TYPE_s3: return s3_ptype(x, x_arg, call);
case VCTRS_TYPE_scalar: stop_scalar_type(x, x_arg, call);
}
r_stop_unreachable();
}
static
r_obj* col_ptype(r_obj* x) {
return vec_ptype(x, vec_args.empty, r_lazy_null);
}
static inline
r_obj* vec_ptype_slice(r_obj* x, r_obj* empty) {
if (r_attrib(x) == r_null) {
return empty;
} else {
// Slicing preserves attributes
return vec_slice(x, r_null);
}
}

I think we need to retain shape (for arrays) but should not retain any other attributes here.

This would affect the following tests

# This would just be `dbl()` rather than `named(dbl())` which I think is right.
# This supports our general view that names aren't part of the ptype.

# This might change in the future if we decide that prototypes don't
# have names
test_that("vec_ptype() preserves type of names and row names", {
  expect_identical(vec_ptype(c(foo = 1)), named(dbl()))
  expect_identical(vec_ptype(mtcars), mtcars[0, ])
  expect_identical(vec_ptype(foobar(mtcars)), foobar(mtcars[0, ]))
})
# I think we should consider that `structure(FALSE, foo = "bar")` and `FALSE` 
# ARE the same type. i.e. extraneous attributes on an object don't affect its
# ptype, so should not affect this computation.
#
# We certainly declare that `vec_ptype2(structure(1, foo = "bar"), 2)` has a common
# type of `numeric()`, so I'm not sure why this is any different

test_that("attributes of unclassed vectors are asserted", {
  x <- structure(FALSE, foo = "bar")
  y <- structure(TRUE, foo = "bar")
  expect_false(vec_is(x, FALSE))
  expect_false(vec_is(FALSE, x))
  expect_true(vec_is(y, x))
  expect_true(vec_is(x, y))
})

This is how this came up for me

x <- c(1, 2, 3)
from <- c(2, 3)

# `foo` is extraneous, this is just a double vector
to <- structure(c(0, -1), foo = "bar")

# TODO: Ideally the attributes wouldn't show up on the output, but
# `list_combine()` doesn't clear them because `vec_ptype(to)` retains
# them
vec_recode_values(x, from = from, to = to)
#> [1] NA  0 -1
#> attr(,"foo")
#> [1] "bar"

# As soon as you introduce ptype2, attributes drop off
vec_recode_values(x, from = from, to = to, default = 0)
#> [1]  0  0 -1

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions