Skip to content

Combine x and ... in min, max, range #1946

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

brookslogan
Copy link

Resolves #1372.

Only call vec_c(x, ...) if dots are nonempty, in order to stay fast in the empty case.

@brookslogan
Copy link
Author

Sorry for the rewrite, fixed some commit metadata.

@brookslogan
Copy link
Author

brookslogan commented Aug 3, 2024

Some notes:

  • I think the main use case here is taking the min or max of two [or maybe more] size-1 vectors. If it's okay to deviate from mirroring base R behavior a bit, it would probably help to add some warnings or errors if ... is used in other contexts, such as when it seems like pmin or pmax might be desired instead. Happy to make these changes; please let me know if they belong.
  • The current check failure ("! Could not solve package dependencies:") doesn't seem like it's related to the changes in this PR.
  • There doesn't seem to be a notable performance impact with the current approach, but the empty-dots optimization does seem important.
Small-vctr benchmarks:
# main
x <- new_vctr(c(1,2,4)); bench::mark(min(x), min_iterations = 1e5, max_iterations = 1e5)
#> # A tibble: 1 × 13
#>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result    
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>    
#> 1 min(x)       50.6µs   55.9µs    17597.        NA     8.98 99949    51      5.68s <vctrs_vc>
#> # ℹ 3 more variables: memory <list>, time <list>, gc <list>

# this PR
x <- new_vctr(c(1,2,4)); bench::mark(min(x), min_iterations = 1e5, max_iterations = 1e5)
#> # A tibble: 1 × 13
#>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result    
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>    
#> 1 min(x)       53.3µs   55.9µs    17608.        NA     8.98 99949    51      5.68s <vctrs_vc>
#> # ℹ 3 more variables: memory <list>, time <list>, gc <list>

# Without if-empty-dots optimization:
x <- new_vctr(c(1,2,4)); bench::mark(min(x), min_iterations = 1e5, max_iterations = 1e5)
#> # A tibble: 1 × 13
#>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result    
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>    
#> 1 min(x)        187µs    207µs     4737.        NA     10.4 99780   220      21.1s <vctrs_vc>
#> # ℹ 3 more variables: memory <list>, time <list>, gc <list>

# Testing with short integer vectors (ALTREP seqs or not) also yielded comparable performance.

@brookslogan
Copy link
Author

brookslogan commented Mar 31, 2025

I think the main use case here is taking the min or max of two [or maybe more] size-1 vectors. If it's okay to deviate from mirroring base R behavior a bit, it would probably help to add some warnings or errors if ... is used in other contexts, such as when it seems like pmin or pmax might be desired instead. Happy to make these changes; please let me know if they belong.

I've gone ahead and implemented these checks for min and max and made them produce hard errors. I didn't do this for range, as I realized that the range generic just has ... rather than x, ... [and more importantly doesn't have a parallel/vectorized version in base], so I haven't made it produce errors like this, but have made sure that it combines the first of the ... args (which in the range.vctrs_vctr implementation is called x) and any remaining ... args, if necessary.

Only call `vec_c` if dots are nonempty to stay fast in the empty case.
The `range` generic doesn't separate out `x` and `...`, but splitting them out
lets us easily check whether we can skip a potentially-slow `vec_c` operation to
combine them.
@brookslogan
Copy link
Author

brookslogan commented Mar 31, 2025

Rebased onto current main to see if it will help with what seemed like unrelated failing checks. Coverage check still aborts immediately with

Error: Missing download info for actions/upload-artifact@v3

and I'm not sure how to resolve this.

[(CHECK issues seem unrelated, so think this might still be in a mergeable state

Warning: Missing link(s) in Rd file 'new_rcrd.Rd':
[...]
Non-standard file/directory found at top level:
‘pkgdown’
[...]
File ‘vctrs/libs/vctrs.so’:
[...]
Compiled code should not call non-API entry points in R.

)]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Additional arguments in min() and max() methods for vctrs_vctr
1 participant