-
Notifications
You must be signed in to change notification settings - Fork 72
Try just SEXP, not struct str_info
#2119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
DavisVaughan
wants to merge
1
commit into
feature/no-truelength-order
from
feature/no-truelength-order-less-memory
Closed
Try just SEXP, not struct str_info
#2119
DavisVaughan
wants to merge
1
commit into
feature/no-truelength-order
from
feature/no-truelength-order-less-memory
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
It has a high memory cost (24 extra bytes per string over just `SEXP`) and it isn't clear that it gains us that much unless there is a high number of passes, i.e. in the case of a long common prefix, which is somewhat rare
Member
Author
|
As shown by some of the "weird scenarios", not caching the pointers/sizes and calling # "weird" scenarios
# 100 unique strings each with a suffix 10-20 characters in length,
# total size of 10,000,000
# 10 long common prefixes are pasted on the front, which makes more unique
# strings (typically around 1000) and creates strings with long prefixes that
# we try and "skip" through
#
# This is where you really feel the pain of not having cached sizes and pointers
# and is probably enough to make me want them again
cross::bench_versions(
pkgs = c(
"r-lib/vctrs",
"r-lib/vctrs@feature/no-truelength-order",
"r-lib/vctrs@feature/no-truelength-order-split-info",
"r-lib/vctrs@feature/no-truelength-order-split-info-and-bool-nas",
"r-lib/vctrs@feature/no-truelength-order-split-info-and-bool-nas-and-early-na-handling",
"r-lib/vctrs@feature/no-truelength-order-split-info-and-bool-nas-and-early-na-handling-and-no-sizes",
"r-lib/vctrs@feature/no-truelength-order-less-memory"
),
{
set.seed(123)
total_size <- 10000000
n_unique <- 100
min_length <- 10
max_length <- 20
n_prefixes <- 10
prefix_size <- 20
w <- sample(
stringi::stri_rand_strings(
n = n_unique,
length = sample(seq(min_length, max_length), n_unique, replace = TRUE)
),
size = total_size,
replace = TRUE
)
prefixes <- stringi::stri_rand_strings(
n = n_prefixes,
length = prefix_size
)
prefixes <- sample(prefixes, size = total_size, replace = TRUE)
w <- paste0(prefixes, w)
bench::mark(
vctrs:::vec_order_radix(w),
iterations = 20
)
}
)
#> # A tibble: 7 × 7
#> pkg expression min median `itr/sec` mem_alloc `gc/sec`
#> <chr> <bch:expr> <bch:t> <bch:t> <dbl> <bch:byt> <dbl>
#> 1 r-lib/vctrs vctrs:::v… 61.2ms 61.6ms 16.2 124MB 64.9
#> 2 r-lib/vctrs@feature/n… vctrs:::v… 268.2ms 276.4ms 3.55 544MB 4.08
#> 3 r-lib/vctrs@feature/n… vctrs:::v… 284.2ms 289.2ms 3.39 467MB 3.90
#> 4 r-lib/vctrs@feature/n… vctrs:::v… 266.2ms 272.7ms 3.55 334MB 4.96
#> 5 r-lib/vctrs@feature/n… vctrs:::v… 260.4ms 266.9ms 3.61 315MB 5.05
#> 6 r-lib/vctrs@feature/n… vctrs:::v… 220.6ms 227.4ms 4.37 238MB 4.59
#> 7 r-lib/vctrs@feature/n… vctrs:::v… 726.2ms 730.2ms 1.37 238MB 9.13
# - 100 unique strings each with a suffix 10-100 characters in length,
# total size of 10,000,000
# 10 long common prefixes are pasted on the front, which makes more unique
# strings (typically around 1000) and creates strings with long prefixes that
# we try and "skip" through
cross::bench_versions(
pkgs = c(
"r-lib/vctrs",
"r-lib/vctrs@feature/no-truelength-order",
"r-lib/vctrs@feature/no-truelength-order-split-info",
"r-lib/vctrs@feature/no-truelength-order-split-info-and-bool-nas",
"r-lib/vctrs@feature/no-truelength-order-split-info-and-bool-nas-and-early-na-handling",
"r-lib/vctrs@feature/no-truelength-order-split-info-and-bool-nas-and-early-na-handling-and-no-sizes",
"r-lib/vctrs@feature/no-truelength-order-less-memory"
),
{
set.seed(123)
total_size <- 10000000
n_unique <- 100
min_length <- 10
max_length <- 100
n_prefixes <- 10
prefix_size <- 20
w <- sample(
stringi::stri_rand_strings(
n = n_unique,
length = sample(seq(min_length, max_length), n_unique, replace = TRUE)
),
size = total_size,
replace = TRUE
)
prefixes <- stringi::stri_rand_strings(
n = n_prefixes,
length = prefix_size
)
prefixes <- sample(prefixes, size = total_size, replace = TRUE)
w <- paste0(prefixes, w)
bench::mark(
vctrs:::vec_order_radix(w),
iterations = 20
)
}
)
#> # A tibble: 7 × 7
#> pkg expression min median `itr/sec` mem_alloc `gc/sec`
#> <chr> <bch:expr> <bch:t> <bch:t> <dbl> <bch:byt> <dbl>
#> 1 r-lib/vctrs vctrs:::v… 61.5ms 62.3ms 16.1 124MB 64.2
#> 2 r-lib/vctrs@feature/n… vctrs:::v… 255.8ms 259.6ms 3.79 544MB 4.35
#> 3 r-lib/vctrs@feature/n… vctrs:::v… 266.7ms 269.8ms 3.62 467MB 4.16
#> 4 r-lib/vctrs@feature/n… vctrs:::v… 246.8ms 252ms 3.82 334MB 5.34
#> 5 r-lib/vctrs@feature/n… vctrs:::v… 244.3ms 253.6ms 3.85 315MB 5.38
#> 6 r-lib/vctrs@feature/n… vctrs:::v… 203.5ms 208.3ms 4.74 238MB 4.98
#> 7 r-lib/vctrs@feature/n… vctrs:::v… 714.1ms 718.3ms 1.39 238MB 9.28
# 1,000,000 unique strings each with a suffix 10-20 characters in length,
# total size of 10,000,000
# single common prefix
cross::bench_versions(
pkgs = c(
"r-lib/vctrs",
"r-lib/vctrs@feature/no-truelength-order",
"r-lib/vctrs@feature/no-truelength-order-split-info",
"r-lib/vctrs@feature/no-truelength-order-split-info-and-bool-nas",
"r-lib/vctrs@feature/no-truelength-order-split-info-and-bool-nas-and-early-na-handling",
"r-lib/vctrs@feature/no-truelength-order-split-info-and-bool-nas-and-early-na-handling-and-no-sizes",
"r-lib/vctrs@feature/no-truelength-order-less-memory"
),
{
set.seed(123)
total_size <- 10000000
n_unique <- 1000000
min_length <- 10
max_length <- 20
n_prefixes <- 1
prefix_size <- 20
w <- sample(
stringi::stri_rand_strings(
n = n_unique,
length = sample(seq(min_length, max_length), n_unique, replace = TRUE)
),
size = total_size,
replace = TRUE
)
prefixes <- stringi::stri_rand_strings(
n = n_prefixes,
length = prefix_size
)
prefixes <- sample(prefixes, size = total_size, replace = TRUE)
w <- paste0(prefixes, w)
bench::mark(
vctrs:::vec_order_radix(w),
iterations = 20
)
}
)
#> # A tibble: 7 × 7
#> pkg expression min median `itr/sec` mem_alloc `gc/sec`
#> <chr> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 r-lib/vctrs vctrs:::v… 481.69ms 484.85ms 1.97 277MB 2.17
#> 2 r-lib/vctrs@feature… vctrs:::v… 1.67s 1.68s 0.592 544MB 0.621
#> 3 r-lib/vctrs@feature… vctrs:::v… 1.72s 1.75s 0.558 467MB 0.754
#> 4 r-lib/vctrs@feature… vctrs:::v… 1.68s 1.69s 0.592 334MB 6.80
#> 5 r-lib/vctrs@feature… vctrs:::v… 1.74s 1.75s 0.572 315MB 6.58
#> 6 r-lib/vctrs@feature… vctrs:::v… 1.26s 1.28s 0.777 238MB 0.816
#> 7 r-lib/vctrs@feature… vctrs:::v… 3.66s 4.33s 0.245 238MB 0.257 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
It has a high memory cost (24 extra bytes per string over just
SEXP) and it isn't clear that it gains us that much unless there is a high number of passes, i.e. in the case of a long common prefix, which is somewhat rare