
Use base implementation of zstd to compress/decompress data#32

Open
Bisaloo wants to merge 2 commits into devel from base-zstd

Conversation


Bisaloo commented Aug 26, 2025

No description provided.


Bisaloo commented Aug 26, 2025

Couple of observations:

  • we still need the bundled zstd library as it is used by blosc
  • ?memCompress() says (emphasis mine):

    zstd compression was introduced in R 4.5.0: it is an optional part of the R build and currently uses compression level 3 which gives a good compression ratio vs compression speed trade-off.
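For reference, a minimal round-trip through the base R API (a sketch, assuming an R >= 4.5.0 build with zstd support):

```r
# Sketch of the base R zstd API (needs R >= 4.5.0 built with zstd support).
# Note that memCompress() currently uses a fixed compression level of 3.
payload <- serialize(rep(1:100, 100), NULL)   # some compressible raw bytes
z <- memCompress(payload, type = "zstd")
stopifnot(length(z) < length(payload))        # smaller after compression
roundtrip <- memDecompress(z, type = "zstd")
stopifnot(identical(roundtrip, payload))      # lossless round-trip
```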

Bisaloo force-pushed the base-zstd branch 3 times, most recently from 86ab946 to b7ed02c on September 26, 2025 09:23
@Bisaloo
Copy link
Member Author

Bisaloo commented Sep 26, 2025

base implementation only allows compression level up to 19
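For context, the standalone zstd library supports levels up to 22, while the base R interface caps out at 19. A wrapper could guard against out-of-range requests; `clamp_zstd_level` below is a hypothetical helper, not an existing Rarr function:

```r
# Hypothetical helper (not part of Rarr): clamp a requested zstd level to the
# 1..19 range supported by base R's implementation, warning when clamping.
clamp_zstd_level <- function(level) {
  level <- as.integer(level)
  if (level > 19L) {
    warning("base R zstd supports levels up to 19; using 19 instead of ", level)
    level <- 19L
  }
  max(level, 1L)
}

stopifnot(identical(clamp_zstd_level(22), 19L))
stopifnot(identical(clamp_zstd_level(5), 5L))
```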

Bisaloo force-pushed the base-zstd branch 2 times, most recently from 65ab6fb to 06c7f8f on October 1, 2025 16:54

github-actions bot commented Oct 1, 2025

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 06c7f8f is merged into devel:

  • ❗🐌read_zstd: 447ms -> 682ms [+51.95%, +53.17%]
  • ❗🐌write_zstd: 731ms -> 12s [+1532.48%, +1553.16%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

Bisaloo marked this pull request as ready for review on November 10, 2025 20:50
Bisaloo closed this on Nov 10, 2025
Bisaloo reopened this on Nov 10, 2025
@github-actions

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 1604a41 is merged into devel:

  • ✔️read_boolean: 10.1ms -> 10.1ms [-1%, +0.51%]
  • ✔️read_double: 20.6ms -> 20.6ms [-0.72%, +0.29%]
  • ✔️read_float16: 13.4ms -> 13.3ms [-2.66%, +1.68%]
  • ✔️read_float32: 13.3ms -> 13.3ms [-1.01%, +1.05%]
  • ✔️read_int16: 13.2ms -> 13.2ms [-0.1%, +1.11%]
  • ✔️read_int32: 20.5ms -> 20.5ms [-0.98%, +0.49%]
  • ✔️read_int64: 13.4ms -> 13.4ms [-0.53%, +1.17%]
  • ✔️read_int8: 13.3ms -> 13.2ms [-1.27%, +0.14%]
  • ✔️read_uint32: 13.9ms -> 14ms [-0.11%, +1.3%]
  • ✔️read_uint64: 13.5ms -> 13.5ms [-1.15%, +0.2%]
  • ✔️read_unicode: 8.58ms -> 8.62ms [-0.39%, +1.35%]
  • ❗🐌read_zstd: 371ms -> 386ms [+3.45%, +4.51%]
  • ✔️write_double: 18.8ms -> 18.8ms [-2.36%, +2.06%]
  • ✔️write_int32: 18.7ms -> 18.7ms [-1.76%, +1.9%]
  • ✔️write_string: 10.6ms -> 10.5ms [-0.9%, +0.78%]
  • ❗🐌write_zstd: 654ms -> 18.2s [+2667.43%, +2696.77%]
    Further explanation regarding interpretation and methodology can be found in the documentation.


Bisaloo commented Dec 11, 2025

I suspect the big time difference in writing may be due to the overhead of dealing with the connection. We may need to wait until r-devel/r-dev-day#89 is resolved.
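One rough way to probe that hypothesis (a sketch, not the benchmark used here; gzip stands in for zstd so it runs on any R build, since the suspected overhead sits in the connection layer rather than the codec):

```r
# Sketch: compare one bulk writeBin() of pre-compressed data against many
# small writes through an open connection, to expose per-write overhead.
payload <- serialize(rep(1:100, 1000), NULL)
z <- memCompress(payload, type = "gzip")
tmp_bulk <- tempfile(); tmp_chunked <- tempfile()

t_bulk <- system.time(writeBin(z, tmp_bulk))

t_chunked <- system.time({
  con <- file(tmp_chunked, "wb")
  for (i in seq(1L, length(z), by = 64L)) {   # many tiny writes
    writeBin(z[i:min(i + 63L, length(z))], con)
  }
  close(con)
})

# Both files hold identical bytes; only the write pattern differs.
stopifnot(identical(readBin(tmp_bulk, "raw", n = file.size(tmp_bulk)),
                    readBin(tmp_chunked, "raw", n = file.size(tmp_chunked))))
```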

@github-actions

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 8bd35fa is merged into devel:

  • ❗🐌read_zstd: 440ms -> 481ms [+9.05%, +9.86%]
  • ❗🐌write_zstd: 854ms -> 9.46s [+1000.79%, +1013.56%]
    Further explanation regarding interpretation and methodology can be found in the documentation.


Bisaloo commented Feb 14, 2026

It looks like the base R implementation only performs worse at medium & high compression levels. I'm not sure why yet.

| pkg | level | median | mem_alloc |
| --- | ---: | ---: | ---: |
| Huber-group-EMBL/Rarr@devel | 1 | 179.31ms | 44784120 |
| 8bd35fa | 1 | 149.02ms | 31122970 |
| Huber-group-EMBL/Rarr@devel | 2 | 174.62ms | 44453080 |
| 8bd35fa | 2 | 152.05ms | 31036272 |
| Huber-group-EMBL/Rarr@devel | 3 | 193.73ms | 44459712 |
| 8bd35fa | 3 | 190.83ms | 31036272 |
| Huber-group-EMBL/Rarr@devel | 4 | 216.74ms | 44365912 |
| 8bd35fa | 4 | 230.94ms | 31036272 |
| Huber-group-EMBL/Rarr@devel | 5 | 228.64ms | 44349992 |
| 8bd35fa | 5 | 277.76ms | 31036272 |
| Huber-group-EMBL/Rarr@devel | 6 | 254.97ms | 44324968 |
| 8bd35fa | 6 | 288.65ms | 31036272 |
| Huber-group-EMBL/Rarr@devel | 7 | 309.11ms | 44325008 |
| 8bd35fa | 7 | 367.66ms | 31036272 |
| Huber-group-EMBL/Rarr@devel | 8 | 366.25ms | 44328752 |
| 8bd35fa | 8 | 371.95ms | 31036272 |
| Huber-group-EMBL/Rarr@devel | 9 | 369.91ms | 44360616 |
| 8bd35fa | 9 | 529.52ms | 31036272 |
| Huber-group-EMBL/Rarr@devel | 10 | 368.42ms | 44360616 |
| 8bd35fa | 10 | 991.71ms | 31036272 |
| Huber-group-EMBL/Rarr@devel | 11 | 430.83ms | 44114504 |
| 8bd35fa | 11 | 1.02s | 31036272 |
| Huber-group-EMBL/Rarr@devel | 12 | 498.78ms | 44090632 |
| 8bd35fa | 12 | 19.02s | 31036272 |
| Huber-group-EMBL/Rarr@devel | 13 | 519.04ms | 44092664 |
| 8bd35fa | 13 | 15.53s | 31036272 |
| Huber-group-EMBL/Rarr@devel | 14 | 519.16ms | 44092664 |
| 8bd35fa | 14 | 23.21s | 31036272 |
| Huber-group-EMBL/Rarr@devel | 15 | 513.17ms | 44092664 |
| 8bd35fa | 15 | 31.11s | 31036272 |
| Huber-group-EMBL/Rarr@devel | 16 | 822.34ms | 44061848 |
| 8bd35fa | 16 | 15.84s | 31036272 |
| Huber-group-EMBL/Rarr@devel | 17 | 824.35ms | 44061848 |
| 8bd35fa | 17 | 23.65s | 31036272 |
| Huber-group-EMBL/Rarr@devel | 18 | 825.18ms | 44061848 |
| 8bd35fa | 18 | 24.06s | 31036272 |
| Huber-group-EMBL/Rarr@devel | 19 | 821.8ms | 44061848 |
| 8bd35fa | 19 | 39.83s | 31036272 |

Benchmarking code:

bench_res <- cross::bench_versions(pkgs = c("Huber-group-EMBL/Rarr@devel", "Huber-group-EMBL/Rarr@8bd35fa188d0d24011375726f61b9e09f81ec44a"), {
  library(Rarr)
  x <- runif(1e6)
  arr <- array(x, dim = c(100, 100, 100))
  
  results <- bench::press(
    level = 1:19,
    bench::mark(write_zarr_array(arr, "zstd.zarr", chunk_dim = c(10, 10, 10), compressor = use_zstd(level = level)), iterations = 20)
  )
})



Development

Successfully merging this pull request may close these issues.

Investigate base R 4.5.0 zstd memDecompress() support
