
Use base implementation of zstd to compress/decompress data#32

Open
Bisaloo wants to merge 2 commits into devel from base-zstd

Conversation


Bisaloo commented Aug 26, 2025

No description provided.


Bisaloo commented Aug 26, 2025

Couple of observations:

  • we still need the bundled zstd library as it is used by blosc
  • ?memCompress() says (emphasis mine):

    zstd compression was introduced in R 4.5.0: it is an optional part of the R build and currently uses compression level 3 which gives a good compression ratio vs compression speed trade-off.
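For reference, a minimal round-trip through the base R API (a sketch, assuming an R >= 4.5.0 build with zstd support):

```r
# Sketch of the base R zstd API (needs R >= 4.5.0 built with zstd support).
# Note that memCompress() currently uses a fixed compression level of 3.
payload <- serialize(rep(1:100, 100), NULL)   # some compressible raw bytes
z <- memCompress(payload, type = "zstd")
stopifnot(length(z) < length(payload))        # smaller after compression
roundtrip <- memDecompress(z, type = "zstd")
stopifnot(identical(roundtrip, payload))      # lossless round-trip
```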

Bisaloo force-pushed the base-zstd branch 3 times, most recently from 86ab946 to b7ed02c on September 26, 2025 09:23
@Bisaloo
Copy link
Member Author

Bisaloo commented Sep 26, 2025

base implementation only allows compression level up to 19
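For context, the standalone zstd library supports levels up to 22, while the base R interface caps out at 19. A wrapper could guard against out-of-range requests; `clamp_zstd_level` below is a hypothetical helper, not an existing Rarr function:

```r
# Hypothetical helper (not part of Rarr): clamp a requested zstd level to the
# 1..19 range supported by base R's implementation, warning when clamping.
clamp_zstd_level <- function(level) {
  level <- as.integer(level)
  if (level > 19L) {
    warning("base R zstd supports levels up to 19; using 19 instead of ", level)
    level <- 19L
  }
  max(level, 1L)
}

stopifnot(identical(clamp_zstd_level(22), 19L))
stopifnot(identical(clamp_zstd_level(5), 5L))
```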

Bisaloo force-pushed the base-zstd branch 2 times, most recently from 65ab6fb to 06c7f8f on October 1, 2025 16:54

github-actions bot commented Oct 1, 2025

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 06c7f8f is merged into devel:

  • ❗🐌read_zstd: 447ms -> 682ms [+51.95%, +53.17%]
  • ❗🐌write_zstd: 731ms -> 12s [+1532.48%, +1553.16%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

Bisaloo marked this pull request as ready for review on November 10, 2025 20:50
Bisaloo closed this on Nov 10, 2025
Bisaloo reopened this on Nov 10, 2025
@github-actions

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 1604a41 is merged into devel:

  • ✔️read_boolean: 10.1ms -> 10.1ms [-1%, +0.51%]
  • ✔️read_double: 20.6ms -> 20.6ms [-0.72%, +0.29%]
  • ✔️read_float16: 13.4ms -> 13.3ms [-2.66%, +1.68%]
  • ✔️read_float32: 13.3ms -> 13.3ms [-1.01%, +1.05%]
  • ✔️read_int16: 13.2ms -> 13.2ms [-0.1%, +1.11%]
  • ✔️read_int32: 20.5ms -> 20.5ms [-0.98%, +0.49%]
  • ✔️read_int64: 13.4ms -> 13.4ms [-0.53%, +1.17%]
  • ✔️read_int8: 13.3ms -> 13.2ms [-1.27%, +0.14%]
  • ✔️read_uint32: 13.9ms -> 14ms [-0.11%, +1.3%]
  • ✔️read_uint64: 13.5ms -> 13.5ms [-1.15%, +0.2%]
  • ✔️read_unicode: 8.58ms -> 8.62ms [-0.39%, +1.35%]
  • ❗🐌read_zstd: 371ms -> 386ms [+3.45%, +4.51%]
  • ✔️write_double: 18.8ms -> 18.8ms [-2.36%, +2.06%]
  • ✔️write_int32: 18.7ms -> 18.7ms [-1.76%, +1.9%]
  • ✔️write_string: 10.6ms -> 10.5ms [-0.9%, +0.78%]
  • ❗🐌write_zstd: 654ms -> 18.2s [+2667.43%, +2696.77%]
    Further explanation regarding interpretation and methodology can be found in the documentation.


Bisaloo commented Dec 11, 2025

I suspect the big time difference in writing may be due to the overhead of dealing with the connection. We may need to wait until r-devel/r-dev-day#89 is resolved.
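One rough way to probe that hypothesis (a sketch, not the benchmark used here; gzip stands in for zstd so it runs on any R build, since the suspected overhead sits in the connection layer rather than the codec):

```r
# Sketch: compare one bulk writeBin() of pre-compressed data against many
# small writes through an open connection, to expose per-write overhead.
payload <- serialize(rep(1:100, 1000), NULL)
z <- memCompress(payload, type = "gzip")
tmp_bulk <- tempfile(); tmp_chunked <- tempfile()

t_bulk <- system.time(writeBin(z, tmp_bulk))

t_chunked <- system.time({
  con <- file(tmp_chunked, "wb")
  for (i in seq(1L, length(z), by = 64L)) {   # many tiny writes
    writeBin(z[i:min(i + 63L, length(z))], con)
  }
  close(con)
})

# Both files hold identical bytes; only the write pattern differs.
stopifnot(identical(readBin(tmp_bulk, "raw", n = file.size(tmp_bulk)),
                    readBin(tmp_chunked, "raw", n = file.size(tmp_chunked))))
```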

@github-actions

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 8bd35fa is merged into devel:

  • ❗🐌read_zstd: 440ms -> 481ms [+9.05%, +9.86%]
  • ❗🐌write_zstd: 854ms -> 9.46s [+1000.79%, +1013.56%]
    Further explanation regarding interpretation and methodology can be found in the documentation.


Bisaloo commented Feb 14, 2026

It looks like the base R implementation only performs worse at medium & high compression levels. I'm not sure why yet.

| pkg | level | median | mem_alloc |
| --- | ---: | ---: | ---: |
| Huber-group-EMBL/Rarr@devel | 1 | 179.31ms | 44784120 |
| 8bd35fa | 1 | 149.02ms | 31122970 |
| Huber-group-EMBL/Rarr@devel | 2 | 174.62ms | 44453080 |
| 8bd35fa | 2 | 152.05ms | 31036272 |
| Huber-group-EMBL/Rarr@devel | 3 | 193.73ms | 44459712 |
| 8bd35fa | 3 | 190.83ms | 31036272 |
| Huber-group-EMBL/Rarr@devel | 4 | 216.74ms | 44365912 |
| 8bd35fa | 4 | 230.94ms | 31036272 |
| Huber-group-EMBL/Rarr@devel | 5 | 228.64ms | 44349992 |
| 8bd35fa | 5 | 277.76ms | 31036272 |
| Huber-group-EMBL/Rarr@devel | 6 | 254.97ms | 44324968 |
| 8bd35fa | 6 | 288.65ms | 31036272 |
| Huber-group-EMBL/Rarr@devel | 7 | 309.11ms | 44325008 |
| 8bd35fa | 7 | 367.66ms | 31036272 |
| Huber-group-EMBL/Rarr@devel | 8 | 366.25ms | 44328752 |
| 8bd35fa | 8 | 371.95ms | 31036272 |
| Huber-group-EMBL/Rarr@devel | 9 | 369.91ms | 44360616 |
| 8bd35fa | 9 | 529.52ms | 31036272 |
| Huber-group-EMBL/Rarr@devel | 10 | 368.42ms | 44360616 |
| 8bd35fa | 10 | 991.71ms | 31036272 |
| Huber-group-EMBL/Rarr@devel | 11 | 430.83ms | 44114504 |
| 8bd35fa | 11 | 1.02s | 31036272 |
| Huber-group-EMBL/Rarr@devel | 12 | 498.78ms | 44090632 |
| 8bd35fa | 12 | 19.02s | 31036272 |
| Huber-group-EMBL/Rarr@devel | 13 | 519.04ms | 44092664 |
| 8bd35fa | 13 | 15.53s | 31036272 |
| Huber-group-EMBL/Rarr@devel | 14 | 519.16ms | 44092664 |
| 8bd35fa | 14 | 23.21s | 31036272 |
| Huber-group-EMBL/Rarr@devel | 15 | 513.17ms | 44092664 |
| 8bd35fa | 15 | 31.11s | 31036272 |
| Huber-group-EMBL/Rarr@devel | 16 | 822.34ms | 44061848 |
| 8bd35fa | 16 | 15.84s | 31036272 |
| Huber-group-EMBL/Rarr@devel | 17 | 824.35ms | 44061848 |
| 8bd35fa | 17 | 23.65s | 31036272 |
| Huber-group-EMBL/Rarr@devel | 18 | 825.18ms | 44061848 |
| 8bd35fa | 18 | 24.06s | 31036272 |
| Huber-group-EMBL/Rarr@devel | 19 | 821.8ms | 44061848 |
| 8bd35fa | 19 | 39.83s | 31036272 |

Benchmarking code:

bench_res <- cross::bench_versions(pkgs = c("Huber-group-EMBL/Rarr@devel", "Huber-group-EMBL/Rarr@8bd35fa188d0d24011375726f61b9e09f81ec44a"), {
  library(Rarr)
  x <- runif(1e6)
  arr <- array(x, dim = c(100, 100, 100))
  
  results <- bench::press(
    level = 1:19,
    bench::mark(write_zarr_array(arr, "zstd.zarr", chunk_dim = c(10, 10, 10), compressor = use_zstd(level = level)), iterations = 20)
  )
})



Development

Successfully merging this pull request may close these issues.

Investigate base R 4.5.0 zstd memDecompress() support
