(fix): cache partial decoder #93

Merged: 10 commits from ig/reuse_decoder into main, Apr 10, 2025

Conversation

ilan-gold (Owner)

Fixes #90 (hopefully). I benchmarked and, at least in relative terms, performance neither regressed nor improved, although our benchmarks are probably not really exercising this behavior.

I went with HashMap because I couldn't see a reason to use BTreeMap (I see it used elsewhere, and people in Rust seem to use it as a drop-in replacement for HashMap because it is sometimes faster? Not really familiar with this, TBH).

I am also not sure the key is the right thing to hash on: the "key" here would be a shard, and we want to share the decoder per shard, right?

LDeakin (Collaborator) commented Apr 9, 2025

the "key" here would be a shard, and we want to share the decoder per shard, right?

Yep! The partial decoder for the sharding codec retrieves and decodes the shard index only once.

However, the partial decoder cache should only live as long as the batch of requests in retrieve_chunks_and_apply_index. Otherwise, it would perpetually grow and become stale after subsequent writes. I should document this better, but partial decoder initialisation can perform store and codec operations (and may even decode entire chunks with certain codecs!).

So my recommendation is to avoid locks and:

  1. Identify the unique chunks and create partial decoders for each in parallel.
  2. Run partial_decode_into in parallel for each chunk subset (a rough sketch follows).
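
For what it's worth, a minimal sketch of that shape, assuming rayon for the parallelism; `StoreKey` and `WithSubset` are from the PR, while `PartialDecoder`, `create_partial_decoder`, `CodecError`, and the exact `partial_decode_into` signature here are hypothetical stand-ins for the real zarrs calls:

```rust
use std::collections::HashMap;
use rayon::prelude::*;

// Hypothetical sketch: `PartialDecoder`, `create_partial_decoder`,
// `CodecError`, and this `partial_decode_into` signature stand in for
// the real zarrs types and calls.
fn decode_batch(
    unique_chunk_keys: &[StoreKey],
    chunk_subsets: &[WithSubset],
) -> Result<(), CodecError> {
    // 1. One partial decoder per unique chunk, built in parallel. The map
    //    lives only for this batch of requests, so it cannot grow or go stale.
    let decoders: HashMap<StoreKey, PartialDecoder> = unique_chunk_keys
        .par_iter()
        .map(|key| Ok((key.clone(), create_partial_decoder(key)?)))
        .collect::<Result<_, CodecError>>()?;

    // 2. Decode every chunk subset in parallel, sharing its chunk's decoder.
    chunk_subsets
        .par_iter()
        .try_for_each(|subset| decoders[subset.key()].partial_decode_into(subset))
}
```

Keeping the map local to the batch sidesteps locks entirely: it is built once, read concurrently, and dropped when the batch completes.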

ilan-gold (Owner, Author)

Separately, there seems to be an error about typesize, which I would guess is related to my work in zarr-developers/numcodecs#713.

ilan-gold (Owner, Author)

Yup, https://github.com/zarr-developers/numcodecs/blob/58bc9c4da74e06fbec985d4235f912b31d4d31f6/numcodecs/abc.py#L79-L95 means typesize is now part of the configuration. So we can probably ignore it; I will open a separate PR.

ilan-gold (Owner, Author)

Seems like I did step 1 by constructing a HashMap mapping the keys that require partial decoding to their WithSubset, then parallel-iterating to create (key, decoder) tuples and updating the HashMap that maps each key to its decoder. Step 2 should already be handled.

ilan-gold requested a review from LDeakin, April 9, 2025 15:52
src/lib.rs (outdated), comment on lines 275 to 281:
```rust
let mut item_map: HashMap<StoreKey, &WithSubset> = HashMap::new();
chunk_descriptions
    .iter()
    .filter(|item| !is_whole_chunk(item))
    .for_each(|item| {
        item_map.insert(item.key().clone(), item);
    });
```
ilan-gold (Owner, Author)

I think there is a cleaner way to do this...
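
One cleaner shape, for what it's worth, is to build the map with a collect rather than mutating it in a loop; an untested sketch using the same names as the snippet above:

```rust
// Build the key -> item map directly; same behavior as the loop above.
let item_map: HashMap<StoreKey, &WithSubset> = chunk_descriptions
    .iter()
    .filter(|item| !is_whole_chunk(item))
    .map(|item| (item.key().clone(), item))
    .collect();
```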

LDeakin (Collaborator) left a comment
Nice! Just a minor nit on Option -> Result + ensuring that the chunk concurrent limit is used.

ilan-gold and others added 4 commits April 10, 2025 09:37
ilan-gold enabled auto-merge (squash), April 10, 2025 07:39
ilan-gold merged commit dabb30c into main, Apr 10, 2025
13 of 14 checks passed
ilan-gold deleted the ig/reuse_decoder branch, April 10, 2025 07:48

Successfully merging this pull request may close these issues.

Reuse partial decoders