To evaluate this we should account for multiple factors:
Compression
- Is there a point at which dense compression (e.g., blosc) handling the repeated 0's overtakes sparse storage in either on-disk size (i.e., smaller) or read throughput, especially given that we can't block-align sparse due to the file format? (See the size-comparison sketch after this list.)
- Is the same compression method (e.g., blosc) that works well for dense data also good for the sparse data itself (beyond the fact that sparse data is already "compressed")?
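As a rough probe of the first question, here is a minimal sketch comparing the blosc-compressed size of a dense block against blosc-compressed CSR components. The matrix shape, densities, and blosc settings (zstd, level 5, byte shuffle) are illustrative assumptions, not project defaults, and read throughput would need a separate timing harness:

```python
import numpy as np
import scipy.sparse as sp
from numcodecs import Blosc

rng = np.random.default_rng(0)
codec = Blosc(cname="zstd", clevel=5, shuffle=Blosc.SHUFFLE)

def compressed_nbytes(arr: np.ndarray) -> int:
    """Blosc-compressed size of a contiguous array, in bytes."""
    return len(codec.encode(np.ascontiguousarray(arr)))

for density in (0.001, 0.01, 0.1):
    csr = sp.random(10_000, 2_000, density=density, format="csr",
                    dtype=np.float32, random_state=rng)
    dense = csr.toarray()

    dense_size = compressed_nbytes(dense)
    # Compress each CSR component separately, roughly mirroring a
    # sparse-on-zarr layout that stores data/indices/indptr as separate arrays.
    sparse_size = sum(compressed_nbytes(x)
                      for x in (csr.data, csr.indices, csr.indptr))
    print(f"density={density:.3f}  dense+blosc={dense_size/1e6:.2f} MB  "
          f"sparse+blosc={sparse_size/1e6:.2f} MB")
```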
Sharding
- What is the impact of using very tiny shards? E.g., https://github.com/zarrs/zarr_benchmarks?tab=readme-ov-file#standalone-2 shows that reading tiny shards is more performant than reading chunks.
- Does making these tiny shards block-aligned outperform sparse, even if the sparse layout also uses tiny shards?
- Do we need to use https://zarrs-python.readthedocs.io/en/stable/, and at what point does it become necessary for good performance with sharding? (A setup sketch follows this list.)
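To make the shard experiments concrete, below is a minimal sketch of creating a dense array with tiny shards and optionally routing reads through zarrs-python. It assumes zarr-python >= 3; the store path and the shard/chunk shapes are placeholders to tune, not recommended values:

```python
import numpy as np
import zarr

# Optional: route decoding through the Rust codec pipeline from zarrs-python,
# which is where the large sharded-read speedups are reported.
# import zarrs  # noqa: F401
# zarr.config.set({"codec_pipeline.path": "zarrs.ZarrsCodecPipeline"})

store = zarr.storage.LocalStore("tiny_shards.zarr")
arr = zarr.create_array(
    store=store,
    shape=(100_000, 2_000),
    dtype="float32",
    shards=(1_000, 2_000),  # a "tiny" shard: 1k rows per shard object
    chunks=(100, 2_000),    # inner chunks that can be read individually
    overwrite=True,
)
arr[:1_000] = np.random.default_rng(0).random((1_000, 2_000), dtype=np.float32)

# A row-slice read, i.e. the access pattern a training dataloader would issue.
batch = arr[0:100]
print(batch.shape)
```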
Densification
- What is the impact of densification as an operation on this tradeoff? We can't rely on using sparse matrices as input to models and thus need to densify.
- Does GPU densification help with this? (Definitely yes, but we should understand it better; see the next point.)
- When is the best time to densify: batch-by-batch, or within prefetching? I.e., is densifying a large quantity of data at once better than densifying small quantities repeatedly? (See the sketch after this list.)
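As a starting point for that last question, here is a minimal sketch contrasting densifying one large prefetched block versus densifying batch-by-batch. It assumes SciPy CSR blocks coming off storage and PyTorch for the GPU step; the shapes, density, and batch size are illustrative assumptions:

```python
import numpy as np
import scipy.sparse as sp
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
block = sp.random(4_096, 2_000, density=0.01, format="csr", dtype=np.float32)
batch_size = 256

def to_torch_csr(csr) -> torch.Tensor:
    """Wrap a SciPy CSR matrix as a torch sparse CSR tensor (no value copy)."""
    return torch.sparse_csr_tensor(
        torch.from_numpy(csr.indptr), torch.from_numpy(csr.indices),
        torch.from_numpy(csr.data), size=csr.shape,
    )

# Strategy A: densify the whole prefetched block once on the GPU, then split
# it into training batches.
dense_block = to_torch_csr(block).to(device).to_dense()
batches_a = list(dense_block.split(batch_size))

# Strategy B: densify batch-by-batch, paying the transfer and kernel-launch
# overhead on every small slice.
batches_b = [
    to_torch_csr(block[i:i + batch_size]).to(device).to_dense()
    for i in range(0, block.shape[0], batch_size)
]

assert torch.allclose(batches_a[0], batches_b[0])
```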
There will likely be interplay along all of these axes within the context of this sparse/dense tradeoff.