Question
Hey all,
I have a few anndata datasets with sparse CSR X matrices (each has ~10M cells and 40K genes, with a sparsity of about 5%).
I want to be able to quickly load whole rows from these datasets (say given a query, load all rows based on a condition on the obs table).
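To make the access pattern concrete, here is a minimal sketch of what I mean, written in plain Python on a toy CSR matrix. The `cell_type` column and the `select_rows` helper are hypothetical names used only for illustration; the real data would live in `adata.obs` and `adata.X`.

```python
# Toy CSR matrix (3 cells x 4 genes), stored as the usual three arrays.
indptr = [0, 2, 3, 5]        # row i occupies data[indptr[i]:indptr[i + 1]]
indices = [0, 3, 1, 0, 2]    # gene (column) index of each stored value
data = [1.0, 2.0, 3.0, 4.0, 5.0]

# Hypothetical obs column; in practice this would be a column of adata.obs.
cell_type = ["T", "B", "T"]


def select_rows(indptr, indices, data, row_ids):
    """Return {row_id: [(gene_index, value), ...]} for the requested rows."""
    out = {}
    for r in row_ids:
        # Each row is one contiguous slice of the indices/data arrays.
        start, end = indptr[r], indptr[r + 1]
        out[r] = list(zip(indices[start:end], data[start:end]))
    return out


# Query: all cells whose obs value matches a condition.
row_ids = [i for i, ct in enumerate(cell_type) if ct == "T"]
rows = select_rows(indptr, indices, data, row_ids)
print(rows)
```

Each selected row is a contiguous slice of `indices` and `data`, but the selected rows themselves can be scattered across the whole matrix, which is why I am asking about chunk sizes and concurrency.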
Currently I convert the anndata object to TileDB, but I recently came across the Zarr file format, and specifically anndata's support for Zarr v3.
I have a few questions regarding Zarr:
- Would Zarr v3 be a good fit for our use case? Should I expect an improvement over TileDB?
- Are there guidelines on which codec and chunk sizes to use?
- Are there guidelines on how to benefit from concurrency? I see Dask being used together with Zarr in many places.
Thanks!