write_zarr fails with LocalCUDACluster (dask-cuda) when adata.X has been persisted #2444

@Intron7

Description

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of anndata.
  • (optional) I have confirmed this bug exists on the main branch of anndata.

Report

Code:

from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask.array as da
import numpy as np
import zarr

cluster = LocalCUDACluster()
client = Client(cluster)

x = da.from_array(np.ones((10000, 200)), chunks=(1000, 200))
x = x.map_blocks(lambda b: b + 1).persist()

g = zarr.open("/tmp/test.zarr", mode="w", shape=x.shape, dtype=x.dtype, chunks=(1000, 200))
da.store(x, g, scheduler="threads")  # ValueError: Missing dependency ('lambda-<hash>', i, 0)

Traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 14
     10 x = da.from_array(np.ones((10000, 200)), chunks=(1000, 200))
     11 x = x.map_blocks(lambda b: b + 1).persist()
     12 
     13 g = zarr.open("/tmp/test.zarr", mode="w", shape=x.shape, dtype=x.dtype, chunks=(1000, 200))
---> 14 da.store(x, g, scheduler="threads")  # Missing dependency

File ~/micromamba/envs/rapids-26.04/lib/python3.14/site-packages/dask/array/core.py:1218, in store(***failed resolving arguments***)
   1215 if not return_stored:
   1216     import dask
-> 1218     dask.compute(arrays, **kwargs)
   1219     return None
   1220 else:

File ~/micromamba/envs/rapids-26.04/lib/python3.14/site-packages/dask/base.py:685, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    682     expr = expr.optimize()
    683     keys = list(flatten(expr.__dask_keys__()))
--> 685     results = schedule(expr, keys, **kwargs)
    687 return repack(results)

File ~/micromamba/envs/rapids-26.04/lib/python3.14/site-packages/dask/local.py:191, in start_state_from_dask(dsk, cache, sortkey, keys)
    189 if task is None:
    190     if dependents[key] and not cache.get(key):
--> 191         raise ValueError(
    192             f"Missing dependency {key} for dependents {dependents[key]}"
    193         )
    194     continue
    195 elif isinstance(task, DataNode):

ValueError: Missing dependency ('lambda-e88bbde22afa74e5c4a58733a1fb745d', 4, 0) for dependents {('store-map-680948d7b57b14eb187a955c49f9a516', 4, 0)}

Versions

adata.write_zarr(...) raises ValueError: Missing dependency ... when a dask_cuda.LocalCUDACluster client is active and adata.X is a persisted dask array. The trigger is the hardcoded scheduler="threads" in anndata's writers: the local threaded scheduler can't resolve Futures held by the dask-cuda cluster's workers.
With a LocalCUDACluster active, .persist() materializes the array's tasks as Future objects on the dask-cuda workers. The local threaded scheduler has no visibility into those Futures and reports them as missing dependencies in dask/local.py::start_state_from_dask. The reproduction above calls da.store directly, so the error does not depend on anndata-specific code.
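
To make the failure mode concrete, here is a toy model of the check shown in the traceback. It is not dask's code: `init_local_state`, the graph layout, and the key names are made up for illustration, and the assumption is that a persisted chunk shows up in the local graph as a key with no task (`None`) and no locally cached value.

```python
def init_local_state(graph, cache=None):
    """Toy version of the dependency check from the traceback (not dask's code).

    `graph` maps key -> None (no local task, e.g. data held elsewhere)
    or key -> (func_name, [dependency keys]).
    """
    cache = dict(cache or {})

    # Reverse mapping: for each key, which keys depend on it.
    dependents = {key: set() for key in graph}
    for key, task in graph.items():
        if task is None:
            continue
        _, deps = task
        for dep in deps:
            dependents.setdefault(dep, set()).add(key)

    # A key without a task is only acceptable if its value is already cached locally.
    for key, task in graph.items():
        if task is None and dependents[key] and not cache.get(key):
            raise ValueError(
                f"Missing dependency {key} for dependents {dependents[key]}"
            )
    return dependents


# Hypothetical keys shaped like the ones in the error message.
persisted = ("lambda-abc", 4, 0)
store = ("store-map-def", 4, 0)
graph = {persisted: None, store: ("write_chunk", [persisted])}

try:
    init_local_state(graph)  # the value lives on a remote worker, not in the local cache
except ValueError as err:
    print(err)

# If the local scheduler could see the data, the same graph would pass the check.
init_local_state(graph, cache={persisted: "chunk-data"})
```

In this model the fix is either to give the local scheduler the data or to hand the graph to the scheduler that actually holds it.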
I'll also file a bug with dask-cuda. However, it might be worth checking whether our writing functions really need to hardcode scheduler="threads".
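
One possible direction, sketched under assumptions rather than proposed as the fix: only fall back to scheduler="threads" when no distributed client is active. `pick_store_scheduler` is a hypothetical helper, not an existing anndata or dask function; it relies on `distributed.get_client()` raising ValueError when no client exists, and on `scheduler=None` letting dask use its default scheduler (the active client, if there is one). Whether the workers can write to the target store is a separate question not addressed here.

```python
def pick_store_scheduler():
    """Hypothetical helper: choose the `scheduler=` value for da.store / dask.compute.

    Returns None when a distributed client is active, so dask uses that client
    and can resolve persisted Futures; otherwise returns "threads".
    """
    try:
        from distributed import get_client  # optional dependency

        get_client()  # raises ValueError when no client is active
    except (ImportError, ValueError):
        return "threads"
    return None


# Intended use in the reproduction (not executed here):
#     da.store(x, g, scheduler=pick_store_scheduler())
print(pick_store_scheduler())
```

Without an active client the helper keeps the current behavior, so single-machine users would see no change.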
