ad.concat is slow on lazy data on account of tokenize #1989

@ilan-gold

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of anndata.
  • (optional) I have confirmed this bug exists on the master branch of anndata.

Report

Code:

import anndata

paths = []  # fill in with the paths of the stores to load lazily

adatas = [anndata.experimental.read_lazy(path) for path in paths]

anndata.concat(adatas) # adatas are all lazily loaded

Profiler output shows that dask-ifying each new extension array column takes about half a second, which is too slow; the time is spent in tokenize.
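For anyone wanting to reproduce the measurement, here is a minimal sketch of one way to collect this kind of number with the standard-library profiler. It is not the exact profiling setup used for this report. The helper `cumulative_time_by_name` and the stand-in `fake_tokenize` are hypothetical names introduced only for illustration.

```python
import cProfile
import pstats


def cumulative_time_by_name(func, name_fragment):
    """Run func() under cProfile and return {label: cumulative seconds}
    for every profiled function whose name contains name_fragment.

    Hypothetical helper, not part of anndata or dask.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (filename, lineno, funcname) to
    # (primitive calls, total calls, own time, cumulative time, callers).
    return {
        f"{filename}:{lineno}({funcname})": cumulative
        for (filename, lineno, funcname), (_, _, _, cumulative, _)
        in stats.stats.items()
        if name_fragment in funcname
    }


def fake_tokenize():
    # Stand-in for an expensive hashing step; NOT dask's tokenize.
    return sum(range(100_000))


def workload():
    # Stand-in for a call that hashes several columns in turn.
    for _ in range(3):
        fake_tokenize()


timings = cumulative_time_by_name(workload, "tokenize")
for label, seconds in timings.items():
    print(f"{seconds:.4f}s  {label}")
```

To apply it to the report above, replace `workload` with `lambda: anndata.concat(adatas)`, assuming `adatas` was built as in the snippet.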

Versions

numpy   2.1.3
dask    2025.1.0
----    ----
hatch-vcs       0.4.0
prompt_toolkit  3.0.50
PyYAML  6.0.2
pluggy  1.5.0
jaraco.functools        4.0.1
tblib   3.0.0
h5py    3.13.0
asttokens       3.0.0
msgpack 1.1.0
natsort 8.4.0
decorator       5.1.1
psutil  7.0.0
parso   0.8.4
hatchling       1.27.0
distributed     2025.1.0
zstandard       0.23.0
tornado 6.4.2
ipython 8.32.0
jaraco.collections      5.1.0
zarr    3.0.6
numcodecs       0.15.1
wcwidth 0.2.13
scanpy  1.11.0
more-itertools  10.3.0
python-dateutil 2.9.0.post0
traitlets       5.14.3
pandas  2.2.3
executing       2.2.0
rich    13.9.4
wrapt   1.17.2
awkward 2.7.4
Deprecated      1.2.18
session-info2   0.1.2
scipy   1.15.2
pytz    2025.1
locket  1.0.0
setuptools      75.8.0
jedi    0.19.2
cloudpickle     3.1.1
zict    3.0.0
legacy-api-wrap 1.4.1
stack-data      0.6.3
typing_extensions       4.12.2
fsspec  2025.2.0
MarkupSafe      3.0.2
pathspec        0.12.1
donfig  0.8.1.post1
jaraco.context  5.3.0
pyarrow 19.0.1
awkward_cpp     44
crc32c  2.7.1
Jinja2  3.1.5
jaraco.text     3.12.1
click   8.1.8
pure_eval       0.2.3
toolz   1.0.0
setuptools-scm  8.1.0
packaging       24.2
charset-normalizer      3.4.1
xarray  2025.4.0
attrs   25.3.0
six     1.17.0
sortedcontainers        2.4.0
Pygments        2.19.1
----    ----
Python  3.12.6 (main, Sep  9 2024, 22:11:19) [Clang 18.1.8 ]
OS      Linux-5.15.0-134-generic-x86_64-with-glibc2.35
Updated 2025-05-15 14:12
