Dataset2D from read_lazy fails when I try to call an unsupported column #2156

@selmanozleyen

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of anndata.
  • (optional) I have confirmed this bug exists on the main branch of anndata.

Report

Code:

import anndata as ad
import h5py
from anndata.experimental import read_lazy

with h5py.File("/lustre/groups/ml01/workspace/100mil/100m_int_indices.h5ad", "r") as f:
    # Build an AnnData object from lazily read elements of the h5ad file
    adata_all = ad.AnnData(
        obs=read_lazy(f["obs"]),
        var=read_lazy(f["var"]),
        uns=read_lazy(f["uns"]),
        obsm=read_lazy(f["obsm"]),
    )

# Accessed after the with block has exited
adata_all.obs["cell_line"]  # fails here (cell_line is a categorical column)

This is what the column normally looks like when it loads without error:

0           CVCL_0131
1           CVCL_0480
2           CVCL_0293
3           CVCL_0397
4           CVCL_1097
              ...    
95624329    CVCL_0504
95624330    CVCL_1693
95624331    CVCL_1381
95624332    CVCL_1285
95624333    CVCL_1550
Name: cell_line, Length: 95624334, dtype: category
Categories (50, object): ['CVCL_0023', 'CVCL_0028', 'CVCL_0069', 'CVCL_0099', ..., 'CVCL_1717', 'CVCL_1724', 'CVCL_1731', 'CVCL_C466']

Traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File /ictstr01/home/icb/selman.ozleyen/.local/share/mamba/envs/lpert/lib/python3.12/site-packages/IPython/core/formatters.py:770, in PlainTextFormatter.__call__(self, obj)
    763 stream = StringIO()
    764 printer = pretty.RepresentationPrinter(stream, self.verbose,
    765     self.max_width, self.newline,
    766     max_seq_length=self.max_seq_length,
    767     singleton_pprinters=self.singleton_printers,
    768     type_pprinters=self.type_printers,
    769     deferred_pprinters=self.deferred_printers)
--> 770 printer.pretty(obj)
    771 printer.flush()
    772 return stream.getvalue()

File /ictstr01/home/icb/selman.ozleyen/.local/share/mamba/envs/lpert/lib/python3.12/site-packages/IPython/lib/pretty.py:411, in RepresentationPrinter.pretty(self, obj)
    400                         return meth(obj, self, cycle)
    401                 if (
    402                     cls is not object
    403                     # check if cls defines __repr__
   (...)    409                     and callable(_safe_getattr(cls, "__repr__", None))
    410                 ):
--> 411                     return _repr_pprint(obj, self, cycle)
    413     return _default_pprint(obj, self, cycle)
    414 finally:
...

File h5py/h5d.pyx:399, in h5py.h5d.DatasetID.get_type()

ValueError: Invalid dataset identifier (identifier is not of specified type)
Error raised while reading key '??' of <class 'h5py._hl.dataset.Dataset'> from /
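
One detail that may matter here: the failing access happens after the `with` block has exited, so the underlying file is already closed when the lazy column is first read. The sketch below uses plain h5py only (no anndata) on a throwaway file to show that a dataset handle obtained inside a `with h5py.File(...)` block stops being valid once the block exits. The file name `toy.h5`, the dataset name `cell_line_codes`, and the helper `handle_state` are made up for illustration; whether this is the actual cause of the error above is an assumption, not something confirmed.

```python
import os
import tempfile

import h5py


def handle_state(path):
    """Check a dataset handle inside and after a `with h5py.File(...)` block.

    Hypothetical helper: returns (valid_inside, valid_outside, first_value).
    """
    with h5py.File(path, "r") as f:
        ds = f["cell_line_codes"]  # hypothetical dataset name
        valid_inside = bool(ds)    # handle is backed by an open file
        first_value = int(ds[0])   # reading works while the file is open
    # The with block closed the file, which also invalidates `ds`.
    valid_outside = bool(ds)
    return valid_inside, valid_outside, first_value


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.h5")  # throwaway file for the demo
    with h5py.File(path, "w") as f:
        f.create_dataset("cell_line_codes", data=[3, 1, 2])
    inside, outside, first = handle_state(path)
    print("valid inside with-block:", inside)
    print("valid after with-block:", outside)
```

If this is what is happening, keeping the file open for as long as the lazy object is in use (for example, doing the column access inside the `with` block) would be the first thing to try.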

Versions

| Package | Version                 |
| ------- | ----------------------- |
| xarray  | 2025.9.0                |
| anndata | 0.13.0.dev28+g1ba19458f |
| h5py    | 3.14.0                  |
| zarr    | 3.1.2                   |
| pandas  | 2.3.2                   |

| Dependency         | Version               |
| ------------------ | --------------------- |
| packaging          | 25.0                  |
| debugpy            | 1.8.12                |
| msgpack            | 1.1.1                 |
| cupy-cuda12x       | 13.6.0                |
| urllib3            | 2.5.0                 |
| setuptools         | 80.9.0                |
| click              | 8.2.1                 |
| Pygments           | 2.19.2                |
| legacy-api-wrap    | 1.4.1                 |
| cloudpickle        | 3.1.1                 |
| donfig             | 0.8.1.post1           |
| executing          | 2.2.1                 |
| certifi            | 2025.8.3 (2025.08.03) |
| jupyter_core       | 5.8.1                 |
| ipywidgets         | 8.1.7                 |
| tqdm               | 4.67.1                |
| jedi               | 0.19.2                |
| ipykernel          | 6.29.5                |
| crc32c             | 2.7.1                 |
| MarkupSafe         | 3.0.2                 |
| attrs              | 25.3.0                |
| stack-data         | 0.6.3                 |
| fastrlock          | 0.8.3                 |
| jupyter_client     | 8.6.3                 |
| wcwidth            | 0.2.13                |
| prompt_toolkit     | 3.0.52                |
| parso              | 0.8.5                 |
| six                | 1.17.0                |
| sortedcontainers   | 2.4.0                 |
| idna               | 3.10                  |
| Jinja2             | 3.1.6                 |
| requests           | 2.32.5                |
| simplejson         | 3.20.1                |
| omegaconf          | 2.3.0                 |
| scipy              | 1.15.3                |
| pyarrow            | 21.0.0                |
| pure_eval          | 0.2.3                 |
| tblib              | 3.1.0                 |
| fsspec             | 2025.9.0              |
| pytz               | 2025.2                |
| toolz              | 1.0.0                 |
| comm               | 0.2.2                 |
| numpy              | 2.2.6                 |
| ipython            | 9.5.0                 |
| typing_extensions  | 4.15.0                |
| torch              | 2.8.0                 |
| locket             | 1.0.0                 |
| numcodecs          | 0.16.2                |
| asttokens          | 3.0.0                 |
| natsort            | 8.4.0                 |
| dask               | 2025.9.1              |
| distributed        | 2025.9.1              |
| scanpy             | 1.11.4                |
| python-dateutil    | 2.9.0.post0           |
| psutil             | 7.0.0                 |
| PyYAML             | 6.0.2                 |
| platformdirs       | 4.4.0                 |
| rich               | 14.1.0                |
| traitlets          | 5.14.3                |
| tornado            | 6.5.2                 |
| session-info2      | 0.2.1                 |
| zict               | 3.0.0                 |
| pyzmq              | 27.1.0                |
| decorator          | 5.2.1                 |
| charset-normalizer | 3.4.3                 |

| Component | Info                                                                           |
| --------- | ------------------------------------------------------------------------------ |
| Python    | 3.12.11 \| packaged by conda-forge \| (main, Jun  4 2025, 14:45:31) [GCC 13.3.0] |
| OS        | Linux-5.14.0-570.25.1.el9_6.x86_64-x86_64-with-glibc2.34                       |
| Updated   | 2025-10-16 13:10                                                               |
