concat_on_disk merge strategies are untested/not implemented #1505

@ilan-gold

Description

Please describe your wishes and possible alternatives to achieve the desired result.

We should implement the concat_on_disk merge strategies so that they work properly. I am not sure whether this counts as a bug, since concat_on_disk is experimental and, reading through the old PR, I don't see any discussion of the merge strategies or any tests for them.

Here's an MVCE for merge="first" with otherwise default arguments. Adding a merge_type argument to the current test suite for concat_on_disk shows the full list of problems:

from anndata.tests.helpers import gen_adata
import anndata as ad
import numpy as np
from scipy import sparse
import pandas as pd

GEN_ADATA_OOC_CONCAT_ARGS = dict(
    obsm_types=(
        sparse.csr_matrix,
        np.ndarray,
        pd.DataFrame,
    ),
    varm_types=(sparse.csr_matrix, np.ndarray, pd.DataFrame),
    layers_types=(sparse.spmatrix, np.ndarray, pd.DataFrame),
)

adata_1 = gen_adata((100, 200), **GEN_ADATA_OOC_CONCAT_ARGS)
adata_2 = gen_adata((50, 60), **GEN_ADATA_OOC_CONCAT_ARGS)
adata_1.write_h5ad('test_1.h5ad')
adata_2.write_h5ad('test_2.h5ad')
ad.experimental.concat_on_disk(['test_1.h5ad', 'test_2.h5ad'], 'merged.h5ad', merge="first")

raises:

IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
Error raised while writing key 'var_cat' of <class 'h5py._hl.group.Group'> to /var

Here's a full list of the tests that fail from test_anndatas_with_reindex when merge is tested:

Errors + tests
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-zarr-10-unique] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-h5ad-10-same] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-h5ad-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-zarr-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-h5ad-100000000-unique] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-h5ad-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-outer-h5ad-100000000-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-zarr-100000000-same] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-outer-h5ad-10-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-outer-h5ad-100000000-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-zarr-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-inner-zarr-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-zarr-10-same] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-inner-h5ad-100000000-unique] - AssertionError: DataFrame are different
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-inner-h5ad-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-outer-zarr-100000000-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-outer-zarr-10-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-inner-zarr-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-inner-h5ad-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-inner-h5ad-100000000-same] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-outer-h5ad-10-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-outer-h5ad-10-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-inner-h5ad-10-same] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-outer-h5ad-100000000-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-inner-h5ad-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-inner-h5ad-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-inner-h5ad-10-same] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-inner-h5ad-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-inner-h5ad-100000000-same] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-inner-h5ad-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-inner-zarr-100000000-same] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-inner-zarr-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-outer-zarr-100000000-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-inner-zarr-10-unique] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-inner-zarr-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-inner-zarr-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-outer-h5ad-10-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-outer-h5ad-100000000-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-outer-zarr-100000000-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-outer-zarr-10-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-array-outer-zarr-10-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-inner-zarr-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-inner-zarr-100000000-unique] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-inner-h5ad-10-same] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-inner-h5ad-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-inner-zarr-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-inner-h5ad-100000000-same] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-outer-zarr-10-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-inner-h5ad-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse_array-inner-zarr-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-inner-zarr-100000000-unique] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-outer-zarr-100000000-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-outer-h5ad-10-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-outer-h5ad-100000000-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-inner-zarr-10-unique] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-inner-zarr-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-sparse-outer-zarr-100000000-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-inner-h5ad-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-inner-h5ad-100000000-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-inner-h5ad-100000000-unique] - AssertionError: Error raised from element 'obsp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse-outer-zarr-10-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-outer-zarr-10-first] - TypeError: expected unicode string, found True
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-inner-zarr-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'zarr.hierarchy.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-outer-h5ad-10-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-outer-h5ad-100000000-first] - TypeError: Can't implicitly convert non-string objects to strings
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[1-sparse_array-outer-zarr-100000000-first] - TypeError: expected unicode string, found True
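
The list above is easier to read when grouped by merge strategy and exception type. Below is a minimal sketch that tallies pytest FAILED lines of this shape using only the standard library. The helper name summarize_failures and the field names in the regex (axis, array_type, join, fmt, max_loaded, merge) are assumptions about what each position in the test ID means; they are not taken from the anndata test suite.

```python
import re
from collections import Counter

# One pytest summary line as quoted in this issue. The group names are
# guesses at the meaning of each position in the test ID (hypothetical).
FAILED_RE = re.compile(
    r"^FAILED \S+::test_anndatas_with_reindex"
    r"\[(?P<axis>\d+)-(?P<array_type>\w+)-(?P<join>\w+)-(?P<fmt>\w+)"
    r"-(?P<max_loaded>\d+)-(?P<merge>\w+)\] - (?P<error>\w[\w.]*): "
)


def summarize_failures(report: str) -> Counter:
    """Count failures per (merge strategy, join, exception class name)."""
    counts = Counter()
    for line in report.splitlines():
        match = FAILED_RE.match(line.strip())
        if match is None:
            continue  # skip lines that are not FAILED entries of this test
        # Keep only the class name, dropping any module path in front of it.
        exc = match["error"].rsplit(".", 1)[-1]
        counts[(match["merge"], match["join"], exc)] += 1
    return counts


# Three lines copied from the list above, used as sample input.
sample = """\
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-zarr-10-unique] - AssertionError: Error raised from element 'varp'.
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-inner-h5ad-10-first] - anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'pandas.core.series.Series'> into <class 'h5py._hl.group.Group'>
FAILED tests/test_concatenate_disk.py::test_anndatas_with_reindex[0-array-outer-h5ad-100000000-first] - TypeError: Can't implicitly convert non-string objects to strings
"""

for key, n in sorted(summarize_failures(sample).items()):
    print(key, n)
```

Passing the full pytest summary instead of sample gives the per-strategy breakdown for the whole list.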