
concat_on_disk with join='outer' doesn't retain all .obsm fields #2394

@alam-shahul

Description

Not sure whether this is a bug or a missing feature, but ad.concat and ad.experimental.concat_on_disk behave differently with join='outer' for .obsm entries that are not present in every input AnnData object: ad.concat keeps them, while concat_on_disk drops them.

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of anndata.
  • (optional) I have confirmed this bug exists on the main branch of anndata.

Report

Code:

# /// script
# requires-python = ">=3.12"
# dependencies = [
#   "anndata@git+https://github.com/scverse/anndata.git",
# ]
# ///

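import anndata as ad
import numpy as np

# `a` carries an .obsm entry; `b` has none
a = ad.AnnData(
    X=np.ones((2, 10)),
    obsm={'X_emb': np.arange(4).astype(np.float64).reshape((2, 2))},
)
b = ad.AnnData(
    X=np.ones((4, 12)),
)

# In-memory concatenation keeps 'X_emb' with join="outer"
c = ad.concat([a, b], join="outer")
assert 'X_emb' in c.obsm.keys()

a.write_h5ad('a.h5ad')
b.write_h5ad('b.h5ad')

# On-disk concatenation of the same two objects, also with join="outer"
ad.experimental.concat_on_disk(['a.h5ad', 'b.h5ad'], out_file='d.h5ad', join="outer")

d = ad.read_h5ad('d.h5ad')
assert 'X_emb' in d.obsm.keys()  # fails: 'X_emb' is missing from d.obsm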
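For reference, here is a minimal NumPy sketch of the outer-join behavior expected for .obsm, i.e. what the first assertion relies on: keys are unioned across inputs, and rows coming from an input that lacks a key are padded. The helper name outer_concat_obsm is hypothetical (it is not part of anndata), and the sketch assumes NaN as the padding value and that every array stored under a given key has the same number of columns.

```python
import numpy as np


def outer_concat_obsm(mappings, n_obs):
    """Hypothetical helper (not an anndata API): outer-join .obsm-like dicts.

    mappings: one dict per input object, mapping key -> 2D array
    n_obs:    number of observations (rows) in each input object
    Assumption: NaN is used as the fill value for missing rows.
    """
    # Union of keys across all inputs, preserving first-seen order
    keys = []
    for m in mappings:
        for k in m:
            if k not in keys:
                keys.append(k)

    result = {}
    for k in keys:
        # Assumption: all arrays under the same key share a column count
        widths = {m[k].shape[1] for m in mappings if k in m}
        if len(widths) != 1:
            raise ValueError(f"inconsistent widths for {k!r}: {sorted(widths)}")
        width = widths.pop()

        blocks = []
        for m, n in zip(mappings, n_obs):
            if k in m:
                arr = np.asarray(m[k])
                if arr.shape[0] != n:
                    raise ValueError(f"{k!r} has {arr.shape[0]} rows, expected {n}")
                blocks.append(arr)
            else:
                # Input lacks this key: contribute n rows of NaN
                blocks.append(np.full((n, width), np.nan))
        result[k] = np.concatenate(blocks, axis=0)
    return result


# Mirrors the reproducer: `a` has X_emb (2 x 2), `b` (4 obs) has no .obsm
a_obsm = {"X_emb": np.arange(4).astype(np.float64).reshape((2, 2))}
b_obsm = {}
merged = outer_concat_obsm([a_obsm, b_obsm], n_obs=[2, 4])
print(merged["X_emb"].shape)  # (6, 2)
```

Under these assumptions, d.obsm in the reproducer would be expected to contain 'X_emb' with shape (6, 2), the last four rows being NaN, rather than omitting the key.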

Versions

0.12.10
