Changes from 4 commits
3 changes: 2 additions & 1 deletion src/anndata/__init__.py
@@ -20,7 +20,7 @@
WriteWarning,
)
from .io import read_h5ad, read_zarr
from .utils import module_get_attr_redirect
from .utils import adapt_vars_like, module_get_attr_redirect

# Submodules need to be imported last
from . import abc, experimental, typing, io, types # isort: skip
@@ -53,6 +53,7 @@ def __getattr__(attr_name: str) -> Any:
"WriteWarning",
"__version__",
"abc",
"adapt_vars_like",
"concat",
"experimental",
"io",
45 changes: 45 additions & 0 deletions src/anndata/utils.py
@@ -450,3 +450,48 @@
return getattr(mod, new_path)
msg = f"module {full_old_module_path} has no attribute {attr_name!r}"
raise AttributeError(msg)


def adapt_vars_like(
    source: AnnData, target: AnnData, fill_value: float = 0.0
) -> AnnData:
    # source: AnnData object that defines the desired genes
    # target: the data to reshape so that it matches source
    # fill_value: value used for missing genes (defaults to 0.0)
    # returns a new AnnData object with the same genes as source
    """
    Make target have the same .var (genes) as source; missing genes are filled with fill_value.
    """

Contributor

Try copying the format of the other docstrings here, and add this function to the public API, i.e., in docs/api.md. The CI job (or a local docs build) can then be used to check how the docs render.

Contributor Author

I updated the docstring format, but I’m still a bit unsure about how api.md is structured. I did add a section for the function, but I’d really appreciate it if you could take a quick look and let me know if anything needs adjusting.
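
For reference, a docstring in the section-based numpydoc layout might look like the sketch below. It assumes the project's other docstrings use `Parameters` and `Returns` sections; the wording and the stub body are illustrative only and are not the text from this PR.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only needed for type checkers; the sketch runs without anndata installed.
    from anndata import AnnData


def adapt_vars_like(
    source: AnnData, target: AnnData, fill_value: float = 0.0
) -> AnnData:
    """\
    Make `target` have the same variables (genes) as `source`.

    Parameters
    ----------
    source
        Object whose `.var` defines the desired variables.
    target
        Object whose data is reshaped to match `source`.
    fill_value
        Value used for variables that are missing from `target`.

    Returns
    -------
    A new object with the variables of `source`, in the same order.
    """
    # Stub only: this sketch illustrates the docstring layout, not the logic.
    raise NotImplementedError
```

Moving the parameter descriptions out of the leading `#` comments and into the docstring is what lets the documentation build pick them up.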

    # importing here to avoid circular import issues
    from ._core.anndata import AnnData

Codecov / codecov/patch warning, src/anndata/utils.py#L466: added line #L466 was not covered by tests.

    # needed because indexing into target.X below would raise an error
    # if target.X is None
    if target.X is None:
        msg = "target.X is None; cannot adapt vars without a data matrix."
        raise ValueError(msg)

Codecov / codecov/patch warning, src/anndata/utils.py#L470-L472: added lines #L470 - L472 were not covered by tests.

    # defines the gene list we want to match
    new_var = source.var.copy()

Codecov / codecov/patch warning, src/anndata/utils.py#L474: added line #L474 was not covered by tests.

    # initialize a new dense NumPy array of shape
    # (number of target cells, number of genes in source), filled with fill_value;
    # this becomes the new .X matrix.
    # Every gene in source gets a column, ready to receive the values of shared genes.
    new_x = np.full((target.n_obs, new_var.shape[0]), fill_value, dtype=target.X.dtype)

Codecov / codecov/patch warning, src/anndata/utils.py#L479: added line #L479 was not covered by tests.

Contributor

I think you can use the Reindexer class we have in src/anndata/_core/merge.py to handle the reindexing logic. You'll just need to pass in the old and new indices.

Contributor

This will allow us to handle different array types and dataframes. Check out how many different array types that class handles; it's definitely non-trivial!

Contributor Author

I did end up switching, but maybe it is a bit too general now. Could you possibly review that as well?

    # find gene names that appear in both source and target
    shared_genes = source.var_names.intersection(target.var_names)

Codecov / codecov/patch warning, src/anndata/utils.py#L481: added line #L481 was not covered by tests.

    # positions of shared genes in source
    source_idx = new_var.index.get_indexer(shared_genes)

Codecov / codecov/patch warning, src/anndata/utils.py#L483: added line #L483 was not covered by tests.

    # positions of those same genes in target
    target_idx = target.var.index.get_indexer(shared_genes)

Codecov / codecov/patch warning, src/anndata/utils.py#L485: added line #L485 was not covered by tests.

    # fill the new .X array for all target cells (rows):
    # expression values from target.X are copied into the matching columns of new_x.
    # Only genes present in both source and target are copied;
    # every other column stays at fill_value.
    new_x[:, source_idx] = target.X[:, target_idx]

Codecov / codecov/patch warning, src/anndata/utils.py#L491: added line #L491 was not covered by tests.

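As a worked illustration of the index alignment used in this diff, the sketch below applies the same `intersection` and `get_indexer` steps to plain pandas indexes and a NumPy matrix. The gene names and values are made up, and plain arrays stand in for the AnnData objects.

```python
import numpy as np
import pandas as pd

# Toy data: gene names and values are made up for illustration.
source_genes = pd.Index(["g1", "g2", "g3", "g4"])  # desired variables
target_genes = pd.Index(["g3", "g1", "g5"])  # variables actually present
target_x = np.array([[3.0, 1.0, 5.0], [30.0, 10.0, 50.0]])  # 2 cells x 3 genes
fill_value = 0.0

# Start from a matrix that holds fill_value everywhere.
new_x = np.full(
    (target_x.shape[0], len(source_genes)), fill_value, dtype=target_x.dtype
)

# Genes present on both sides, then their column positions on each side.
shared_genes = source_genes.intersection(target_genes)
source_idx = source_genes.get_indexer(shared_genes)
target_idx = target_genes.get_indexer(shared_genes)

# Copy the shared columns; g2 and g4 keep fill_value, g5 is dropped.
new_x[:, source_idx] = target_x[:, target_idx]
print(new_x)
```

Because both position arrays are computed from the same `shared_genes` index, the copy is correct regardless of the order in which `intersection` returns the shared names.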
    # create a new AnnData object with the new .X and .var:
    # .X is the filled new_x array
    # .obs is a copy of target.obs
    # .var is copied from source.var, ensuring the gene annotations stay aligned
    new_adata = AnnData(X=new_x, obs=target.obs.copy(), var=new_var)

Contributor

We'll want to do this for the whole AnnData object. So you'll need to use the Reindexer (which operates on all types of matrices, dataframes, etc.) on all the parts of the object. I think something like:

reindexer = Reindexer(new_var.index, target.var.index)

AnnData(X=reindexer(target.X, fill_value=fill_value), obs=reindexer(target.obs, fill_value=fill_value), obsm={k: reindexer(v, fill_value=fill_value) for k, v in obsm.items()}...)

and so forth. Does that make sense?
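
To make the idea of reindexing every var-aligned part concrete, here is a minimal sketch for dense NumPy arrays only. It is not the `Reindexer` class from src/anndata/_core/merge.py; `reindex_along_axis` is a hypothetical helper, and the arrays merely stand in for `.X` and a `.varm` entry.

```python
import numpy as np
import pandas as pd


def reindex_along_axis(arr, old_index, new_index, fill_value, axis):
    """Hypothetical helper: reorder a dense array from old_index to new_index."""
    # Position of each new label in the old index; -1 marks missing labels.
    positions = old_index.get_indexer(new_index)
    missing = positions == -1
    # Index -1 would silently pick the last element, so use 0 as a placeholder
    # for missing labels and overwrite those slots afterwards.
    out = np.take(arr, np.where(missing, 0, positions), axis=axis)
    selector = [slice(None)] * out.ndim
    selector[axis] = missing
    out[tuple(selector)] = fill_value
    return out


old_vars = pd.Index(["g3", "g1", "g5"])  # variables of the existing object
new_vars = pd.Index(["g1", "g2", "g3"])  # variables we want to end up with
x = np.array([[3.0, 1.0, 5.0], [30.0, 10.0, 50.0]])  # obs x var
varm_entry = np.array([[0.3], [0.1], [0.5]])  # var x components

# The same index pair drives every var-aligned part, each along its var axis.
new_x = reindex_along_axis(x, old_vars, new_vars, fill_value=0.0, axis=1)
new_varm = reindex_along_axis(varm_entry, old_vars, new_vars, fill_value=0.0, axis=0)
```

The variables run along axis 1 of the data matrix but along axis 0 of a `.varm` entry, while obs-aligned parts such as `.obs` do not depend on the variable index at all. Sparse matrices, dataframes and other array types need separate handling, which is the non-trivial part the review comments point to.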

    return new_adata

Codecov / codecov/patch warning, src/anndata/utils.py#L496-L497: added lines #L496 - L497 were not covered by tests.