Backend-native Implementation #2071
Conversation
…a into ig/array_api_continue import merge
OK, I just went over general code style, nothing JAX-related
src/anndata/_core/merge.py
```python
# Force to NumPy (materializes JAX/Cubed); fine for small tests,
# but may be slow or fail on large/lazy arrays
```
This code doesn’t just run for tests though. Also are you sure that this is a good idea for arrays with pandas dtypes?
Yeah, I was initially forcing everything to NumPy, but that’s no longer the case. I’ve updated it so it should preserve arrays with pandas dtypes.
src/anndata/_core/merge.py
```python
return False


def _to_numpy_if_array_api(x):
```
There should be no second copy of this that’s slightly different, only one!
```python
dest = self._adata_ref._X
# Handles read-only NumPy views from backend arrays like JAX by
# making a writable copy so in-place assignment on views can succeed.
if isinstance(dest, np.ndarray) and not dest.flags.writeable:
    dest = np.array(dest, copy=True)  # make a fresh, writable buffer
self._adata_ref._X = dest
```
I would actually just let the error be thrown in this case. If something isn't writeable, I don't think that's our responsibility to handle.
src/anndata/_core/merge.py
hasattr(x, "dtype") and is_extension_array_dtype(x.dtype) | ||
): | ||
return x | ||
return np.asarray(x) |
OK, nice, this is the right direction, no doubt! So what we want here is probably not to rely on `asarray` but on dlpack to do the conversion. In short:

- We should have a check in `_apply_to_array` to see if something is array-api compatible but not a NumPy ndarray.
- If this is the case, dlpack into NumPy and recursively call `_apply_to_array`.
- Then use dlpack to take the output of the recursive call back to the original type before we went to NumPy.

Does that make sense?
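A minimal sketch of that round-trip, assuming the `_apply_to_array` method discussed above and the `array_api_compat` helpers; the exact integration point and signature are illustrative, not the final API:

```python
import numpy as np
from array_api_compat import array_namespace, is_array_api_obj

def _apply_to_array(self, el, *, axis, fill_value=None):
    # Hypothetical shape of the method; the dlpack round-trip is the point.
    if is_array_api_obj(el) and not isinstance(el, np.ndarray):
        xp = array_namespace(el)
        el_np = np.from_dlpack(el)  # dlpack into numpy (zero-copy on CPU)
        out_np = self._apply_to_array(el_np, axis=axis, fill_value=fill_value)
        return xp.from_dlpack(out_np)  # dlpack back to the original backend
    ...  # existing numpy handling
```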
I think this is a nice paradigm to follow for situations where we have an existing numpy or cupy implementation and it isn't clear how to use the array-api to achieve our aims. We should still try to use it as much as possible so that we can eventually remove numpy codepaths where possible, but this is a nice first step.
… with copying introduced as an extra precaution
src/anndata/_core/merge.py
```python
def _dlpack_from_numpy(x_np, original_xp):
    # cubed and other backends to be added later as elif branches
    if original_xp.__name__.startswith("jax"):
        return jax.dlpack.from_dlpack(x_np)
```
src/anndata/_core/merge.py
```python
T = TypeVar("T")

with suppress(ImportError):
```
src/anndata/_core/merge.py
```python
# Use the backend of the first array as the reference
ref = arrays[0]
xp = get_namespace(ref)

# Convert all arrays to the same backend as `ref`
arrays = [ref] + [_same_backend(ref, x, copy=True)[1] for x in arrays[1:]]

# Concatenate with the backend’s API
value = xp.concatenate(
```
This condition was previously hit by the fact that none of the above checks involving `any` were `True`. Instead of changing this last default condition, I would create a new branch here specifically for the array-api, check that they all have the same backend, and then concatenate. If they don't have the same backend, you just proceed to the `np` condition (which will presumably fail). I wouldn't worry about mixing different backends, especially with the array-api, for now. If we use `cubed`, `dlpack` won't work there anyway.
src/anndata/_core/merge.py
```python
# fallback for known backends that put it elsewhere (JAX and later others)
if original_xp.__name__.startswith("jax"):
    import jax.dlpack
```
`original_xp` should have a `from_dlpack` method! Does my comment here (383c445#r2291442475) not apply?
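If so, the jax special case collapses to a one-liner; a sketch assuming `original_xp` is an `array_api_compat` namespace:

```python
def _dlpack_from_numpy(x_np, original_xp):
    # from_dlpack is a standard array-api creation function,
    # so no backend-specific branches should be needed
    return original_xp.from_dlpack(x_np)
```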
```python
@pytest.mark.parametrize("xp", [np, jnp])  # xp = array namespace
def test_write_large_categorical(tmp_path, diskfmt):
```
I would revert this - it's just for generating categories, which get pushed into pandas. I don't think this triggers any internal array-api code.
```python
    return indexer.data

elif has_xp(indexer):
```
This looks nearly identical to the numpy case. In a case like this, I think you were/would be right to just merge the two if possible. If it's not, I would explain why.
src/anndata/_core/merge.py
```python
    return pd.api.extensions.take(
        el, indexer, axis=axis, allow_fill=True, fill_value=fill_value
    )
if _is_pandas(el):
```
```diff
-if _is_pandas(el):
+if isinstance(el, np.ndarray):
```

I would have thought that `el` is a numpy array given that the old function name was `_apply_to_array`, no?
src/anndata/_core/merge.py
```python
    # convert back from numpy to the original backend; reindexing is hard on JAX and others
    return _dlpack_from_numpy(out_np, xp)

# numpy case
```
I think the logic here is a little confused. This function used to be for `numpy`, but you added the `_is_pandas` check above, which I don't think applies here. But the logic you've written from "# numpy case" down works great for non-numpy array-api compatible arrays as well!

So I would leave the numpy case as before (i.e., remove `_is_pandas` and check `isinstance(el, np.ndarray)`), and then, in the case it is not a numpy array, use the logic under "# numpy case"! You can then get rid of the `if not isinstance(el, np.ndarray) and _is_array_api_compatible(el):` branch. One reason I cautioned against falling back to `numpy` behavior is that some things like `jax` arrays that are API compatible might be on the GPU! You can't transfer a JAX array on the GPU to numpy :/
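A sketch of that structure; the helper below is a hypothetical stand-in for the method's core take/reindex step, with the real fill-value handling elided:

```python
import numpy as np
from array_api_compat import array_namespace, is_array_api_obj

def _take(el, indexer, *, axis):
    if isinstance(el, np.ndarray):
        return np.take(el, indexer, axis=axis)  # numpy case, as before
    if is_array_api_obj(el):
        xp = array_namespace(el)
        # same logic, but the array never leaves its backend/device
        return xp.take(el, xp.asarray(indexer), axis=axis)
    msg = f"unsupported array type: {type(el)}"
    raise TypeError(msg)
```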
```python
xp = aac.array_namespace(old)
# skip early if numpy
if xp.__name__.startswith("numpy"):
    return old[new]
```
Why special-case this? If `isinstance(old, numpy.ndarray)`, then `singledispatch` handles this for us. If something is resolved to use `numpy` by `array_api_compat` (I don't see why this would ever be the case here, and I don't see anything in the docs about falling back to `numpy`)
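For reference, a minimal sketch of why numpy arrays never reach this code path under `singledispatch` (the `_resolve_idx` name and its signature are simplified here, not copied from the codebase):

```python
from functools import singledispatch

import numpy as np

@singledispatch
def _resolve_idx(old, new):
    # generic fallback: only non-numpy (e.g. array-api) inputs land here
    return old[new]

@_resolve_idx.register
def _(old: np.ndarray, new):
    # numpy arrays dispatch here, so no xp.__name__ check is needed
    return old[new]
```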
```python
if where_fn is not None:
    old = where_fn(old)[0]
else:
    # if no where function is found, fall back to NumPy
    old = np.where(np.asarray(old))[0]
```
How could `where_fn` ever be `None`? https://data-apis.org/array-api/latest/API_specification/generated/array_api.where.html#where is part of the array-api.
```python
return old[new]

# handle boolean mask; i.e. checking whether old is a boolean array
if hasattr(old, "dtype") and str(old.dtype) in ("bool", "bool_", "boolean"):
```
If `old` is not array-api compatible, shouldn't we error out? `old.dtype` would exist, and https://data-apis.org/array-api/latest/API_specification/generated/array_api.isdtype.html#isdtype could handle checking for `bool`; no need for strings or anything.
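A sketch of that dtype check via the standard API; note that the standard's one-argument counterpart of `np.where` is `nonzero`:

```python
from array_api_compat import array_namespace

xp = array_namespace(old)  # raises TypeError if old is not array-api compatible
if xp.isdtype(old.dtype, "bool"):
    old = xp.nonzero(old)[0]  # boolean mask -> integer indices
```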
```python
# if new is a slice object, convert it into a range of indices using arange
if isinstance(new, slice):
    # try to get arange from the backend
    arange_fn = getattr(xp, "arange", None)
```
Why `getattr` (same with `where`)? Shouldn't `xp.arange` exist: https://data-apis.org/array-api/latest/API_specification/generated/array_api.arange.html#arange?
```python
# Skip if it's already NumPy
if xp.__name__.startswith("numpy"):
    return x
```
How could this condition be reached if we have the above `isinstance(x, np.ndarray | ...)` check?
```python
def to_numpy_if_array_api(x):
    if isinstance(
        x,
        np.ndarray
        | np.generic
        | pd.DataFrame
        | pd.Series
        | pd.Index
        | ExtensionArray
        | DaskArray
        | sp.spmatrix
        | AnnData,
    ):
        return x
```
Given this check, I would think this function should be called `to_writeable`, no?
```python
if not isinstance(elem, AnnData):
    elem = normalize_nested(elem)
```
Wouldn't we want to also normalize `AnnData` objects so that their sub-elements are also corrected?
```python
# use first as a reference to check if all of the arrays are the same type
xp = get_namespace(arrays[0])

if not all(get_namespace(a) is xp or a.shape == 0 for a in arrays):
    msg = "Cannot concatenate array-api arrays from different backends."
    raise ValueError(msg)
```
I think the check around making sure they have the same namespace is encapsulated in https://data-apis.org/array-api-compat/helper-functions.html#array_api_compat.array_namespace since it can take in an array of array objects - you could use that instead of just using the first one
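A sketch of that replacement, assuming the same `arrays` list as in the snippet above:

```python
from array_api_compat import array_namespace

try:
    # validates that all inputs share one backend in a single call
    xp = array_namespace(*arrays)
except TypeError as e:
    msg = "Cannot concatenate array-api arrays from different backends."
    raise ValueError(msg) from e
```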
```python
if _is_pandas(el) or isinstance(el, pd.DataFrame):
    return self._apply_to_df_like(el, axis=axis, fill_value=fill_value)
```
Why did you add this?
```python
# Check: is array-api compatible, but not NumPy
if not isinstance(el, np.ndarray) and _is_array_api_compatible(el):
    # Convert to NumPy via DLPack
    el_np = _dlpack_to_numpy(el)
    # Recursively call this same function
    out_np = self._apply_to_array_api(el_np, axis=axis, fill_value=fill_value)
    # convert back from numpy to the original backend; reindexing is hard on JAX and others
    return _dlpack_from_numpy(out_np, xp)
```
Why convert to numpy here? Just let the array pass through, that's the whole point of using the array api :) If a jax array is on the GPU you're going to bring it to the CPU here, but why?
Apologies if it wasn't clear in my previous comment - we should only go to numpy if the existing array is on the CPU and we couldn't come up with a generic way of doing this via the array-api. But you made a way of doing this via the array-api which is great, so no need to convert to numpy!
Apologies if this is going in circles, but I'm kind of reviewing changes locally and losing the big picture sometimes!
First step in getting anndata concat and test generation to work properly with JAX (and potentially Cubed) without just converting everything into NumPy.
Random data creation and shape handling use `xp.asarray` so arrays stay in their original backend where possible. I also updated concat paths to actually check types before converting, added helpers for sparse detection and array-api checks, and made sure backend arrays only get turned into NumPy when absolutely necessary. This fixes a bunch of concat-related test failures.

It's still not perfect. Some pandas calls in concat still force conversion to NumPy, so the data gets copied instead of being used directly. Cubed support is only a placeholder right now. Type detection might still be a bit too broad, which can lead to extra conversions. It works for NumPy and JAX in tests, but I haven't tried other backends.
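For illustration, the backend-preserving test pattern described above might look like this (the test name and shapes are hypothetical, not taken from the PR):

```python
import numpy as np
import pytest

jnp = pytest.importorskip("jax.numpy")

@pytest.mark.parametrize("xp", [np, jnp])
def test_concat_keeps_backend(xp):
    rng = np.random.default_rng(0)
    x = xp.asarray(rng.normal(size=(4, 3)))  # data lands in xp's backend
    y = xp.asarray(rng.normal(size=(4, 3)))
    out = xp.concatenate([x, y], axis=0)
    assert out.shape == (8, 3)
```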