Support for Numpy 2 #409

Closed
@jakelishman

Hello!

With Numpy 2.0.0rc1 approaching release, has there been any thought about what will be needed to support the new version, or about what would make this library safer to use in the presence of the C API changes? I'm interested in helping with the transition to support both 1.x and 2.x, if that would be useful.

For what it's worth, just naively running the test suite against a local build of Numpy 2 (as of numpy/numpy@182ee60) does pretty well, albeit using the deprecated numpy.core.multiarray path over the new numpy._core.multiarray. The failures I observed are:

  • --lib
    • dtype::tests::test_dtype_names (bool_ is now bool)
  • --test array
    • half_bf16_works: ml_dtypes fails to import because it's not ready for Numpy 2
    • copy_to_works: slot 82 PyArray_CopyInto from _ARRAY_API is now NULL in Numpy 2 so this reliably segfaults (though PyArray_CopyInto got moved into slot 50, so it's still there).

Everything else passed for me (macOS 14, Python 3.11, x86_64).

While copy_to_works is the only segfault I saw in the test suite, this is the current set of changes in the generated C API capsule between Numpy 1.26.4 and Numpy 2.0.0 (as of numpy/numpy@182ee60), where the number is the offset into the PyArray_API pointer array:

Removed in 'api_2.0.0.c':
     1: (void *) &PyBigArray_Type
     4: (void *) &PyArrayFlags_Type
    40: (void *) PyArray_SetNumericOps
    41: (void *) PyArray_GetNumericOps
    65: (void *) PyArray_ScalarFromObject
    66: (void *) PyArray_GetCastFunc
    67: (void *) PyArray_FromDims
    68: (void *) PyArray_FromDimsAndDataAndDescr
    81: (void *) PyArray_MoveInto
    82: (void *) PyArray_CopyInto
    83: (void *) PyArray_CopyAnyInto
   103: (void *) PyArray_FillObjectArray
   115: (void *) PyArray_NewFlagsObject
   117: (void *) PyArray_CompareUCS4
   122: (void *) PyArray_FieldNames
   163: (void *) PyArray_As1D
   164: (void *) PyArray_As2D
   171: (void *) PyArray_CopyAndTranspose
   173: (void *) PyArray_TypestrConvert
   197: (void *) PyArray_TypeNumFromName
   201: (void *) _PyArray_SigintHandler
   202: (void *) _PyArray_GetSigintBuf
   208: (void *) PyArray_CompareString
   219: (void *) PyArray_SetDatetimeParseFunction
   278: (void *) PyArray_GetArrayParamsFromObject
   291: (void *) PyDataMem_SetEventHook
   293: (void *) PyArray_MapIterSwapAxes
   294: (void *) PyArray_MapIterArray
   295: (void *) PyArray_MapIterNext
   301: (void *) PyArray_MapIterArrayCopyIfOverlap

Different between 'api_1.26.4.c' and 'api_2.0.0.c':
    50:
      (void *) PyArray_CastTo
      (void *) PyArray_CopyInto
    51:
      (void *) PyArray_CastAnyTo
      (void *) PyArray_CopyAnyInto

Added in 'api_2.0.0.c':
   307: (void *) NpyDatetime_ConvertDatetime64ToDatetimeStruct
   308: (void *) NpyDatetime_ConvertDatetimeStructToDatetime64
   309: (void *) NpyDatetime_ConvertPyDateTimeToDatetimeStruct
   310: (void *) NpyDatetime_GetDatetimeISO8601StrLen
   311: (void *) NpyDatetime_MakeISO8601Datetime
   312: (void *) NpyDatetime_ParseISO8601Datetime
   313: (void *) NpyString_load
   314: (void *) NpyString_pack
   315: (void *) NpyString_pack_null
   316: (void *) NpyString_acquire_allocator
   317: (void *) NpyString_acquire_allocators
   318: (void *) NpyString_release_allocator
   319: (void *) NpyString_release_allocators

Every entry in the "Removed" list is a slot that now holds a null pointer, so calling through it would segfault. Slots 82 and 83 (PyArray_CopyInto and PyArray_CopyAnyInto) moved to slots 50 and 51 respectively.
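To make the slot move concrete, here is a minimal sketch of picking the offset by Numpy major version. The `MovedFn` enum and `slot_for` function are hypothetical names invented for illustration, not part of the existing bindings. The offsets are the ones from the listing above, and treating every version >= 2 like 2.0.0 is an assumption.

```rust
/// Functions whose slot in the `PyArray_API` table moved between Numpy 1.x and 2.x.
/// (Hypothetical enum, introduced only for this sketch.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MovedFn {
    CopyInto,
    CopyAnyInto,
}

/// Return the table offset for `func` under the given Numpy major version.
/// Offsets come from the capsule diff: 82/83 in 1.26.4, 50/51 in 2.0.0.
/// Assumption: later 2.x releases keep the 2.0.0 layout.
pub fn slot_for(func: MovedFn, numpy_major: u32) -> usize {
    let is_v2 = numpy_major >= 2;
    match (func, is_v2) {
        (MovedFn::CopyInto, false) => 82,
        (MovedFn::CopyInto, true) => 50,
        (MovedFn::CopyAnyInto, false) => 83,
        (MovedFn::CopyAnyInto, true) => 51,
    }
}
```

A caller would resolve the offset once the version is known, for example `slot_for(MovedFn::CopyInto, 2)`, and index the pointer table with the result.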

I haven't been following whether any of the functions themselves changed in ABI-incompatible ways. I think there was a change from Py_ssize_t to intptr_t (or vice versa) in some of them, which can be ABI-incompatible on some more esoteric platforms.


Would an appropriate way of handling most of the removals be to mark those in impl_array_api as fallible, with an explicit non-null pointer check on each access so we can safely panic rather than segfault? For the high-level safe API, would it work to check the version during the GILOnceCell initialisation and save the result, so that functions can check it and return an Err on unsupported calls?
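A rough sketch of what I have in mind, using only the standard library. `ApiTable`, `ApiError`, `try_get`, `get_or_panic` and `numpy_major` are all hypothetical names, the `Vec` stands in for the real capsule pointer array, and `std::sync::OnceLock` stands in for GILOnceCell, so this only illustrates the shape of the check rather than the real implementation.

```rust
use std::ffi::c_void;
use std::sync::OnceLock;

/// Why a slot lookup failed. (Hypothetical error type for this sketch.)
#[derive(Debug, PartialEq, Eq)]
pub enum ApiError {
    /// The offset is past the end of the table.
    OutOfRange { slot: usize },
    /// The slot exists but holds a null pointer, as for removed functions.
    Removed { slot: usize },
}

/// Stand-in for the `PyArray_API` pointer array from the capsule.
pub struct ApiTable {
    slots: Vec<*const c_void>,
}

impl ApiTable {
    pub fn new(slots: Vec<*const c_void>) -> Self {
        ApiTable { slots }
    }

    /// Fallible access: never hands back a null function pointer.
    pub fn try_get(&self, slot: usize) -> Result<*const c_void, ApiError> {
        match self.slots.get(slot) {
            None => Err(ApiError::OutOfRange { slot }),
            Some(&ptr) if ptr.is_null() => Err(ApiError::Removed { slot }),
            Some(&ptr) => Ok(ptr),
        }
    }

    /// Low-level access: panic with a message instead of segfaulting.
    pub fn get_or_panic(&self, slot: usize) -> *const c_void {
        self.try_get(slot)
            .unwrap_or_else(|err| panic!("Numpy C API slot unavailable: {:?}", err))
    }
}

/// Cached Numpy major version, detected once on first use.
static NUMPY_MAJOR: OnceLock<u32> = OnceLock::new();

/// Return the cached major version, running `detect` only on the first call.
pub fn numpy_major(detect: impl FnOnce() -> u32) -> u32 {
    *NUMPY_MAJOR.get_or_init(detect)
}
```

The safe high-level functions could then branch on `numpy_major(..)` or propagate the `ApiError` as an `Err`, while the raw accessors use `get_or_panic`.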

Happy to help with any implementation, if it'd be useful.
