I noticed the following line:
s = np.sum(arr, axis=0)
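If the code is performance-sensitive and arr is always a plain NumPy ndarray, you might consider replacing it with the line below. np.add.reduce accepts the same dtype, keepdims, and initial arguments, so any of those can be carried over unchanged: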
s = np.add.reduce(arr, axis=0)
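np.sum is a high-level convenience function that, for ndarray inputs, ends up calling np.add.reduce after some extra argument handling and dispatch in Python. Calling np.add.reduce directly skips that wrapper. The saving is a small, roughly fixed cost per call, so it is only noticeable when the call runs many times on small arrays, as in a tight loop. For large arrays the summation itself dominates and the difference is negligible. One caveat: np.add.reduce defaults to axis=0 while np.sum defaults to axis=None, so keep the axis argument explicit when switching.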
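If it helps, here is a rough sketch for checking this on your own machine before committing to the change. The array shape, the time_call helper, and the repeat count are my own choices for illustration, not anything from the code under review, and the measured numbers will vary with NumPy version and hardware.

```python
import timeit

import numpy as np

# Small illustrative array; the real arr in the reviewed code may differ.
arr = np.arange(12, dtype=np.float64).reshape(3, 4)

# With an explicit axis, both calls give the same result.
s_sum = np.sum(arr, axis=0)
s_reduce = np.add.reduce(arr, axis=0)
same_result = np.array_equal(s_sum, s_reduce)

# Without an axis the defaults differ:
# np.sum reduces over all axes, np.add.reduce only over axis 0.
total = np.sum(arr)            # scalar
col_sums = np.add.reduce(arr)  # shape (4,)


def time_call(fn, number=20_000):
    """Hypothetical helper: average seconds per call over `number` runs."""
    return timeit.timeit(fn, number=number) / number


t_sum = time_call(lambda: np.sum(arr, axis=0))
t_reduce = time_call(lambda: np.add.reduce(arr, axis=0))

print(f"results equal:  {same_result}")
print(f"np.sum:         {t_sum * 1e6:.2f} us per call")
print(f"np.add.reduce:  {t_reduce * 1e6:.2f} us per call")
```

If the two timings come out close for the array sizes you actually use, it is probably not worth giving up the more readable np.sum.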