GH-46606: [Python] Do not require numpy when normalizing slice #46732
Thanks for the contribution @shu-kitamura ! |
@AlenkaF I added the test. I ran the test in an environment without numpy and confirmed that it passes.
|
Thanks! Looking at It is failing now for a different reason (see AppVeyor error) and the failure is connected. The issue is when the indices that feed into |
Thanks! I moved test The failure of the test When using |
I added handling for the case where |
Thanks for the updates!
What I meant earlier is that the test case using With the change introduced in this PR, |
Thank you for reviewing it so many times.
Sorry too. I misunderstood.
I added the comment
I removed |
Thanks! I have run the full CI, let's see how it goes =) |
Three CIs have failed.😭 I think the following failure is caused by using I don't know about the other two yet, I'll look at the logs. |
All good, that is why they are set - to make sure we do not miss anything (or as little as possible 😉 )
Correct. Similar to what you have done in this PR, the test data needs to be updated to use
Other two are not connected. |
I fixed to not use
I'm sorry, but I don't understand what "not connected" means. |
No problem. One other CI build that is failing has a known issue (#46516) and so is not connected to the changes in this PR and we can ignore it. Similar for the lint one, I can't find an open issue for it though. |
Thanks again for the contribution @shu-kitamura !
@raulcd mind giving a sanity check before I merge?
@github-actions crossbow submit -g python |
Revision: 9b3cb60 Submitted crossbow builds: ursacomputing/crossbow @ actions-f955378e43 |
LGTM, I am running extended CI to double check and have updated the title of the PR to describe what we are doing. Will merge once CI run finishes if successful.
Thanks @AlenkaF for the reviews and @shu-kitamura for the PR!
The failing builds do not look related, though |
Yes, CI failures are unrelated, the And the |
Were any benchmarks run on this change? Calling |
Thanks @pitrou for taking a look, I should have pinged you on this one before merging.
I haven't run benchmarks, maybe we should validate the performance changes and if significant use numpy if available otherwise use the new code path?
When you say converting it to an Arrow array afterwards, do you mean the case where no indices are returned?
or when using the |
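The fallback floated above (use numpy when it is available, otherwise take the new code path) might look roughly like the sketch below. `build_indices` is a hypothetical helper name, not a function in pyarrow, and nothing in this thread confirms that the merged code takes this shape.

```python
# Sketch only: build_indices is a hypothetical helper, not part of pyarrow.
try:
    import numpy as np  # optional dependency
except ImportError:
    np = None


def build_indices(start, stop, step):
    """Return the indices start, start + step, ... that lie before stop."""
    if np is not None:
        # Fast path: a contiguous native integer array.
        return np.arange(start, stop, step)
    # Fallback: a list of plain Python ints, no numpy required.
    return list(range(start, stop, step))


indices = build_indices(1, 10, 2)
print([int(i) for i in indices])
```

Either branch yields the same index values; only the container type differs, which is what drives the conversion cost discussed in this thread.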
We are not really converting numpy array or list to PyArrow array. We are only using a different path to construct indices to pass to |
Yes, this one.
Much slower, since you have to convert generic Python objects to a contiguous native array.
>>> start, stop, step = 1, 1_000_000, 2
>>> %timeit np.arange(start, stop, step)
115 μs ± 741 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>>> %timeit pa.array(np.arange(start, stop, step))
120 μs ± 479 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>>> %timeit list(range(start, stop, step))
13.1 ms ± 84.3 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit pa.array(list(range(start, stop, step)))
32.9 ms ± 56.9 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
And then:
>>> a = pa.array(np.arange(0, 2_000_000))
>>> %timeit a.take(np.arange(start, stop, step))
818 μs ± 1.86 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
>>> %timeit a.take(list(range(start, stop, step)))
33 ms ± 101 μs per loop (mean ± std. dev. of 7 runs, 10 loops each) |
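For anyone wanting to repeat the index-construction part of the comparison above outside IPython, a rough sketch using the standard-library `timeit` module follows. `time_index_builders` is a hypothetical helper introduced here; it times `np.arange` only if numpy can be imported, and it leaves out the `pa.array` and `take` steps. Results will vary by machine.

```python
import timeit


def time_index_builders(start=1, stop=1_000_000, step=2, repeat=3, number=5):
    """Return the best per-call time in seconds for each index builder."""
    results = {}

    # Pure-Python path: materialise the range as a list of Python ints.
    best = min(timeit.repeat(lambda: list(range(start, stop, step)),
                             repeat=repeat, number=number))
    results["list(range)"] = best / number

    try:
        import numpy as np
    except ImportError:
        # numpy is optional here; report only the pure-Python timing.
        return results

    # numpy path: build a contiguous native integer array.
    best = min(timeit.repeat(lambda: np.arange(start, stop, step),
                             repeat=repeat, number=number))
    results["np.arange"] = best / number
    return results


if __name__ == "__main__":
    for name, seconds in time_index_builders().items():
        print(f"{name}: {seconds * 1e3:.3f} ms per call")
```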
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 494d0e3. There were 69 benchmark results with an error:
There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 5 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
Slicing an array with a non-trivial step raises an exception when NumPy is not installed.
#46606
What changes are included in this PR?
I changed np.arange(...) to list(range(...)) in python/pyarrow/array.pxi.
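As an illustration of the idea behind this change, here is a minimal pure-Python sketch of expanding a stepped slice into explicit indices without numpy. `slice_to_indices` is a hypothetical name, a plain list stands in for the Arrow array, and the list comprehension stands in for a take-style lookup; the actual code in `array.pxi` may be structured differently.

```python
def slice_to_indices(key, length):
    """Hypothetical helper: expand a slice into explicit indices, no numpy."""
    # slice.indices() clamps start/stop to the sequence length and
    # resolves None and negative values.
    start, stop, step = key.indices(length)
    return list(range(start, stop, step))


data = ["a", "b", "c", "d", "e", "f"]  # stand-in for an Arrow array

indices = slice_to_indices(slice(1, None, 2), len(data))
taken = [data[i] for i in indices]     # stand-in for a take-style lookup

# The explicit indices select the same elements as native slicing.
assert taken == data[1::2]
```

The same helper handles negative steps, because `slice.indices` returns bounds that `range` can consume directly.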
Are these changes tested?
Yes
Are there any user-facing changes?
No