[Python] If numpy is available use it for normalizing slice #46771

Open
@raulcd

Describe the enhancement requested

Some performance concerns were raised after the following PR was merged:

I benchmarked both the previous and the new behavior, with the following results.
Before the PR, using numpy:

In [2]: a = pa.array(np.arange(0, 2_000_000))

In [3]: %timeit a[slice(1, 1_000_000, 2)]
763 μs ± 4.28 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

After the PR, using a Python object:

In [2]: a = pa.array(np.arange(0, 2_000_000))

In [3]: %timeit a[slice(1, 1_000_000, 2)]
25.7 ms ± 703 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

We should probably use np.arange when numpy is available, and otherwise fall back to using a Python object.
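A minimal sketch of that fallback, to make the proposal concrete. The helper name `normalized_slice_indices` is hypothetical and not taken from the pyarrow code base. It only illustrates the "numpy if importable, plain Python otherwise" pattern for building the index sequence of a stepped slice:

```python
try:
    import numpy as np
except ImportError:  # numpy is treated as an optional dependency here
    np = None


def normalized_slice_indices(length, key):
    """Return the positions selected by the slice `key` for `length` items.

    Hypothetical helper, for illustration only.
    """
    # slice.indices resolves missing or negative values and clamps them to
    # the sequence bounds, so both branches work from the same triple.
    start, stop, step = key.indices(length)
    if np is not None:
        # Fast path: build the indices as a numpy array.
        return np.arange(start, stop, step)
    # Fallback: build the same indices as a plain Python list.
    return list(range(start, stop, step))
```

For the benchmark above, `normalized_slice_indices(2_000_000, slice(1, 1_000_000, 2))` would yield 500,000 indices, which could then be handed to something like `Array.take`. Whether that matches the actual code path in the PR is an assumption.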

We could also potentially give Array::Slice a step attribute, which would let us remove some of the custom code and just use step internally.

Component(s)

Python
