-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-48241: [Python] Scalar inferencing doesn't infer UUID #48727
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Happy New Year! ❤️ I would suggest adding the documentation to the Extending PyArrow page under the Canonical extension types section as a separate subsection next to Fixed size tensor one. |
rok
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Two minor nits.
| void InitUuidStaticData() { | ||
| std::call_once(uuid_static_initialized, GetUuidStaticSymbols); | ||
| } | ||
| #else | ||
| void InitUuidStaticData() { | ||
| if (uuid_static_initialized) { | ||
| return; | ||
| } | ||
| GetUuidStaticSymbols(); | ||
| uuid_static_initialized = true; | ||
| } | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could use std::call_once for both cases here?
| PyBytesView& view) { | ||
| ARROW_RETURN_NOT_OK(view.ParseString(obj)); | ||
| // Check if obj is a uuid.UUID instance | ||
| if (type->byte_width() == 16 && internal::IsPyUuid(obj)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nit: it seems uuid.UUID can't have other bitwidths, so we don't really need the type->byte_width() == 16 check.
Rationale for this change
This is to cover reported issues #48241, #44224 and #43855.
Currently uuid.UUID objects are not inferred/converted automatically in PyArrow, requiring users to explicitly specify the type.
What changes are included in this PR?
Adding support for Python's uuid.UUID objects in PyArrow's type inference and conversion.
Are these changes tested?
Yes, added test_uuid_scalar_from_python() and test_uuid_array_from_python() in
test_extension.py.Are there any user-facing changes?
Users can now pass Python uuid.UUID objects directly to PyArrow functions like pa.scalar() and pa.array() without specifying the type;
<pyarrow.UuidScalar: UUID('958174b9-3a5c-4cdd-8fc5-d51a2fc55784')>
<pyarrow.lib.UuidArray object at 0x1217725f0>
[
73611FD81F764A209C8B9CDBADDA1F53
]