@tadeja
Contributor

@tadeja tadeja commented Jan 5, 2026

Rationale for this change

This is to cover reported issues #48241, #44224 and #43855.
Currently uuid.UUID objects are not inferred/converted automatically in PyArrow, requiring users to explicitly specify the type.

What changes are included in this PR?

Adding support for Python's uuid.UUID objects in PyArrow's type inference and conversion.

Are these changes tested?

Yes, added test_uuid_scalar_from_python() and test_uuid_array_from_python() in test_extension.py.

Are there any user-facing changes?

Users can now pass Python uuid.UUID objects directly to PyArrow functions such as pa.scalar() and pa.array() without specifying the type:

import uuid
import pyarrow as pa

pa.scalar(uuid.uuid4())

<pyarrow.UuidScalar: UUID('958174b9-3a5c-4cdd-8fc5-d51a2fc55784')>

pa.array([uuid.uuid4()])

<pyarrow.lib.UuidArray object at 0x1217725f0>
[
73611FD81F764A209C8B9CDBADDA1F53
]
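
For comparison, here is a minimal sketch of the kind of manual conversion that was needed when the type is not inferred. It is an illustration rather than code from this PR: the helper name uuids_to_storage_bytes is hypothetical, and the PyArrow part assumes a release that provides pa.uuid(), so it is skipped when PyArrow is missing or too old.

```python
import uuid


def uuids_to_storage_bytes(values):
    """Return the 16-byte big-endian form of each uuid.UUID (None stays None).

    Hypothetical helper, not part of PyArrow.
    """
    return [None if v is None else v.bytes for v in values]


ids = [uuid.UUID("958174b9-3a5c-4cdd-8fc5-d51a2fc55784"), None]
raw = uuids_to_storage_bytes(ids)

# The raw bytes round-trip back to the original UUID.
assert uuid.UUID(bytes=raw[0]) == ids[0]

# Uppercase hex of the 16 bytes, the same style as the array output above.
print(raw[0].hex().upper())  # 958174B93A5C4CDD8FC5D51A2FC55784

try:
    import pyarrow as pa
except ImportError:  # PyArrow is optional for this sketch
    pa = None

if pa is not None and hasattr(pa, "uuid"):
    # Build fixed-size binary storage, then wrap it in the UUID extension type.
    storage = pa.array(raw, type=pa.binary(16))
    arr = pa.ExtensionArray.from_storage(pa.uuid(), storage)
    print(arr.type)
```

With the change described here, the explicit conversion to bytes and the from_storage() step should no longer be necessary for the common case.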

@tadeja
Contributor Author

tadeja commented Jan 5, 2026

@AlenkaF Would you recommend a good place to document this UUID change? I see @amoeba indicated the need for documentation in his draft pull request #44242.
Happy New Year :)

@AlenkaF
Member

AlenkaF commented Jan 7, 2026

Happy New Year! ❤️

I would suggest adding the documentation to the Extending PyArrow page, under the Canonical extension types section, as a separate subsection next to the Fixed size tensor one.

@tadeja
Contributor Author

tadeja commented Jan 16, 2026

@AlenkaF, @rok do you have a chance to review this one? It should enable multiple UUID use cases.

Member

@rok rok left a comment

Looks good to me. Two minor nits.

Comment on lines +332 to +343
void InitUuidStaticData() {
  std::call_once(uuid_static_initialized, GetUuidStaticSymbols);
}
#else
void InitUuidStaticData() {
  if (uuid_static_initialized) {
    return;
  }
  GetUuidStaticSymbols();
  uuid_static_initialized = true;
}
#endif
Member

Perhaps we could use std::call_once for both cases here?

PyBytesView& view) {
  ARROW_RETURN_NOT_OK(view.ParseString(obj));
  // Check if obj is a uuid.UUID instance
  if (type->byte_width() == 16 && internal::IsPyUuid(obj)) {
Member

A nit: it seems uuid.UUID can't have any other byte width, so we don't really need the type->byte_width() == 16 check.
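
A quick way to confirm the Python side of this claim is sketched below. It uses only the standard library and checks that uuid.UUID.bytes is 16 bytes long for several UUID versions and for the extreme integer values. The sample values are arbitrary. This only covers the Python object; it says nothing about what widths the target Arrow type may have.

```python
import uuid

# A spread of UUID versions plus the smallest and largest possible values.
samples = [
    uuid.uuid1(),
    uuid.uuid3(uuid.NAMESPACE_DNS, "example.org"),
    uuid.uuid4(),
    uuid.uuid5(uuid.NAMESPACE_DNS, "example.org"),
    uuid.UUID(int=0),
    uuid.UUID(int=2**128 - 1),
]

# Collect the distinct byte lengths seen across all samples.
widths = {len(u.bytes) for u in samples}
print(widths)  # {16}
```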

@github-actions github-actions bot added the awaiting merge label and removed the awaiting review label Jan 16, 2026

3 participants