
feat(data-types): Support PyArrow ExtensionTypes in pyarrow2athena#3351

Merged
kukushking merged 3 commits into
aws:mainfrom
richacode007-byte:feat/extension-type-athena-support
Jun 9, 2026
Conversation

@richacode007-byte

Contributor

Feature or Bugfix

Detail

  • pyarrow2athena() in awswrangler/_data_types.py raised UnsupportedType: Unsupported Pyarrow type: extension<...> whenever it encountered any PyArrow ExtensionType. This made wr.s3.read_parquet_metadata() fail on parquet files containing canonical extension types such as pa.uuid() (and any user-defined extension type).
  • Added a recursive branch in pyarrow2athena() that unwraps any PyArrow extension type to its underlying storage type, mirroring the existing pattern used for dictionary types. pa.uuid() now maps to binary (via its fixed_size_binary(16) storage), and user-defined extensions resolve to the Athena type of their storage.
  • The check uses isinstance(dtype, getattr(pa, "BaseExtensionType", pa.ExtensionType)) to catch both canonical extension types (which subclass pa.BaseExtensionType in pyarrow 12+) and Python-defined extensions (subclasses of pa.ExtensionType), with a safe fallback on older pyarrow versions.
  • Added three unit tests in tests/unit/test_s3_parquet.py covering: the UUID canonical extension, a custom user-defined extension, and athena_types_from_pyarrow_schema (the wrapper used by read_parquet_metadata).

Relates

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@kukushking kukushking merged commit 0a46ee7 into aws:main Jun 9, 2026
28 checks passed
@kukushking

Collaborator

Thanks @richacode007-byte !

@linda-ting


thank you! :~)

@richacode007-byte

Contributor Author

@kukushking @linda-ting Thanks for reviewing and merging the code :)


Development

Successfully merging this pull request may close these issues.

Support conversion from Pyarrow ExtensionTypes to Athena types

3 participants