
Support for grouping in UUID columns #46468

Open
@Fokko

Description

Describe the enhancement requested

Python 3.10.16 (main, Dec  3 2024, 17:27:57) [Clang 16.0.0 (clang-1600.0.26.4)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.31.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pyarrow as pa
   ...: import uuid
   ...: 
   ...: arr_table = pa.Table.from_pydict(
   ...:     {
   ...:         "uuid": [
   ...:             uuid.UUID("00000000-0000-0000-0000-000000000000").bytes,
   ...:             uuid.UUID("11111111-1111-1111-1111-111111111111").bytes,
   ...:         ],
   ...:     },
   ...:     schema=pa.schema(
   ...:         [
   ...:             pa.field("uuid", pa.uuid(), nullable=False),
   ...:         ]
   ...:     ),
   ...: )
   ...: 
   ...: arr_table.group_by('uuid').aggregate([])
---------------------------------------------------------------------------
ArrowNotImplementedError                  Traceback (most recent call last)
Cell In[1], line 18
      2 import uuid
      4 arr_table = pa.Table.from_pydict(
      5     {
      6         "uuid": [
   (...)
     15     ),
     16 )
---> 18 arr_table.group_by('uuid').aggregate([])

File /opt/homebrew/lib/python3.10/site-packages/pyarrow/table.pxi:6560, in pyarrow.lib.TableGroupBy.aggregate()

File /opt/homebrew/lib/python3.10/site-packages/pyarrow/acero.py:410, in _group_by(table, aggregates, keys, use_threads)
    404 def _group_by(table, aggregates, keys, use_threads=True):
    406     decl = Declaration.from_sequence([
    407         Declaration("table_source", TableSourceNodeOptions(table)),
    408         Declaration("aggregate", AggregateNodeOptions(aggregates, keys=keys))
    409     ])
--> 410     return decl.to_table(use_threads=use_threads)

File /opt/homebrew/lib/python3.10/site-packages/pyarrow/_acero.pyx:590, in pyarrow._acero.Declaration.to_table()

File /opt/homebrew/lib/python3.10/site-packages/pyarrow/error.pxi:155, in pyarrow.lib.pyarrow_internal_check_status()

File /opt/homebrew/lib/python3.10/site-packages/pyarrow/error.pxi:92, in pyarrow.lib.check_status()

ArrowNotImplementedError: Keys of type extension<arrow.uuid>

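Looking at the stack trace, I think we need to change something in the code path that raises this error. A UUID is just a fixed-width column under the hood (its storage type is fixed_size_binary(16)), so I think we can reuse the existing fixed-width grouping logic.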

Thoughts from the Arrow maintainers?

Component(s)

Python
