
GlobalLimitExec: offset pagination query results in internal error #15665

Closed
@lalaorya

Description


Describe the bug

When using the LIMIT clause, the simple LIMIT N syntax (such as LIMIT 10) works normally, but the offset syntax (such as LIMIT 10,20) fails with an internal error.

  1. Create a table tbl with data (using a custom union provider that combines a MemTable and a ListingTable; a minimal stand-in sketch is shown after these steps).
  2. Execute the following query (works normally):
    SELECT * FROM tbl WHERE appId = 'xxx' LIMIT 0,10
  3. Execute the following query (fails):
    SELECT * FROM tbl WHERE appId = 'xxx' LIMIT 10,20
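
For completeness, here is a minimal stand-in I would use to reproduce the shape of the query. It replaces my custom union provider with a single MemTable, so it may not hit the exact same plan; the column names, data, and the LIMIT 20 OFFSET 10 rewrite are illustrative only (requires the datafusion and tokio crates):

    use std::sync::Arc;

    use datafusion::arrow::array::{ArrayRef, Int64Array, StringArray};
    use datafusion::arrow::datatypes::{DataType, Field, Schema};
    use datafusion::arrow::record_batch::RecordBatch;
    use datafusion::datasource::MemTable;
    use datafusion::prelude::*;

    #[tokio::main]
    async fn main() -> datafusion::error::Result<()> {
        // Only the columns referenced by the query; the real table has more.
        let schema = Arc::new(Schema::new(vec![
            Field::new("_l2_timestamp", DataType::Int64, false),
            Field::new("appId", DataType::Utf8, false),
        ]));
        let ts: ArrayRef = Arc::new(Int64Array::from((0..100i64).collect::<Vec<_>>()));
        let app: ArrayRef = Arc::new(StringArray::from(vec!["xxx"; 100]));
        let batch = RecordBatch::try_new(schema.clone(), vec![ts, app])?;

        // MemTable stands in for the custom union provider (MemTable + ListingTable).
        let table = MemTable::try_new(schema, vec![vec![batch]])?;
        let ctx = SessionContext::new();
        let _ = ctx.register_table("tbl", Arc::new(table))?;

        // Standard-SQL form of LIMIT 10,20: skip 10 rows, fetch 20.
        // The column is quoted because the schema above uses a mixed-case name.
        let df = ctx
            .sql("SELECT * FROM tbl WHERE \"appId\" = 'xxx' LIMIT 20 OFFSET 10")
            .await?;
        df.show().await?;
        Ok(())
    }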

Error Information

Internal error: GlobalLimitExec requires a single input partition.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

Execution Plan Comparison

Successful Query Execution Plan

[grpc-flight] physical plan after generate
CoalesceBatchesExec: target_batch_size=8192, fetch=20
  FilterExec: appId@1 = 6eb17b3d80344184bb4c5592 AND _l2_timestamp@0 >= 1744199279329000 AND _l2_timestamp@0 < 1744199879329000
    UnionExec
      SortExec: expr=[_l2_timestamp@0 DESC NULLS LAST], preserve_partitioning=[false]
        MemoryExec: partitions=1, partition_sizes=[6]
      ParquetExec: file_groups={1 group: [[113bf01e-4ec6-42b8-9fbb-f8846bc51664/schema=schema_key/$$/dev/app/appId=6eb17b3d80344184bb4c5592/1744199703383977.1744199797977616.16.7315697162543697288.parquet, 113bf01e-4ec6-42b8-9fbb-f8846bc51664/schema=schema_key/$$/dev/app/appId=6eb17b3d80344184bb4c5592/1744065150679723.1744199685314003.205345.7315697162543697221.parquet]]}, projection=[_l2_timestamp, appId, message, namespace], output_ordering=[_l2_timestamp@0 DESC NULLS LAST], predicate=appId@1 = 6eb17b3d80344184bb4c5592 AND _l2_timestamp@0 >= 1744199279329000 AND _l2_timestamp@0 < 1744199879329000, pruning_predicate=appId_null_count@2 != appId_row_count@3 AND appId_min@0 <= 6eb17b3d80344184bb4c5592 AND 6eb17b3d80344184bb4c5592 <= appId_max@1 AND _l2_timestamp_null_count@5 != _l2_timestamp_row_count@6 AND _l2_timestamp_max@4 >= 1744199279329000 AND _l2_timestamp_null_count@5 != _l2_timestamp_row_count@6 AND _l2_timestamp_min@7 < 1744199879329000, required_guarantees=[appId in (6eb17b3d80344184bb4c5592)]

Failed Query Execution Plan

[grpc-flight] physical plan after generate
GlobalLimitExec: skip=10, fetch=20
  CoalesceBatchesExec: target_batch_size=8192, fetch=30
    FilterExec: appId@1 = 6eb17b3d80344184bb4c5592 AND _l2_timestamp@0 >= 1744199369830000 AND _l2_timestamp@0 < 1744199969830000
      UnionExec
        SortExec: expr=[_l2_timestamp@0 DESC NULLS LAST], preserve_partitioning=[false]
          MemoryExec: partitions=1, partition_sizes=[3]
        ParquetExec: file_groups={1 group: [[b5107ed9-933b-40e1-b87b-81d3da109d5f/schema=schema_key/$$/dev/app/appId=6eb17b3d80344184bb4c5592/1744199879085286.1744199936023233.10.7315697162543697422.parquet, b5107ed9-933b-40e1-b87b-81d3da109d5f/schema=schema_key/$$/dev/app/appId=6eb17b3d80344184bb4c5592/1744199818030386.1744199863928697.8.7315697162543697371.parquet, b5107ed9-933b-40e1-b87b-81d3da109d5f/schema=schema_key/$$/dev/app/appId=6eb17b3d80344184bb4c5592/1744065150679723.1744199797977616.205361.7315697162543697335.parquet]]}, projection=[_l2_timestamp, appId, message, namespace], output_ordering=[_l2_timestamp@0 DESC NULLS LAST], predicate=appId@1 = 6eb17b3d80344184bb4c5592 AND _l2_timestamp@0 >= 1744199369830000 AND _l2_timestamp@0 < 1744199969830000, pruning_predicate=appId_null_count@2 != appId_row_count@3 AND appId_min@0 <= 6eb17b3d80344184bb4c5592 AND 6eb17b3d80344184bb4c5592 <= appId_max@1 AND _l2_timestamp_null_count@5 != _l2_timestamp_row_count@6 AND _l2_timestamp_max@4 >= 1744199369830000 AND _l2_timestamp_null_count@5 != _l2_timestamp_row_count@6 AND _l2_timestamp_min@7 < 1744199969830000, required_guarantees=[appId in (6eb17b3d80344184bb4c5592)]

Questions and Clarifications

From the execution plans, I can see that the offset LIMIT query is planned as GlobalLimitExec: skip=10, fetch=20, but execution fails with the error "GlobalLimitExec requires a single input partition" (the failing query restated in standard LIMIT/OFFSET syntax is shown after the questions below).

I'm not sure if this is a bug or a limitation in my usage pattern. Could someone explain:

  1. What "GlobalLimitExec requires a single input partition" means in this context?
  2. Is there a specific pattern I should follow when using LIMIT with an offset in DataFusion?
  3. Are there any configuration settings or query modifications that could help resolve this issue?
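
For reference, the skip=10 / fetch=20 in the failing plan corresponds to the query below in standard LIMIT/OFFSET syntax. I have not verified whether writing it this way, or lowering datafusion.execution.target_partitions, avoids the error; that is part of what I am asking:

    -- Session setting I wondered about (unverified whether it is related):
    SET datafusion.execution.target_partitions = 1;
    -- Standard-SQL equivalent of LIMIT 10,20 (skip 10, fetch 20):
    SELECT * FROM tbl WHERE appId = 'xxx' LIMIT 20 OFFSET 10;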

Any insights would be appreciated.

To Reproduce

No response

Expected behavior

No response

Additional context

  • DataFusion version: 45.0.0
  • Operating System: linux x86_64
