Closed
Description
Describe the bug
When using the LIMIT clause, simple LIMIT N
syntax (such as LIMIT 10
) works normally, but when using the syntax with an offset (such as LIMIT 10,20
), it fails and results in an internal error.
- Create a table
tbl
with data.(using a custom union provider that includes MemTable and ListingTable.) - Execute the following query (works normally):
SELECT * FROM tbl WHERE appId = 'xxx' LIMIT 0,10
- Execute the following query (fails):
SELECT * FROM tbl WHERE appId = 'xxx' LIMIT 10,20
Error Information
Internal error: GlobalLimitExec requires a single input partition.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
Execution Plan Comparison
Successful Query Execution Plan
[grpc-flight] physical plan after generate
CoalesceBatchesExec: target_batch_size=8192, fetch=20
FilterExec: appId@1 = 6eb17b3d80344184bb4c5592 AND _l2_timestamp@0 >= 1744199279329000 AND _l2_timestamp@0 < 1744199879329000
UnionExec
SortExec: expr=[_l2_timestamp@0 DESC NULLS LAST], preserve_partitioning=[false]
MemoryExec: partitions=1, partition_sizes=[6]
ParquetExec: file_groups={1 group: [[113bf01e-4ec6-42b8-9fbb-f8846bc51664/schema=schema_key/$$/dev/app/appId=6eb17b3d80344184bb4c5592/1744199703383977.1744199797977616.16.7315697162543697288.parquet, 113bf01e-4ec6-42b8-9fbb-f8846bc51664/schema=schema_key/$$/dev/app/appId=6eb17b3d80344184bb4c5592/1744065150679723.1744199685314003.205345.7315697162543697221.parquet]]}, projection=[_l2_timestamp, appId, message, namespace], output_ordering=[_l2_timestamp@0 DESC NULLS LAST], predicate=appId@1 = 6eb17b3d80344184bb4c5592 AND _l2_timestamp@0 >= 1744199279329000 AND _l2_timestamp@0 < 1744199879329000, pruning_predicate=appId_null_count@2 != appId_row_count@3 AND appId_min@0 <= 6eb17b3d80344184bb4c5592 AND 6eb17b3d80344184bb4c5592 <= appId_max@1 AND _l2_timestamp_null_count@5 != _l2_timestamp_row_count@6 AND _l2_timestamp_max@4 >= 1744199279329000 AND _l2_timestamp_null_count@5 != _l2_timestamp_row_count@6 AND _l2_timestamp_min@7 < 1744199879329000, required_guarantees=[appId in (6eb17b3d80344184bb4c5592)]
Failed Query Execution Plan
[grpc-flight] physical plan after generate
GlobalLimitExec: skip=10, fetch=20
CoalesceBatchesExec: target_batch_size=8192, fetch=30
FilterExec: appId@1 = 6eb17b3d80344184bb4c5592 AND _l2_timestamp@0 >= 1744199369830000 AND _l2_timestamp@0 < 1744199969830000
UnionExec
SortExec: expr=[_l2_timestamp@0 DESC NULLS LAST], preserve_partitioning=[false]
MemoryExec: partitions=1, partition_sizes=[3]
ParquetExec: file_groups={1 group: [[b5107ed9-933b-40e1-b87b-81d3da109d5f/schema=schema_key/$$/dev/app/appId=6eb17b3d80344184bb4c5592/1744199879085286.1744199936023233.10.7315697162543697422.parquet, b5107ed9-933b-40e1-b87b-81d3da109d5f/schema=schema_key/$$/dev/app/appId=6eb17b3d80344184bb4c5592/1744199818030386.1744199863928697.8.7315697162543697371.parquet, b5107ed9-933b-40e1-b87b-81d3da109d5f/schema=schema_key/$$/dev/app/appId=6eb17b3d80344184bb4c5592/1744065150679723.1744199797977616.205361.7315697162543697335.parquet]]}, projection=[_l2_timestamp, appId, message, namespace], output_ordering=[_l2_timestamp@0 DESC NULLS LAST], predicate=appId@1 = 6eb17b3d80344184bb4c5592 AND _l2_timestamp@0 >= 1744199369830000 AND _l2_timestamp@0 < 1744199969830000, pruning_predicate=appId_null_count@2 != appId_row_count@3 AND appId_min@0 <= 6eb17b3d80344184bb4c5592 AND 6eb17b3d80344184bb4c5592 <= appId_max@1 AND _l2_timestamp_null_count@5 != _l2_timestamp_row_count@6 AND _l2_timestamp_max@4 >= 1744199369830000 AND _l2_timestamp_null_count@5 != _l2_timestamp_row_count@6 AND _l2_timestamp_min@7 < 1744199969830000, required_guarantees=[appId in (6eb17b3d80344184bb4c5592)]
Questions and Clarifications
From the execution plan, I can see that the offset LIMIT query uses GlobalLimitExec: skip=10, fetch=20
, but fails with an error saying "GlobalLimitExec requires a single input partition."
I'm not sure if this is a bug or a limitation in my usage pattern. Could someone explain:
- What "GlobalLimitExec requires a single input partition" means in this context?
- Is there a specific pattern I should follow when using LIMIT with an offset in DataFusion?
- Are there any configuration settings or query modifications that could help resolve this issue?
Any insights would be appreciated.
To Reproduce
No response
Expected behavior
No response
Additional context
- DataFusion version: 45.0.0
- Operating System: linux x86_64