Skip to content

Conversation

@Indhumathi27
Copy link
Contributor

@Indhumathi27 Indhumathi27 commented Nov 18, 2025

What changes were proposed in this pull request?

This PR updates TopNKeyProcessor to skip creating TopNKeyOperator when ReduceSinkDesc.topN is already set by LIMIT pushdown for ORDER BY LIMIT case. This prevents TopNKey from overriding pushdown and ensures the Reduce sink TopNkey filtering is used.

Why are the changes needed?

Currently, when a query includes ORDER BY + LIMIT, LIMIT pushdown is generated during planning but is effectively overridden by the subsequent TopNKey rewrite. As a result, TopNKey operator receives full input rather than a reduced data set, leading to worse performance (e.g., 16M rows forwarded to reducer instead of a few). In cases where global ordering uses a single reducer, LIMIT pushdown is sufficient and far more efficient. This fix prevents unnecessary TopNKey creation so that pushdown can reduce shuffle and significantly improve execution time.

Test reports:

For query: select * from table order by h limit 100;
Total num of rows: 67764224

with fix:
Screenshot 2025-12-01 at 11 45 37 PM

without fix:
Screenshot 2025-12-01 at 11 49 39 PM

Does this PR introduce any user-facing change?

No

How was this patch tested?

Manual testing + existing testcases

@Indhumathi27 Indhumathi27 changed the title Draft: HIVE-29322: Avoid TopNKeyOperator When Map-Side LIMIT Pushdown Provides Better Pruning HIVE-29322: Avoid TopNKeyOperator When Map-Side LIMIT Pushdown Provides Better Pruning Nov 27, 2025
@Indhumathi27
Copy link
Contributor Author

@kasakrisz @deniskuzZ Can you help to review this PR. Thanks

@Indhumathi27 Indhumathi27 force-pushed the disable_topn branch 4 times, most recently from db493f2 to a003d03 Compare December 1, 2025 18:13
@Indhumathi27 Indhumathi27 changed the title HIVE-29322: Avoid TopNKeyOperator When Map-Side LIMIT Pushdown Provides Better Pruning HIVE-29322: Avoid TopNKeyOperator When Map-Side LIMIT Pushdown Provides Better Pruning for ORDER BY LIMIT Queries Dec 1, 2025
@Indhumathi27
Copy link
Contributor Author

@ayushtkn @zabetak @kasakrisz @deniskuzZ Can you help to review the PR. thanks

Comment on lines +1333 to +1370
private static boolean hasPTFReduceSink(OptimizeTezProcContext ctx) {
for (ReduceSinkOperator rs : ctx.visitedReduceSinks) {
if (rs.getConf().isPTFReduceSink()) {
return true;
}
}
return false;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this achieves the desired outcome. Basically, if there is a PTF RS anywhere in the plan we will apply the rule on every RS (no matter if it is PTF or not).

Moreover, by relying on ctx.visitedReduceSinks we make the TopNKeyOptimization highly dependent on stats dependent optimization which is not great.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For windowing queries, since there is not much performance issues with TopNKey enabled, currently making the queries to use TopNkey Path. But to match the plan, there is no sequence of PTF%RS% patterns for some queries. only RS% will work for this case.
I chosed this approach, to avoid traversing the tree to check query has PTF operator.
can you suggest a solution for the windowing queries

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it a better solution to traverse the tree to find PTF ?

// Restrict TopNKey matching to GROUP BY / JOIN ReduceSinks, except for PTF queries.
String reduceSinkOp = ReduceSinkOperator.getOperatorName() + "%";
String topNKeyRegexPattern = hasPTFReduceSink(procCtx) ? reduceSinkOp :
".*(" + GroupByOperator.getOperatorName() + "|" +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to have the match any prefix .* at the beginning of the pattern?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. Some queries with Group by not matching current Regex pattern without the prefix

Comment on lines +1344 to +1345
boolean hasOrderOrLimit =
procCtx.parseContext.getQueryProperties().hasLimit() ||
procCtx.parseContext.getQueryProperties().hasOrderBy();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this? It is usually better if we can keep the optimization/transformation rules independent of the SQL syntax.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is added for windowing queries to use topNkey Path - without group by / join in the query.
example: windowing_streaming.q
select * from ( select p_mfgr, rank() over(partition by p_mfgr order by p_name) r from part) a where r < 4

@sonarqubecloud
Copy link

sonarqubecloud bot commented Jan 7, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants