[SPARK-51586][SS] initialize input partitions independent of columnar support in continuous mode #50348

vrozov · 2025-03-21T17:51:59Z

What changes were proposed in this pull request?

Initialize input partitions independent of columnar support in ContinuousScanExec

Why are the changes needed?

After SPARK-45080 and PR 42823 Kafka continuous stream may go into infinite loop of reconfiguring. As KafkaScan columnar support mode is hardcoded to UNSUPPORTED ContinuousScanExec does not initialize inputPartitions in supportsColumnar as it did prior to the above PR. So, there is no call to KafkaContinuousStream.planInputPartitions during query planning. It leaves KafkaContinuousStream.knownPartitions uninitialized, so call to needsReconfiguration returns true. Now epochUpdateThread in ContinuousExecution requests interruption of queryExecutionThread, so that thread also can't initialize knownPartitions as it checks for interrupts in KafkaOffsetReaderAdmin.withRetries and exits before knownPartitions is assigned.

Does this PR introduce any user-facing change?

Yes, it fixes bug that may prevent kafka continuous stream from working properly

How was this patch tested?

Using existing Kafka tests. Those tests should complete faster.

Was this patch authored or co-authored using generative AI tooling?

No

vrozov · 2025-03-21T17:52:57Z

@HeartSaVioR please review

HeartSaVioR

Only a nit. Thanks for figuring this out!

HeartSaVioR · 2025-03-24T01:10:05Z

...tor/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReaderAdmin.scala

@@ -535,7 +535,7 @@ private[kafka010] class KafkaOffsetReaderAdmin(
      var attempt = 1
      var lastException: Throwable = null
      while (result.isEmpty && attempt <= maxOffsetFetchAttempts
-        && !Thread.currentThread().isInterrupted) {
+        && !Thread.currentThread.isInterrupted) {


nit: shall we not change the code which is not relevant? You can have a different PR to do that, but please find more cases and change altogether.

IMO, it is too small to have a different PR and it is somewhat related to the Kafka issue. Let me know if you strongly prefer to revert this change.

I prefer to revert the change. I don't want to mess up the commit log. What you change is completely just a style issue and I don't want to give this line to be shown as "fixing bug".

OK, reverted. IMO, it will be good to fix the style anyway and enforce it.

I think this is arguably not super clear style rule. Only "pure" parameterless method can remove the paren from the best practice of Scala (it's not that you can remove the paren as long as the method does not take any param), but there are arguments of what is "pure" method.

Here is the post which starts with very simple question got distracted with what is pure method.
https://users.scala-lang.org/t/paramterless-functions-with-and-without-parentheses-different/9939/3

In addition, Scala 3 will enforce the defined method with paren to "always" call with paren, regardless of parameterless, "pure" method, etcetc.

So I don't think this is an obvious style fix - someone could argue that we are going to break the style.

https://docs.scala-lang.org/scala3/reference/dropped-features/auto-apply.html: Thread.currentThread still will be the recommended way for Scala 3 as well.

The latter is idiomatic Scala because it conforms to the uniform access principle. This principle states that one should be able to change an object member from a field to a non-side-effecting method and back without affecting clients that access the member. Consequently, Scala encourages to define such "property" methods without a () parameter list whereas side-effecting methods should be defined with it. Methods defined in Java cannot make this distinction; for them a () is always mandatory. So Scala fixes the problem on the client side, by allowing the parameterless references. But where Scala allows that freedom for all method references, Scala 3 restricts it to references of external methods that are not defined themselves in Scala 3.

If you see the origin thread, the definition of "pure" method is not super obvious; it's not something compiler would be able to catch and it's mostly on human's justification. This is just a homework Scala left to us and I don't really like the fact Scala makes such an ambiguity.

Continuing from there, I'd rather interpret this as Java is just excluded from the enforcement. It is still arguable what is "pure" method and everyone has every different opinions. Thread.currentThread is a native method, and it's not super obvious that there is no side-effect.

… support

HeartSaVioR

+1

HeartSaVioR · 2025-03-25T01:34:24Z

Thanks! Merging to master/4.0 (I guess we'd like to not hold the bug to another 6 months.)

… support in continuous mode ### What changes were proposed in this pull request? Initialize input partitions independent of columnar support in `ContinuousScanExec` ### Why are the changes needed? After [SPARK-45080](https://issues.apache.org/jira/browse/SPARK-45080) and [PR 42823](#42823) Kafka continuous stream may go into infinite loop of reconfiguring. As `KafkaScan` columnar support mode is hardcoded to `UNSUPPORTED` `ContinuousScanExec` does not initialize `inputPartitions` in `supportsColumnar` as it did prior to the above PR. So, there is no call to `KafkaContinuousStream.planInputPartitions` during query planning. It leaves `KafkaContinuousStream.knownPartitions` uninitialized, so call to `needsReconfiguration` returns `true`. Now `epochUpdateThread` in `ContinuousExecution` requests interruption of `queryExecutionThread`, so that thread also can't initialize `knownPartitions` as it checks for interrupts in `KafkaOffsetReaderAdmin.withRetries` and exits before `knownPartitions` is assigned. ### Does this PR introduce _any_ user-facing change? Yes, it fixes bug that may prevent kafka continuous stream from working properly ### How was this patch tested? Using existing Kafka tests. Those tests should complete faster. ### Was this patch authored or co-authored using generative AI tooling? No Closes #50348 from vrozov/SPARK-51586. Authored-by: Vlad Rozov <[email protected]> Signed-off-by: Jungtaek Lim <[email protected]> (cherry picked from commit 11d1409) Signed-off-by: Jungtaek Lim <[email protected]>

… support in continuous mode ### What changes were proposed in this pull request? Initialize input partitions independent of columnar support in `ContinuousScanExec` ### Why are the changes needed? After [SPARK-45080](https://issues.apache.org/jira/browse/SPARK-45080) and [PR 42823](apache#42823) Kafka continuous stream may go into infinite loop of reconfiguring. As `KafkaScan` columnar support mode is hardcoded to `UNSUPPORTED` `ContinuousScanExec` does not initialize `inputPartitions` in `supportsColumnar` as it did prior to the above PR. So, there is no call to `KafkaContinuousStream.planInputPartitions` during query planning. It leaves `KafkaContinuousStream.knownPartitions` uninitialized, so call to `needsReconfiguration` returns `true`. Now `epochUpdateThread` in `ContinuousExecution` requests interruption of `queryExecutionThread`, so that thread also can't initialize `knownPartitions` as it checks for interrupts in `KafkaOffsetReaderAdmin.withRetries` and exits before `knownPartitions` is assigned. ### Does this PR introduce _any_ user-facing change? Yes, it fixes bug that may prevent kafka continuous stream from working properly ### How was this patch tested? Using existing Kafka tests. Those tests should complete faster. ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#50348 from vrozov/SPARK-51586. Authored-by: Vlad Rozov <[email protected]> Signed-off-by: Jungtaek Lim <[email protected]>

github-actions bot added SQL STRUCTURED STREAMING labels Mar 21, 2025

HeartSaVioR reviewed Mar 24, 2025

View reviewed changes

[SPARK-51586][SS] initialize input partitions independent of columnar…

1cbe808

… support

vrozov force-pushed the SPARK-51586 branch from d9d9862 to 1cbe808 Compare March 24, 2025 22:55

github-actions bot removed the STRUCTURED STREAMING label Mar 24, 2025

HeartSaVioR approved these changes Mar 25, 2025

View reviewed changes

HeartSaVioR changed the title ~~[SPARK-51586][SS] initialize input partitions independent of columnar support~~ [SPARK-51586][SS] initialize input partitions independent of columnar support in continuous mode Mar 25, 2025

HeartSaVioR closed this in 11d1409 Mar 25, 2025

vrozov deleted the SPARK-51586 branch March 25, 2025 15:26

vrozov mentioned this pull request Mar 25, 2025

[SPARK-51318][BUILD] Remove test jars in source releases #50378

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SPARK-51586][SS] initialize input partitions independent of columnar support in continuous mode #50348

[SPARK-51586][SS] initialize input partitions independent of columnar support in continuous mode #50348

Uh oh!

vrozov commented Mar 21, 2025

Uh oh!

vrozov commented Mar 21, 2025

Uh oh!

HeartSaVioR left a comment

Uh oh!

HeartSaVioR Mar 24, 2025

Uh oh!

vrozov Mar 24, 2025

Uh oh!

HeartSaVioR Mar 24, 2025 •

edited

Loading

Uh oh!

vrozov Mar 24, 2025

Uh oh!

HeartSaVioR Mar 25, 2025 •

edited

Loading

Uh oh!

vrozov Mar 25, 2025

Uh oh!

HeartSaVioR Mar 25, 2025

Uh oh!

HeartSaVioR left a comment

Uh oh!

HeartSaVioR commented Mar 25, 2025 •

edited

Loading

Uh oh!

Uh oh!

[SPARK-51586][SS] initialize input partitions independent of columnar support in continuous mode #50348

[SPARK-51586][SS] initialize input partitions independent of columnar support in continuous mode #50348

Uh oh!

Conversation

vrozov commented Mar 21, 2025

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

vrozov commented Mar 21, 2025

Uh oh!

HeartSaVioR left a comment

Choose a reason for hiding this comment

Uh oh!

HeartSaVioR Mar 24, 2025

Choose a reason for hiding this comment

Uh oh!

vrozov Mar 24, 2025

Choose a reason for hiding this comment

Uh oh!

HeartSaVioR Mar 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vrozov Mar 24, 2025

Choose a reason for hiding this comment

Uh oh!

HeartSaVioR Mar 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vrozov Mar 25, 2025

Choose a reason for hiding this comment

Uh oh!

HeartSaVioR Mar 25, 2025

Choose a reason for hiding this comment

Uh oh!

HeartSaVioR left a comment

Choose a reason for hiding this comment

Uh oh!

HeartSaVioR commented Mar 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

HeartSaVioR Mar 24, 2025 •

edited

Loading

HeartSaVioR Mar 25, 2025 •

edited

Loading

HeartSaVioR commented Mar 25, 2025 •

edited

Loading