fix(spark) CTAS with UNION in Spark 4.x native writer#3524

Open
Shekharrajak wants to merge 5 commits into
apache:mainfrom
Shekharrajak:fix/3429-spark4-ctas-union-native-writer

@Shekharrajak
Contributor

Which issue does this PR close?

Closes #3429

Rationale for this change

CTAS with UNION fails in Spark 4.x when the Comet native writer is enabled. The requiresNativeChildren check in CometExecRule was too strict: it required children to be CometNativeExec, but sink operators such as CometUnionExec, CometCoalesceExec, and CometCollectLimitExec extend CometExec directly while still producing Arrow-formatted data.

What changes are included in this PR?

  • Changed the type check from CometNativeExec to CometExec in CometExecRule.convertToComet() to allow all Comet operators that produce Arrow-formatted data as input to the native writer.
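
As a rough illustration of why widening the check matters, the sketch below models the two type tests with simplified stand-in traits. Only the names CometExec, CometNativeExec, and CometUnionExec come from the description above; SparkPlanLike, SomeNativeOp, PlainSparkOp, strictCheck, and relaxedCheck are hypothetical names invented for this example, and none of this is the actual code in CometExecRule.

```scala
// Simplified stand-ins for the operator hierarchy described above.
// These are NOT the real Comet classes; they only mirror the subtype relationships.
object NativeWriterCheckSketch {
  trait SparkPlanLike { def name: String }

  // Base trait for Comet operators (described in the PR as producing Arrow-formatted data).
  trait CometExec extends SparkPlanLike

  // Narrower trait that the original check required.
  trait CometNativeExec extends CometExec

  // Hypothetical operator that extends the narrower trait.
  case class SomeNativeOp(name: String = "SomeNativeOp") extends CometNativeExec

  // Stand-in for a sink operator that extends CometExec directly.
  case class CometUnionExec(name: String = "CometUnionExec") extends CometExec

  // Hypothetical non-Comet operator.
  case class PlainSparkOp(name: String = "PlainSparkOp") extends SparkPlanLike

  // Old behaviour (sketch): accept only CometNativeExec children.
  def strictCheck(child: SparkPlanLike): Boolean = child.isInstanceOf[CometNativeExec]

  // New behaviour (sketch): accept any CometExec child.
  def relaxedCheck(child: SparkPlanLike): Boolean = child.isInstanceOf[CometExec]

  def main(args: Array[String]): Unit = {
    val children = Seq(SomeNativeOp(), CometUnionExec(), PlainSparkOp())
    children.foreach { c =>
      println(s"${c.name}: strict=${strictCheck(c)}, relaxed=${relaxedCheck(c)}")
    }
  }
}
```

In this model the union stand-in is rejected by the strict test and accepted by the relaxed one, while the non-Comet operator is rejected by both, which is the behaviour the change is intended to produce.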

How are these changes tested?

Added 6 new test cases in CometParquetWriterSuite:

  • Basic UNION write (CTAS style)
  • Multiple (3+) DataFrames UNION

@Shekharrajak Shekharrajak changed the title Fix CTAS with UNION in Spark 4.x native writer (#3429) Fix CTAS with UNION in Spark 4.x native writer Feb 16, 2026
@Shekharrajak Shekharrajak changed the title Fix CTAS with UNION in Spark 4.x native writer fix(spark) CTAS with UNION in Spark 4.x native writer Feb 16, 2026
Member

@andygrove andygrove left a comment

LGTM. I confirmed that some of the tests fail without the fix. Thanks @Shekharrajak

@andygrove
Member

Test failure:

- parquet write with union and coalesce *** FAILED *** (557 milliseconds)
  99 did not equal 98 Expected 98 rows (49 + 49) (CometParquetWriterSuite.scala:214)

@andygrove
Member

Test failure:

- parquet write with union of structs *** FAILED *** (169 milliseconds)
  0 did not equal 1 Expected exactly one CometNativeWriteExec in the plan, but found 0:
  Execute InsertIntoHadoopFsRelationCommand file:/__w/datafusion-comet/datafusion-comet/spark/target/tmp/spark-2f9604b3-87ca-41c1-90d8-1cbed0a42386/output.parquet, false, Parquet, [path=/__w/datafusion-comet/datafusion-comet/spark/target/tmp/spark-2f9604b3-87ca-41c1-90d8-1cbed0a42386/output.parquet], ErrorIfExists, [id, person]
  +- WriteFiles
     +- Union
        :- *(1) Project [1 AS id#1514, [Alice,30] AS person#1515]
        :  +- *(1) Scan OneRowRelation[]
        +- *(2) Project [2 AS id#1516, [Bob,25] AS person#1517]
           +- *(2) Scan OneRowRelation[] (CometParquetWriterSuite.scala:

@github-actions

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions Bot added the Stale label May 21, 2026
@Shekharrajak
Contributor Author

Please trigger the CI workflow.

@Shekharrajak Shekharrajak force-pushed the fix/3429-spark4-ctas-union-native-writer branch from 70c8daf to e2ebc26 Compare May 21, 2026 15:42
@andygrove andygrove removed the Stale label May 21, 2026
Development

Successfully merging this pull request may close these issues.

[COMET NATIVE WRITER] Fix test spark 4x : ctas with union

2 participants