[GH-1918] Spark 4 support #1919
base: master
Conversation
```diff
@@ -44,7 +44,7 @@ jobs:
       - name: Compile JavaDoc
         run: mvn -q clean install -DskipTests && mkdir -p docs/api/javadoc/spark && cp -r spark/common/target/apidocs/* docs/api/javadoc/spark/
       - name: Compile ScalaDoc
-        run: mvn scala:doc && mkdir -p docs/api/scaladoc/spark && cp -r spark/common/target/site/scaladocs/* docs/api/scaladoc/spark
+        run: mvn generate-sources scala:doc && mkdir -p docs/api/scaladoc/spark && cp -r spark/common/target/site/scaladocs/* docs/api/scaladoc/spark
```
This was the only way I could figure out to make the Scaladoc build aware of the additional source directory.
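For context, extra source directories are typically registered with build-helper-maven-plugin's `add-source` goal, which binds to the `generate-sources` phase; that would explain why running `generate-sources` before `scala:doc` makes the directory visible. A minimal sketch, assuming a hypothetical `src/main/spark-4` path (the PR's actual layout may differ):

```xml
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>build-helper-maven-plugin</artifactId>
  <executions>
    <execution>
      <id>add-spark-version-source</id>
      <!-- Runs during generate-sources, so `mvn generate-sources scala:doc`
           picks up the extra directory before Scaladoc is generated. -->
      <phase>generate-sources</phase>
      <goals>
        <goal>add-source</goal>
      </goals>
      <configuration>
        <sources>
          <!-- Hypothetical path; the real one is defined in the PR. -->
          <source>src/main/spark-4</source>
        </sources>
      </configuration>
    </execution>
  </executions>
</plugin>
```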
```scala
/**
 * A physical plan that evaluates a [[PythonUDF]].
 */
case class SedonaArrowEvalPythonExec(
```
This Arrow eval is the only thing I had to update when moving from the `spark-3.5` module to the `spark-4.0` module, due to some API changes. It looks like starting in 4.1 they added support for UDTs in Arrow UDFs.
```xml
<!-- We need to shade jiffle and its antlr dependency because Spark 4 uses an
     incompatible version of antlr at runtime. -->
```
Can we shade it in geotools-wrapper so that no dependency-reduced POM is generated when building sedona-common? @jiayuasu
It definitely needs to be shaded locally for the tests to work. I'm not 100% sure whether the release could just be shaded into geotools-wrapper. My concern is that if you somehow have jiffle as a separate dependency, those classes would be used with the provided antlr rather than the relocated antlr dependency.
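For reference, the kind of relocation being discussed looks roughly like the maven-shade-plugin configuration below; the shaded package name is an illustrative assumption, not the PR's actual value:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <!-- Rewrite antlr's packages so jiffle's generated parser calls the
               bundled antlr runtime instead of the one Spark 4 provides. -->
          <relocation>
            <pattern>org.antlr.v4</pattern>
            <shadedPattern>org.apache.sedona.shaded.antlr.v4</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Only classes that end up inside the shaded jar get their references rewritten, which is exactly the property the reply above worries about losing if jiffle arrives via a separate artifact.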
Did you read the Contributor Guide?

Is this PR related to a ticket? Yes, and the PR name follows the format [GH-XXX] my subject (GH-1918 here).
What changes were proposed in this PR?

Add support for Spark 4. This required several updates:

- The `spark/common` module has source directories for Spark 3 and Spark 4 respectively. I had to do this in the common module because things in the common module depend on the version-specific shims. The main breaking changes that required this are:
  - `Column` objects are no longer wrappers around `Expression` objects but around a new `ColumnNode` construct added for Spark Connect support, so supporting the expression wrapping requires a different setup per version. Initially I started working on this through reflection, but that got pretty messy, and this will require different artifacts anyway, so I added the conditional source directories. (A sketch of the shim pattern follows this list.)
  - The `NullIntolerant` trait no longer exists; instead it's an overridable function on an expression.
- `jt-jiffle-language` and its `antlr` dependency have to be shaded into the `common` module for Spark 4 to work. This is because antlr 4.10 included an internal version bump such that dependencies compiled with antlr < 4.10 can't run at runtime with antlr >= 4.10. I think `jt-jiffle-language` has an Apache license, so I think this is OK? Currently it's a provided dependency that comes with the external geotools-wrapper, so I need some verification here, or thoughts on any alternative approach.
- Copied the `spark-3.5` module as-is to `spark-4.0`. The only changes I had to make were to the new Arrow UDF stuff that was added recently. Could these also just be moved as conditional source directories in `spark/common`?
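To illustrate the shim pattern from the first bullet, here is a minimal sketch of the Spark 3.5 side; the object name and layout are assumptions, not the PR's actual code, and the Spark 4 copy would go through the new `ColumnNode` bridge instead:

```scala
// Hypothetical shim (one copy per version-specific source directory).
// Spark 3.5 variant, where a Column still wraps an Expression directly.
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.Expression

object ColumnShim {
  // Unwrap a user-facing Column into its underlying catalyst Expression.
  def expression(col: Column): Expression = col.expr

  // Wrap a catalyst Expression back into a user-facing Column.
  def column(expr: Expression): Column = new Column(expr)
}
// The Spark 4.0 copy of this file would implement the same two methods via
// the ColumnNode construct, since Column.expr no longer exists there.
```

Common code calls only `ColumnShim`, and the build selects which implementation to compile, which is what makes the conditional source directories work without reflection.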
How was this patch tested?
Existing UTs.
Did this PR include necessary documentation updates?
Maybe the supported versions need to change? I haven't looked at the docs yet.