
Commit 9feb58c

build: change default Maven profile to Spark 4.1 / Scala 2.13 (#4140)
1 parent 5d137e2 commit 9feb58c

15 files changed

Lines changed: 71 additions & 61 deletions

.github/workflows/docker-publish.yml

Lines changed: 1 addition & 1 deletion
@@ -74,6 +74,6 @@ jobs:
  with:
  platforms: linux/amd64,linux/arm64
  push: true
- tags: ghcr.io/apache/datafusion-comet:spark-3.5-scala-2.12-${{ env.COMET_VERSION }}
+ tags: ghcr.io/apache/datafusion-comet:spark-4.1-scala-2.13-${{ env.COMET_VERSION }}
  file: kube/Dockerfile
  no-cache: true

.github/workflows/pr_build_linux.yml

Lines changed: 6 additions & 2 deletions
@@ -451,6 +451,8 @@ jobs:
  runs-on: ${{ github.repository_owner == 'apache' && format('runs-on={0},family=m8a+m7a+c8a,cpu=16,image=ubuntu24-full-x64,extras=s3-cache,disk=large,tag=datafusion-comet', github.run_id) || 'ubuntu-latest' }}
  container:
  image: amd64/rust
+ env:
+ JAVA_TOOL_OPTIONS: --add-exports=java.base/sun.nio.ch=ALL-UNNAMED --add-exports=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED
  steps:
  - uses: runs-on/action@742bf56072eb4845a0f94b3394673e4903c90ff0 # v2.1.0

@@ -460,7 +462,7 @@ jobs:
  uses: ./.github/actions/setup-builder
  with:
  rust-version: ${{ env.RUST_VERSION }}
- jdk-version: 11
+ jdk-version: 17

  - name: Download native library
  uses: actions/download-artifact@v8
@@ -505,6 +507,8 @@ jobs:
  runs-on: ${{ github.repository_owner == 'apache' && format('runs-on={0},family=m8a+m7a+c8a,cpu=16,image=ubuntu24-full-x64,extras=s3-cache,disk=large,tag=datafusion-comet', github.run_id) || 'ubuntu-latest' }}
  container:
  image: amd64/rust
+ env:
+ JAVA_TOOL_OPTIONS: --add-exports=java.base/sun.nio.ch=ALL-UNNAMED --add-exports=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED
  strategy:
  matrix:
  join: [sort_merge, broadcast, hash]
@@ -518,7 +522,7 @@ jobs:
  uses: ./.github/actions/setup-builder
  with:
  rust-version: ${{ env.RUST_VERSION }}
- jdk-version: 11
+ jdk-version: 17

  - name: Download native library
  uses: actions/download-artifact@v8
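
Both jobs now run on JDK 17 and carry an identical `JAVA_TOOL_OPTIONS` value. JDK 17 strongly encapsulates JDK internals by default, which is presumably why Spark's tests need these `--add-exports`/`--add-opens` flags. As a sketch only, the string can be generated from two package lists so the copies cannot drift apart. The helper `build_java_tool_options` is hypothetical and not part of the Comet repository; the package names and their order are copied from the diff.

```shell
# Hypothetical helper (not part of Comet): assemble the JAVA_TOOL_OPTIONS
# value used in the workflow above from lists of java.base packages.
build_java_tool_options() {
  opts=""
  # Packages exported with --add-exports in the workflow.
  for pkg in sun.nio.ch sun.util.calendar; do
    opts="$opts --add-exports=java.base/$pkg=ALL-UNNAMED"
  done
  # Packages opened with --add-opens in the workflow, in the same order.
  for pkg in java.lang java.lang.invoke java.lang.reflect java.io java.net \
             java.nio java.util java.util.concurrent java.util.concurrent.atomic \
             jdk.internal.ref sun.nio.ch sun.nio.cs sun.security.action \
             sun.util.calendar; do
    opts="$opts --add-opens=java.base/$pkg=ALL-UNNAMED"
  done
  # Drop the leading space introduced by the first append.
  printf '%s\n' "${opts# }"
}

# Usage:
#   export JAVA_TOOL_OPTIONS="$(build_java_tool_options)"
```

In the workflow itself, the equivalent design choice would be to define the value once (for example at workflow level) rather than repeating it per job.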

docs/source/contributor-guide/benchmarking_aws_ec2.md

Lines changed: 1 addition & 1 deletion
@@ -104,7 +104,7 @@ make release
  Set `COMET_JAR` environment variable.

  ```shell
- export COMET_JAR=/home/ec2-user/datafusion-comet/spark/target/comet-spark-spark3.5_2.12-$COMET_VERSION.jar
+ export COMET_JAR=/home/ec2-user/datafusion-comet/spark/target/comet-spark-spark4.1_2.13-$COMET_VERSION.jar
  ```

  ## Run Benchmarks

docs/source/contributor-guide/benchmarking_macos.md

Lines changed: 6 additions & 6 deletions
@@ -55,13 +55,13 @@ export DF_BENCH=`pwd`

  ## Install Spark

- Install Apache Spark. This example refers to 3.5.4 version.
+ Install Apache Spark. This example refers to 4.1.1 version.

  ```shell
- wget https://archive.apache.org/dist/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz
- tar xzf spark-3.5.4-bin-hadoop3.tgz
- sudo mv spark-3.5.4-bin-hadoop3 /opt
- export SPARK_HOME=/opt/spark-3.5.4-bin-hadoop3/
+ wget https://archive.apache.org/dist/spark/spark-4.1.1/spark-4.1.1-bin-hadoop3.tgz
+ tar xzf spark-4.1.1-bin-hadoop3.tgz
+ sudo mv spark-4.1.1-bin-hadoop3 /opt
+ export SPARK_HOME=/opt/spark-4.1.1-bin-hadoop3/
  ```

  Start Spark in standalone mode:
@@ -129,7 +129,7 @@ make release COMET_FEATURES=mimalloc
  Set `COMET_JAR` to point to the location of the Comet jar file. Example for Comet 0.8

  ```shell
- export COMET_JAR=`pwd`/spark/target/comet-spark-spark3.5_2.12-0.8.0-SNAPSHOT.jar
+ export COMET_JAR=`pwd`/spark/target/comet-spark-spark4.1_2.13-0.8.0-SNAPSHOT.jar
  ```

  Run the following command (the `--data` parameter will need to be updated to point to your S3 bucket):
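
The Install Spark step repeats the Spark version in four places, so every default bump (3.5.4 to 4.1.1 here) touches all of them. A minimal sketch, assuming the same `archive.apache.org` URL layout and `hadoop3` binary flavor shown in the diff, that derives them from one `SPARK_VERSION` variable. `spark_download_url` is a hypothetical helper, not something the Comet docs define.

```shell
# Hypothetical helper: print the archive URL of a Spark binary release,
# following the URL pattern used in the Install Spark step above.
spark_download_url() {
  version="$1"                    # e.g. 4.1.1
  hadoop_profile="${2:-hadoop3}"  # binary flavor suffix, hadoop3 by default
  printf 'https://archive.apache.org/dist/spark/spark-%s/spark-%s-bin-%s.tgz\n' \
    "$version" "$version" "$hadoop_profile"
}

SPARK_VERSION=4.1.1
SPARK_DIST="spark-${SPARK_VERSION}-bin-hadoop3"

# The install steps could then be written once:
#   wget "$(spark_download_url "$SPARK_VERSION")"
#   tar xzf "${SPARK_DIST}.tgz"
#   sudo mv "$SPARK_DIST" /opt
#   export SPARK_HOME="/opt/${SPARK_DIST}/"
```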

docs/source/contributor-guide/benchmarking_spark_sql_perf.md

Lines changed: 3 additions & 3 deletions
@@ -34,8 +34,8 @@ partitioning and writing to Parquet format automatically.

  ## Prerequisites

- - Java 17 (for Spark 3.5+)
- - Apache Spark 3.5.x
+ - Java 17
+ - Apache Spark 4.1.x
  - SBT (Scala Build Tool)
  - C compiler toolchain (`gcc`, `make`, `flex`, `bison`, `byacc`)

@@ -225,7 +225,7 @@ Build Comet from source and launch `spark-shell` with both the Comet and spark-s

  ```shell
  make release
- export COMET_JAR=$(pwd)/spark/target/comet-spark-spark3.5_2.12-*.jar
+ export COMET_JAR=$(pwd)/spark/target/comet-spark-spark4.1_2.13-*.jar

  $SPARK_HOME/bin/spark-shell \
  --master $SPARK_MASTER \

docs/source/contributor-guide/debugging.md

Lines changed: 1 addition & 1 deletion
@@ -136,7 +136,7 @@ make release COMET_FEATURES=backtrace
  Set `RUST_BACKTRACE=1` for the Spark worker/executor process, or for `spark-submit` if running in local mode.

  ```console
- RUST_BACKTRACE=1 $SPARK_HOME/spark-shell --jars spark/target/comet-spark-spark3.5_2.12-$COMET_VERSION.jar --conf spark.plugins=org.apache.spark.CometPlugin --conf spark.comet.enabled=true --conf spark.comet.exec.enabled=true
+ RUST_BACKTRACE=1 $SPARK_HOME/spark-shell --jars spark/target/comet-spark-spark4.1_2.13-$COMET_VERSION.jar --conf spark.plugins=org.apache.spark.CometPlugin --conf spark.comet.enabled=true --conf spark.comet.exec.enabled=true
  ```

  Get the expanded exception details

docs/source/contributor-guide/iceberg-spark-tests.md

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ Here is an overview of the changes that the diffs make to Iceberg:
  Run `make release` in Comet to install the Comet JAR into the local Maven repository, specifying the Spark version.

  ```shell
- PROFILES="-Pspark-3.5" make release
+ PROFILES="-Pspark-4.1" make release
  ```

  ## 2. Clone Iceberg and Apply Diff

docs/source/user-guide/latest/datasources.md

Lines changed: 4 additions & 4 deletions
@@ -69,12 +69,12 @@ Unlike to native Comet reader the Datafusion reader fully supports nested types
  To build Comet with native DataFusion reader and remote HDFS support it is required to have a JDK installed

  Example:
- Build a Comet for `spark-3.5` provide a JDK path in `JAVA_HOME`
+ Build a Comet for `spark-4.1` provide a JDK path in `JAVA_HOME`
  Provide the JRE linker path in `RUSTFLAGS`, the path can vary depending on the system. Typically JRE linker is a part of installed JDK

  ```shell
- export JAVA_HOME="/opt/homebrew/opt/openjdk@11"
- make release PROFILES="-Pspark-3.5" COMET_FEATURES=hdfs RUSTFLAGS="-L $JAVA_HOME/libexec/openjdk.jdk/Contents/Home/lib/server"
+ export JAVA_HOME="/opt/homebrew/opt/openjdk@17"
+ make release PROFILES="-Pspark-4.1" COMET_FEATURES=hdfs RUSTFLAGS="-L $JAVA_HOME/libexec/openjdk.jdk/Contents/Home/lib/server"
  ```

  Start Comet with experimental reader and HDFS support as [described](installation.md/#run-spark-shell-with-comet-enabled)
@@ -149,7 +149,7 @@ docker compose -f kube/local/hdfs-docker-compose.yml up
  - Build a project with HDFS support

  ```shell
- JAVA_HOME="/opt/homebrew/opt/openjdk@11" make release PROFILES="-Pspark-3.5" COMET_FEATURES=hdfs RUSTFLAGS="-L /opt/homebrew/opt/openjdk@11/libexec/openjdk.jdk/Contents/Home/lib/server"
+ JAVA_HOME="/opt/homebrew/opt/openjdk@17" make release PROFILES="-Pspark-4.1" COMET_FEATURES=hdfs RUSTFLAGS="-L /opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home/lib/server"
  ```

  - Run local test
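
Both HDFS build commands hard-code the Homebrew JDK layout in `RUSTFLAGS`, and the datasources guide notes that the linker path varies by system. A sketch, assuming only two layouts: the Homebrew `openjdk@17` layout used above, and `$JAVA_HOME/lib/server`, where JDK 17 keeps its server VM library on Linux and in standard macOS JDK homes. `find_libjvm_dir` is a hypothetical helper, not part of the Comet build.

```shell
# Hypothetical helper: print the directory holding the JVM shared library
# under a JDK home, for use as RUSTFLAGS="-L <dir>".
# Candidate layouts (assumptions): Homebrew's openjdk@17 keg first,
# then the plain $JAVA_HOME/lib/server layout.
find_libjvm_dir() {
  java_home="$1"
  for dir in \
    "$java_home/libexec/openjdk.jdk/Contents/Home/lib/server" \
    "$java_home/lib/server"; do
    if [ -d "$dir" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  echo "no lib/server directory found under $java_home" >&2
  return 1
}

# Usage (assumes JAVA_HOME points at a JDK 17 install):
#   make release PROFILES="-Pspark-4.1" COMET_FEATURES=hdfs \
#     RUSTFLAGS="-L $(find_libjvm_dir "$JAVA_HOME")"
```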

docs/source/user-guide/latest/iceberg.md

Lines changed: 2 additions & 2 deletions
@@ -31,7 +31,7 @@ reader is enabled by default. To disable it, set `spark.comet.scan.icebergNative

  ```shell
  $SPARK_HOME/bin/spark-shell \
- --packages org.apache.datafusion:comet-spark-spark3.5_2.12:0.14.0,org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1,org.apache.iceberg:iceberg-core:1.8.1 \
+ --packages org.apache.datafusion:comet-spark-spark4.1_2.13:0.14.0,org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1,org.apache.iceberg:iceberg-core:1.8.1 \
  --repositories https://repo1.maven.org/maven2/ \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog \
@@ -106,7 +106,7 @@ configure Spark to use a REST catalog with Comet's native Iceberg scan:

  ```shell
  $SPARK_HOME/bin/spark-shell \
- --packages org.apache.datafusion:comet-spark-spark3.5_2.12:0.14.0,org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1,org.apache.iceberg:iceberg-core:1.8.1 \
+ --packages org.apache.datafusion:comet-spark-spark4.1_2.13:0.14.0,org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1,org.apache.iceberg:iceberg-core:1.8.1 \
  --repositories https://repo1.maven.org/maven2/ \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.rest_cat=org.apache.iceberg.spark.SparkCatalog \

docs/source/user-guide/latest/installation.md

Lines changed: 3 additions & 3 deletions
@@ -85,7 +85,7 @@ Here are the direct links for downloading the Comet $COMET_VERSION jar file.
  - [Comet plugin for Spark 3.5 / Scala 2.12](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark3.5_2.12/$COMET_VERSION/comet-spark-spark3.5_2.12-$COMET_VERSION.jar)
  - [Comet plugin for Spark 3.5 / Scala 2.13](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark3.5_2.13/$COMET_VERSION/comet-spark-spark3.5_2.13-$COMET_VERSION.jar)
  - [Comet plugin for Spark 4.0 / Scala 2.13](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark4.0_2.13/$COMET_VERSION/comet-spark-spark4.0_2.13-$COMET_VERSION.jar)
- - [Comet plugin for Spark 4.1 / Scala 2.13 (Experimental)](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark4.1_2.13/$COMET_VERSION/comet-spark-spark4.1_2.13-$COMET_VERSION.jar)
+ - [Comet plugin for Spark 4.1 / Scala 2.13](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark4.1_2.13/$COMET_VERSION/comet-spark-spark4.1_2.13-$COMET_VERSION.jar)
  - [Comet plugin for Spark 4.2 / Scala 2.13 (Experimental)](https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark4.2_2.13/$COMET_VERSION/comet-spark-spark4.2_2.13-$COMET_VERSION.jar)
  <!-- ENDIF -->

@@ -105,7 +105,7 @@ See the [Comet Kubernetes Guide](kubernetes.md) guide.
  Make sure `SPARK_HOME` points to the same Spark version as Comet was built for.

  ```shell
- export COMET_JAR=spark/target/comet-spark-spark3.5_2.12-$COMET_VERSION.jar
+ export COMET_JAR=spark/target/comet-spark-spark4.1_2.13-$COMET_VERSION.jar

  $SPARK_HOME/bin/spark-shell \
  --jars $COMET_JAR \
@@ -161,7 +161,7 @@ explicitly contain Comet otherwise Spark may use a different class-loader for th
  components which will then fail at runtime. For example:

  ```
- --driver-class-path spark/target/comet-spark-spark3.5_2.12-$COMET_VERSION.jar
+ --driver-class-path spark/target/comet-spark-spark4.1_2.13-$COMET_VERSION.jar
  ```

  Some cluster managers may require additional configuration, see <https://spark.apache.org/docs/latest/cluster-overview.html>
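
Every jar path touched by this commit follows one naming pattern, `comet-spark-spark<spark>_<scala>-<version>.jar`, and the Scala suffix has to match the Spark build (the Spark 4.x plugins listed above are all Scala 2.13). A sketch of a hypothetical `comet_jar_name` helper that builds the name from its parts, so switching profiles only means changing two arguments; it is not part of the Comet build.

```shell
# Hypothetical helper: build a Comet jar file name from the Spark version,
# Scala version and Comet version, following the
# comet-spark-spark<spark>_<scala>-<version>.jar pattern used in the docs above.
comet_jar_name() {
  spark_version="$1"   # e.g. 4.1 (the new default profile)
  scala_version="$2"   # e.g. 2.13
  comet_version="$3"   # e.g. the value of $COMET_VERSION
  printf 'comet-spark-spark%s_%s-%s.jar\n' \
    "$spark_version" "$scala_version" "$comet_version"
}

# Usage with the new default build:
#   export COMET_JAR="spark/target/$(comet_jar_name 4.1 2.13 "$COMET_VERSION")"
```

For example, `comet_jar_name 4.1 2.13 0.14.0` prints `comet-spark-spark4.1_2.13-0.14.0.jar`.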
