
[DO_NOT_MERGE] Test Spark 4.0 RC5 #4533

Open
wants to merge 25 commits into base: master

Commits (25)
e3f3e16  Initial cherry-pick of changes (allisonport-db, Apr 28, 2025)
642bd89  Update spark_master_python_test.yaml (allisonport-db, Apr 28, 2025)
bdddbe1  upgrade pipenv + pip version set (allisonport-db, Apr 28, 2025)
3a1cf71  Fix pyspark version (allisonport-db, Apr 28, 2025)
af67186  Disable other connector tests for now (allisonport-db, Apr 28, 2025)
0dd0b7b  Fix mypy error (allisonport-db, Apr 28, 2025)
907b3a8  Move ignore (allisonport-db, Apr 28, 2025)
148957f  Fix pyspark version in setup.py too (allisonport-db, Apr 28, 2025)
344ca12  Try to fix some things (allisonport-db, Apr 30, 2025)
5dad417  fix spark examples (allisonport-db, May 1, 2025)
73058e9  Add back test config for structured logging (allisonport-db, May 1, 2025)
4d50cbf  Try to separate out the sharing tests (allisonport-db, May 1, 2025)
6d825c7  Try overriding the default method (allisonport-db, May 2, 2025)
88e9ffb  Update DeltaThrowable.scala (allisonport-db, May 2, 2025)
67b395c  Try a more narrow fix for now (allisonport-db, May 5, 2025)
d7aa584  Merge remote-tracking branch 'delta-io/master' into spark-rc4-with-de… (allisonport-db, May 5, 2025)
f2867e7  Ignore call overload (allisonport-db, May 5, 2025)
847b016  We also have issues with DeltaOperationException (allisonport-db, May 5, 2025)
8213e3f  Fix the integration tests for now (allisonport-db, May 5, 2025)
970a486  Upgrade delta client to 1.3.0 (allisonport-db, May 9, 2025)
29c971a  Use RC5 (allisonport-db, May 12, 2025)
26ff4e2  Merge remote-tracking branch 'delta-io/master' into spark-rc5 (allisonport-db, May 13, 2025)
f66dedf  resolve merge conflicts (allisonport-db, May 13, 2025)
ef960d9  Merge remote-tracking branch 'delta-io/master' into spark-rc5 (allisonport-db, May 13, 2025)
df3dc56  Remove fix no longer needed (allisonport-db, May 13, 2025)
14 changes: 1 addition & 13 deletions .github/workflows/connectors_test.yaml
@@ -1,3 +1,4 @@
# TODO - do we need to update this?
name: "Delta Connectors"
on: [push, pull_request]
jobs:
@@ -27,19 +28,6 @@ jobs:
- name: Run Scala Style tests on test sources (Scala 2.12 only)
run: build/sbt "++ ${{ matrix.scala }}" testScalastyle
if: startsWith(matrix.scala, '2.12.')
- name: Run sqlDeltaImport tests (Scala 2.12 and 2.13 only)
run: build/sbt "++ ${{ matrix.scala }}" sqlDeltaImport/test
if: ${{ !startsWith(matrix.scala, '2.11.') }}
# These tests are not working yet
# - name: Run Delta Standalone Compatibility tests (Scala 2.12 only)
# run: build/sbt "++ ${{ matrix.scala }}" compatibility/test
# if: startsWith(matrix.scala, '2.12.')
- name: Run Delta Standalone tests
run: build/sbt "++ ${{ matrix.scala }}" standalone/test testStandaloneCosmetic/test standaloneParquet/test testParquetUtilsWithStandaloneCosmetic/test
- name: Run Hive 3 tests
run: build/sbt "++ ${{ matrix.scala }}" hiveMR/test hiveTez/test
- name: Run Hive 2 tests
run: build/sbt "++ ${{ matrix.scala }}" hive2MR/test hive2Tez/test
- name: Run Flink tests (Scala 2.12 only)
run: build/sbt -mem 3000 "++ ${{ matrix.scala }}" flink/test
if: ${{ startsWith(matrix.scala, '2.12.') }}
24 changes: 19 additions & 5 deletions .github/workflows/kernel_test.yaml
@@ -5,17 +5,31 @@ jobs:
name: "DK"
runs-on: ubuntu-24.04
env:
SCALA_VERSION: 2.12.18
SCALA_VERSION: 2.13.13
steps:
- uses: actions/checkout@v3

# Install JDK 8
- name: install java
uses: actions/setup-java@v3
with:
distribution: "zulu"
java-version: "8"
- name: Run tests
run: |
python run-tests.py --group kernel --coverage
- name: Run integration tests

# Run integration tests with JDK 8, as they have no Spark dependency
- name: Run integration tests (JDK 8)
run: |
cd kernel/examples && python run-kernel-examples.py --use-local

# Install JDK 17
- name: install java
uses: actions/setup-java@v3
with:
distribution: "zulu"
java-version: "17"

# Run unit tests with JDK 17. These unit tests depend on Spark, and Spark 4.0+ is JDK 17.
- name: Run unit tests (JDK 17)
# Disable coverage for now because it compiles all projects & causes a flink failure
run: |
python run-tests.py --group kernel
11 changes: 4 additions & 7 deletions .github/workflows/spark_examples_test.yaml
@@ -7,7 +7,7 @@ jobs:
strategy:
matrix:
# These Scala versions must match those in the build.sbt
scala: [2.12.18, 2.13.13]
scala: [2.13.13]
env:
SCALA_VERSION: ${{ matrix.scala }}
steps:
@@ -24,7 +24,7 @@
uses: actions/setup-java@v3
with:
distribution: "zulu"
java-version: "8"
java-version: "17"
- name: Cache Scala, SBT
uses: actions/cache@v3
with:
@@ -44,13 +44,10 @@
sudo apt install libedit-dev
if: steps.git-diff.outputs.diff
- name: Run Delta Spark Local Publishing and Examples Compilation
# examples/scala/build.sbt will compile against the local Delta relase version (e.g. 3.2.0-SNAPSHOT).
# examples/scala/build.sbt will compile against the local Delta release version (e.g. 3.2.0-SNAPSHOT).
# Thus, we need to publishM2 first so those jars are locally accessible.
# We publish storage explicitly so that it is available for the Scala 2.13 build. As a java project
# it is typically only released when publishing for Scala 2.12.
run: |
build/sbt clean
build/sbt storage/publishM2
build/sbt "++ $SCALA_VERSION publishM2"
build/sbt sparkGroup/publishM2
cd examples/scala && build/sbt "++ $SCALA_VERSION compile"
if: steps.git-diff.outputs.diff
7 changes: 4 additions & 3 deletions .github/workflows/spark_master_python_test.yaml
@@ -65,6 +65,9 @@ jobs:
pipenv run pip install pip==24.0 setuptools==69.5.1 wheel==0.43.0
pipenv run pip install flake8==3.9.0
pipenv run pip install black==23.12.1
pipenv run pip install importlib_metadata==3.10.0
# The mypy versions 0.982 and 1.8.0 have conflicting rules (cannot get style checks to
# pass for both versions on the same file) so we upgrade this to match Spark 4.0
pipenv run pip install mypy==1.8.0
pipenv run pip install mypy-protobuf==3.3.0
pipenv run pip install cryptography==37.0.4
Expand All @@ -75,12 +78,10 @@ jobs:
pipenv run pip install pandas==2.2.0
pipenv run pip install pyarrow==11.0.0
pipenv run pip install numpy==1.21
pipenv run pip install https://dist.apache.org/repos/dist/dev/spark/v4.0.0-rc4-bin/pyspark-4.0.0.tar.gz
pipenv run pip install https://dist.apache.org/repos/dist/dev/spark/v4.0.0-rc5-bin/pyspark-4.0.0.tar.gz
if: steps.git-diff.outputs.diff
- name: Run Python tests
# when changing TEST_PARALLELISM_COUNT make sure to also change it in spark_master_test.yaml
run: |
# We use the SBT version to choose our dependencies in our python packaging in setup.py
echo 'ThisBuild / version := "4.0.0-SNAPSHOT"' > version.sbt
TEST_PARALLELISM_COUNT=4 USE_SPARK_MASTER=true pipenv run python run-tests.py --group spark-python
if: steps.git-diff.outputs.diff
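The step above overwrites version.sbt with ThisBuild / version := "4.0.0-SNAPSHOT" because, as its comment notes, the Python packaging in setup.py reads the SBT version to choose its dependencies. A minimal sketch of that idea in Python, assuming a helper of this shape rather than the repository's actual setup.py logic:

# Hedged sketch, not the repository's actual setup.py: shows how a packaging
# script could read the version written to version.sbt above and use it to
# pick a matching pyspark dependency.
import re
from pathlib import Path

def read_sbt_version(path="version.sbt"):
    # version.sbt contains a line like: ThisBuild / version := "4.0.0-SNAPSHOT"
    match = re.search(r'version\s*:=\s*"([^"]+)"', Path(path).read_text())
    if match is None:
        raise ValueError(f"no version string found in {path}")
    return match.group(1)

delta_version = read_sbt_version()
# e.g. require pyspark 4.x only when building a 4.x Delta snapshot package
pyspark_requirement = "pyspark>=4.0.0" if delta_version.startswith("4.") else "pyspark<4.0.0"
print(delta_version, pyspark_requirement)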
@@ -1,18 +1,17 @@
name: "Delta Iceberg Latest"
name: "Delta Sharing Spark Master"
on: [push, pull_request]
jobs:
test:
name: "DIL: Scala ${{ matrix.scala }}"
name: "Delta Sharing Spark Master"
runs-on: ubuntu-24.04
strategy:
matrix:
# These Scala versions must match those in the build.sbt
scala: [2.12.18, 2.13.13]
scala: [2.13.13]
env:
SCALA_VERSION: ${{ matrix.scala }}
steps:
- uses: actions/checkout@v3
# TODO we can make this more selective
- uses: technote-space/get-diff-action@v4
id: git-diff
with:
@@ -25,40 +24,29 @@ jobs:
uses: actions/setup-java@v3
with:
distribution: "zulu"
java-version: "8"
java-version: "17"
- name: Cache Scala, SBT
uses: actions/cache@v3
with:
path: |
~/.sbt
~/.ivy2
~/.cache/coursier
!~/.cache/coursier/v1/https/repository.apache.org/content/groups/snapshots
# Change the key if dependencies are changed. For each key, GitHub Actions will cache
# the above directories when we use the key for the first time. After that, each run will
# just use the cache. The cache is immutable so we need to use a new key when trying to
# cache new stuff.
key: delta-sbt-cache-spark3.2-scala${{ matrix.scala }}
key: delta-sbt-cache-spark-master-scala${{ matrix.scala }}
- name: Install Job dependencies
# TODO: update pyspark installation once Spark preview is formally released
run: |
sudo apt-get update
sudo apt-get install -y make build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev python3-openssl git
sudo apt install libedit-dev
curl -LO https://github.com/bufbuild/buf/releases/download/v1.28.1/buf-Linux-x86_64.tar.gz
mkdir -p ~/buf
tar -xvzf buf-Linux-x86_64.tar.gz -C ~/buf --strip-components 1
rm buf-Linux-x86_64.tar.gz
sudo apt install python3-pip --fix-missing
sudo pip3 install pipenv==2024.4.1
curl https://pyenv.run | bash
export PATH="~/.pyenv/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
pyenv install 3.8.18
pyenv global system 3.8.18
pipenv --python 3.8.18 install
if: steps.git-diff.outputs.diff
- name: Run Scala/Java and Python tests
# when changing TEST_PARALLELISM_COUNT make sure to also change it in spark_master_test.yaml
- name: Run Delta Sharing tests
# NOTE: in this branch, the default sparkVersion is the SPARK_MASTER_VERSION
run: |
TEST_PARALLELISM_COUNT=4 pipenv run python run-tests.py --group iceberg
TEST_PARALLELISM_COUNT=4 build/sbt "++ ${{ matrix.scala }}" clean sharing/test
if: steps.git-diff.outputs.diff
47 changes: 42 additions & 5 deletions .github/workflows/spark_master_test.yaml
@@ -9,11 +9,11 @@ jobs:
# These Scala versions must match those in the build.sbt
scala: [2.13.13]
# Important: This list of shards must be [0..NUM_SHARDS - 1]
shard: [0, 1, 2]
shard: [0, 1, 2, 3]
env:
SCALA_VERSION: ${{ matrix.scala }}
# Important: This must be the same as the length of shards in matrix
NUM_SHARDS: 3
NUM_SHARDS: 4
steps:
- uses: actions/checkout@v3
- uses: technote-space/get-diff-action@v4
@@ -43,15 +43,52 @@
# cache new stuff.
key: delta-sbt-cache-spark-master-scala${{ matrix.scala }}
- name: Install Job dependencies
# TODO: update pyspark installation once Spark preview is formally released
run: |
sudo apt-get update
sudo apt-get install -y make build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev python3-openssl git
sudo apt install libedit-dev
curl -LO https://github.com/bufbuild/buf/releases/download/v1.28.1/buf-Linux-x86_64.tar.gz
mkdir -p ~/buf
tar -xvzf buf-Linux-x86_64.tar.gz -C ~/buf --strip-components 1
rm buf-Linux-x86_64.tar.gz
sudo apt install python3-pip --fix-missing
sudo pip3 install pipenv==2024.4.1
curl https://pyenv.run | bash
export PATH="~/.pyenv/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
pyenv install 3.9
pyenv global system 3.9
pipenv --python 3.9 install
# Update the pip version to 24.0. By default `pyenv.run` installs the latest pip version
# available. From version 24.1, `pip` doesn't allow installing python packages
# with version string containing `-`. In Delta-Spark case, the pypi package generated has
# `-SNAPSHOT` in version (e.g. `3.3.0-SNAPSHOT`) as the version is picked up from
# the `version.sbt` file.
pipenv run pip install pip==24.0 setuptools==69.5.1 wheel==0.43.0
pipenv run pip install flake8==3.9.0
pipenv run pip install black==23.9.1
pipenv run pip install mypy==1.8.0
pipenv run pip install mypy-protobuf==3.3.0
pipenv run pip install cryptography==37.0.4
pipenv run pip install twine==4.0.1
pipenv run pip install wheel==0.33.4
pipenv run pip install setuptools==41.1.0
pipenv run pip install pydocstyle==3.0.0
pipenv run pip install pandas==1.4.4
pipenv run pip install pyarrow==8.0.0
pipenv run pip install numpy==1.21
pipenv run pip install https://dist.apache.org/repos/dist/dev/spark/v4.0.0-rc5-bin/pyspark-4.0.0.tar.gz
if: steps.git-diff.outputs.diff
- name: Run Spark Master tests
# when changing TEST_PARALLELISM_COUNT make sure to also change it in spark_test.yaml
- name: Run Delta Connect tests
run: |
TEST_PARALLELISM_COUNT=4 SHARD_ID=${{matrix.shard}} build/sbt -DsparkVersion=master "++ ${{ matrix.scala }}" clean spark/test
TEST_PARALLELISM_COUNT=4 build/sbt -DsparkVersion=master "++ ${{ matrix.scala }}" clean connectServer/test
TEST_PARALLELISM_COUNT=4 build/sbt -DsparkVersion=master "++ ${{ matrix.scala }}" clean connectServer/assembly connectClient/test
if: steps.git-diff.outputs.diff
- name: Run Delta Spark master tests
# when changing TEST_PARALLELISM_COUNT make sure to also change it in spark_test.yaml
# NOTE: in this branch, the default sparkVersion is the SPARK_MASTER_VERSION
run: |
TEST_PARALLELISM_COUNT=4 pipenv run python run-tests.py --group spark --shard ${{ matrix.shard }}
if: steps.git-diff.outputs.diff
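The matrix above grows the shard list from [0, 1, 2] to [0, 1, 2, 3] and raises NUM_SHARDS from 3 to 4 to match; each shard job then invokes run-tests.py with its own --shard id so the test suites are partitioned across jobs. A rough sketch of how such a partition typically works, assuming a deterministic index-based assignment rather than the repository's actual run-tests.py logic:

# Hedged sketch of shard assignment; the function name and the modulo scheme
# are assumptions for illustration, not the actual run-tests.py implementation.
def suites_for_shard(all_suites, shard_id, num_shards):
    # Sort for a stable order, then give suite i to shard (i % num_shards).
    # Because the shard ids are exactly 0..num_shards-1, every suite lands on
    # exactly one shard and no shard is left idle.
    ordered = sorted(all_suites)
    return [s for i, s in enumerate(ordered) if i % num_shards == shard_id]

suites = ["DeltaSuite", "MergeIntoSuite", "OptimizeSuite", "UpdateSuite", "VacuumSuite"]
for shard in range(4):  # matches shard: [0, 1, 2, 3] and NUM_SHARDS: 4
    print(shard, suites_for_shard(suites, shard, 4))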
85 changes: 0 additions & 85 deletions .github/workflows/spark_python_test.yaml

This file was deleted.
