[DO_NOT_MERGE] Test Spark 4.0 RC4 build #4473

Draft: wants to merge 20 commits into base: master
14 changes: 1 addition & 13 deletions .github/workflows/connectors_test.yaml
@@ -1,3 +1,4 @@
# TODO - do we need to update this?
name: "Delta Connectors"
on: [push, pull_request]
jobs:
@@ -27,19 +28,6 @@ jobs:
- name: Run Scala Style tests on test sources (Scala 2.12 only)
run: build/sbt "++ ${{ matrix.scala }}" testScalastyle
if: startsWith(matrix.scala, '2.12.')
- name: Run sqlDeltaImport tests (Scala 2.12 and 2.13 only)
run: build/sbt "++ ${{ matrix.scala }}" sqlDeltaImport/test
if: ${{ !startsWith(matrix.scala, '2.11.') }}
# These tests are not working yet
# - name: Run Delta Standalone Compatibility tests (Scala 2.12 only)
# run: build/sbt "++ ${{ matrix.scala }}" compatibility/test
# if: startsWith(matrix.scala, '2.12.')
- name: Run Delta Standalone tests
run: build/sbt "++ ${{ matrix.scala }}" standalone/test testStandaloneCosmetic/test standaloneParquet/test testParquetUtilsWithStandaloneCosmetic/test
- name: Run Hive 3 tests
run: build/sbt "++ ${{ matrix.scala }}" hiveMR/test hiveTez/test
- name: Run Hive 2 tests
run: build/sbt "++ ${{ matrix.scala }}" hive2MR/test hive2Tez/test
- name: Run Flink tests (Scala 2.12 only)
run: build/sbt -mem 3000 "++ ${{ matrix.scala }}" flink/test
if: ${{ startsWith(matrix.scala, '2.12.') }}
24 changes: 19 additions & 5 deletions .github/workflows/kernel_test.yaml
@@ -5,17 +5,31 @@ jobs:
name: "DK"
runs-on: ubuntu-24.04
env:
SCALA_VERSION: 2.12.18
SCALA_VERSION: 2.13.13
steps:
- uses: actions/checkout@v3

# Install JDK 8
- name: install java
uses: actions/setup-java@v3
with:
distribution: "zulu"
java-version: "8"
- name: Run tests
run: |
python run-tests.py --group kernel --coverage
- name: Run integration tests

# Run integration tests with JDK 8, as they have no Spark dependency
- name: Run integration tests (JDK 8)
run: |
cd kernel/examples && python run-kernel-examples.py --use-local

# Install JDK 17
- name: install java
uses: actions/setup-java@v3
with:
distribution: "zulu"
java-version: "17"

# Run unit tests with JDK 17. These unit tests depend on Spark, and Spark 4.0+ is JDK 17.
- name: Run unit tests (JDK 17)
# Disable coverage for now because it compiles all projects & causes a flink failure
run: |
python run-tests.py --group kernel
11 changes: 4 additions & 7 deletions .github/workflows/spark_examples_test.yaml
@@ -7,7 +7,7 @@ jobs:
strategy:
matrix:
# These Scala versions must match those in the build.sbt
scala: [2.12.18, 2.13.13]
scala: [2.13.13]
env:
SCALA_VERSION: ${{ matrix.scala }}
steps:
@@ -24,7 +24,7 @@ jobs:
uses: actions/setup-java@v3
with:
distribution: "zulu"
java-version: "8"
java-version: "17"
- name: Cache Scala, SBT
uses: actions/cache@v3
with:
@@ -44,13 +44,10 @@ jobs:
sudo apt install libedit-dev
if: steps.git-diff.outputs.diff
- name: Run Delta Spark Local Publishing and Examples Compilation
# examples/scala/build.sbt will compile against the local Delta relase version (e.g. 3.2.0-SNAPSHOT).
# examples/scala/build.sbt will compile against the local Delta release version (e.g. 3.2.0-SNAPSHOT).
# Thus, we need to publishM2 first so those jars are locally accessible.
# We publish storage explicitly so that it is available for the Scala 2.13 build. As a java project
# it is typically only released when publishing for Scala 2.12.
run: |
build/sbt clean
build/sbt storage/publishM2
build/sbt "++ $SCALA_VERSION publishM2"
build/sbt sparkGroup/publishM2
cd examples/scala && build/sbt "++ $SCALA_VERSION compile"
if: steps.git-diff.outputs.diff
@@ -1,4 +1,4 @@
name: "Delta Spark Python"
name: "Delta Spark Master Python"
on: [push, pull_request]
jobs:
test:
@@ -7,7 +7,7 @@ jobs:
strategy:
matrix:
# These Scala versions must match those in the build.sbt
scala: [2.12.18]
scala: [2.13.13]
env:
SCALA_VERSION: ${{ matrix.scala }}
steps:
@@ -24,20 +24,22 @@ jobs:
uses: actions/setup-java@v3
with:
distribution: "zulu"
java-version: "8"
java-version: "17"
- name: Cache Scala, SBT
uses: actions/cache@v3
with:
path: |
~/.sbt
~/.ivy2
~/.cache/coursier
!~/.cache/coursier/v1/https/repository.apache.org/content/groups/snapshots
# Change the key if dependencies are changed. For each key, GitHub Actions will cache the
# the above directories when we use the key for the first time. After that, each run will
# just use the cache. The cache is immutable so we need to use a new key when trying to
# cache new stuff.
key: delta-sbt-cache-spark3.2-scala${{ matrix.scala }}
key: delta-sbt-cache-spark-master-scala${{ matrix.scala }}
- name: Install Job dependencies
# TODO: update pyspark installation once Spark preview is formally released
run: |
sudo apt-get update
sudo apt-get install -y make build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev python3-openssl git
@@ -52,29 +54,28 @@ jobs:
export PATH="~/.pyenv/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
pyenv install 3.8.18
pyenv global system 3.8.18
pipenv --python 3.8 install
pyenv install 3.9
pyenv global system 3.9
pipenv --python 3.9 install
# Update the pip version to 24.0. By default `pyenv.run` installs the latest pip version
# available. From version 24.1, `pip` doesn't allow installing python packages
# with version string containing `-`. In Delta-Spark case, the pypi package generated has
# `-SNAPSHOT` in version (e.g. `3.3.0-SNAPSHOT`) as the version is picked up from
# the`version.sbt` file.
pipenv run pip install pip==24.0 setuptools==69.5.1 wheel==0.43.0
pipenv run pip install pyspark==3.5.3
pipenv run pip install flake8==3.5.0 pypandoc==1.3.3
pipenv run pip install flake8==3.9.0
pipenv run pip install black==23.9.1
pipenv run pip install importlib_metadata==3.10.0
pipenv run pip install mypy==0.982
pipenv run pip install mypy==1.8.0
pipenv run pip install mypy-protobuf==3.3.0
pipenv run pip install cryptography==37.0.4
pipenv run pip install twine==4.0.1
pipenv run pip install wheel==0.33.4
pipenv run pip install setuptools==41.1.0
pipenv run pip install pydocstyle==3.0.0
pipenv run pip install pandas==1.1.3
pipenv run pip install pandas==1.4.4
pipenv run pip install pyarrow==8.0.0
pipenv run pip install numpy==1.20.3
pipenv run pip install numpy==1.21
pipenv run pip install https://dist.apache.org/repos/dist/dev/spark/v4.0.0-rc4-bin//pyspark-4.0.0.tar.gz
if: steps.git-diff.outputs.diff
- name: Run Python tests
# when changing TEST_PARALLELISM_COUNT make sure to also change it in spark_master_test.yaml
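The pin to `pip==24.0` in the dependency step above is needed because, as the inline comment notes, pip 24.1 and later enforce PEP 440 version strings and refuse Maven-style `-SNAPSHOT` suffixes such as the `3.3.0-SNAPSHOT` value picked up from `version.sbt`. A minimal sketch of that failure mode, using the third-party `packaging` library (not part of this PR), just to show why the suffix is rejected:

```python
# Why pip >= 24.1 rejects Delta snapshot builds: "-SNAPSHOT" is not a valid
# PEP 440 version segment, so strict version parsing fails on it.
from packaging.version import Version, InvalidVersion

for candidate in ["3.3.0", "4.0.0rc4", "3.3.0-SNAPSHOT"]:
    try:
        print(f"{candidate!r} parses as PEP 440 version {Version(candidate)}")
    except InvalidVersion:
        print(f"{candidate!r} is not PEP 440 compliant; pip >= 24.1 refuses to install it")
```

Pinning pip to 24.0 (with matching setuptools and wheel versions) keeps the older, laxer install behavior.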
@@ -1,18 +1,17 @@
name: "Delta Iceberg Latest"
name: "Delta Sharing Spark Master"
on: [push, pull_request]
jobs:
test:
name: "DIL: Scala ${{ matrix.scala }}"
name: "Delta Sharing Spark Master"
runs-on: ubuntu-24.04
strategy:
matrix:
# These Scala versions must match those in the build.sbt
scala: [2.12.18, 2.13.13]
scala: [2.13.13]
env:
SCALA_VERSION: ${{ matrix.scala }}
steps:
- uses: actions/checkout@v3
# TODO we can make this more selective
- uses: technote-space/get-diff-action@v4
id: git-diff
with:
@@ -25,40 +24,29 @@ jobs:
uses: actions/setup-java@v3
with:
distribution: "zulu"
java-version: "8"
java-version: "17"
- name: Cache Scala, SBT
uses: actions/cache@v3
with:
path: |
~/.sbt
~/.ivy2
~/.cache/coursier
!~/.cache/coursier/v1/https/repository.apache.org/content/groups/snapshots
# Change the key if dependencies are changed. For each key, GitHub Actions will cache the
# the above directories when we use the key for the first time. After that, each run will
# just use the cache. The cache is immutable so we need to use a new key when trying to
# cache new stuff.
key: delta-sbt-cache-spark3.2-scala${{ matrix.scala }}
key: delta-sbt-cache-spark-master-scala${{ matrix.scala }}
- name: Install Job dependencies
# TODO: update pyspark installation once Spark preview is formally released
run: |
sudo apt-get update
sudo apt-get install -y make build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev python3-openssl git
sudo apt install libedit-dev
curl -LO https://github.com/bufbuild/buf/releases/download/v1.28.1/buf-Linux-x86_64.tar.gz
mkdir -p ~/buf
tar -xvzf buf-Linux-x86_64.tar.gz -C ~/buf --strip-components 1
rm buf-Linux-x86_64.tar.gz
sudo apt install python3-pip --fix-missing
sudo pip3 install pipenv==2024.4.1
curl https://pyenv.run | bash
export PATH="~/.pyenv/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
pyenv install 3.8.18
pyenv global system 3.8.18
pipenv --python 3.8.18 install
if: steps.git-diff.outputs.diff
- name: Run Scala/Java and Python tests
# when changing TEST_PARALLELISM_COUNT make sure to also change it in spark_master_test.yaml
- name: Run Delta Sharing tests
# NOTE: in this branch, the default sparkVersion is the SPARK_MASTER_VERSION
run: |
TEST_PARALLELISM_COUNT=4 pipenv run python run-tests.py --group iceberg
TEST_PARALLELISM_COUNT=4 build/sbt "++ ${{ matrix.scala }}" clean sharing/test
if: steps.git-diff.outputs.diff
47 changes: 42 additions & 5 deletions .github/workflows/spark_master_test.yaml
@@ -9,11 +9,11 @@ jobs:
# These Scala versions must match those in the build.sbt
scala: [2.13.13]
# Important: This list of shards must be [0..NUM_SHARDS - 1]
shard: [0, 1, 2]
shard: [0, 1, 2, 3]
env:
SCALA_VERSION: ${{ matrix.scala }}
# Important: This must be the same as the length of shards in matrix
NUM_SHARDS: 3
NUM_SHARDS: 4
steps:
- uses: actions/checkout@v3
- uses: technote-space/get-diff-action@v4
@@ -43,15 +43,52 @@ jobs:
# cache new stuff.
key: delta-sbt-cache-spark-master-scala${{ matrix.scala }}
- name: Install Job dependencies
# TODO: update pyspark installation once Spark preview is formally released
run: |
sudo apt-get update
sudo apt-get install -y make build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev python3-openssl git
sudo apt install libedit-dev
curl -LO https://github.com/bufbuild/buf/releases/download/v1.28.1/buf-Linux-x86_64.tar.gz
mkdir -p ~/buf
tar -xvzf buf-Linux-x86_64.tar.gz -C ~/buf --strip-components 1
rm buf-Linux-x86_64.tar.gz
sudo apt install python3-pip --fix-missing
sudo pip3 install pipenv==2024.4.1
curl https://pyenv.run | bash
export PATH="~/.pyenv/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
pyenv install 3.9
pyenv global system 3.9
pipenv --python 3.9 install
# Update the pip version to 24.0. By default `pyenv.run` installs the latest pip version
# available. From version 24.1, `pip` doesn't allow installing python packages
# with version string containing `-`. In Delta-Spark case, the pypi package generated has
# `-SNAPSHOT` in version (e.g. `3.3.0-SNAPSHOT`) as the version is picked up from
# the`version.sbt` file.
pipenv run pip install pip==24.0 setuptools==69.5.1 wheel==0.43.0
pipenv run pip install flake8==3.9.0
pipenv run pip install black==23.9.1
pipenv run pip install mypy==1.8.0
pipenv run pip install mypy-protobuf==3.3.0
pipenv run pip install cryptography==37.0.4
pipenv run pip install twine==4.0.1
pipenv run pip install wheel==0.33.4
pipenv run pip install setuptools==41.1.0
pipenv run pip install pydocstyle==3.0.0
pipenv run pip install pandas==1.4.4
pipenv run pip install pyarrow==8.0.0
pipenv run pip install numpy==1.21
pipenv run pip install https://dist.apache.org/repos/dist/dev/spark/v4.0.0-rc4-bin/pyspark-4.0.0.tar.gz
if: steps.git-diff.outputs.diff
- name: Run Spark Master tests
# when changing TEST_PARALLELISM_COUNT make sure to also change it in spark_test.yaml
- name: Run Delta Connect tests
run: |
TEST_PARALLELISM_COUNT=4 SHARD_ID=${{matrix.shard}} build/sbt -DsparkVersion=master "++ ${{ matrix.scala }}" clean spark/test
TEST_PARALLELISM_COUNT=4 build/sbt -DsparkVersion=master "++ ${{ matrix.scala }}" clean connectServer/test
TEST_PARALLELISM_COUNT=4 build/sbt -DsparkVersion=master "++ ${{ matrix.scala }}" clean connectServer/assembly connectClient/test
if: steps.git-diff.outputs.diff
- name: Run Delta Spark master tests
# when changing TEST_PARALLELISM_COUNT make sure to also change it in spark_test.yaml
# NOTE: in this branch, the default sparkVersion is the SPARK_MASTER_VERSION
run: |
TEST_PARALLELISM_COUNT=4 pipenv run python run-tests.py --group spark --shard ${{ matrix.shard }}
if: steps.git-diff.outputs.diff
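The shard matrix for the Spark master tests grows from three shards to four, and the comments stress that the `shard` list must be exactly `[0..NUM_SHARDS - 1]` and that `NUM_SHARDS` must match its length. A hypothetical Python sketch of how a runner such as `run-tests.py` could use those values to partition work (the real selection logic is not shown in this diff; the file list and function name below are illustrative only):

```python
# Hypothetical sharding helper: each CI job runs only the tests whose hash
# maps to its shard id, so shards 0..NUM_SHARDS-1 together cover every test
# exactly once. If a shard id were missing from the matrix, its tests would
# silently never run.
import argparse
import os
import zlib

def tests_for_shard(test_files, shard_id, num_shards):
    # Hash-partition test files by CRC32 so the split is stable across runs.
    return [f for f in test_files if zlib.crc32(f.encode()) % num_shards == shard_id]

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--shard", type=int, default=0)
    args = parser.parse_args()
    num_shards = int(os.environ.get("NUM_SHARDS", "1"))
    all_tests = ["test_deltatable.py", "test_merge.py", "test_restore.py", "test_utils.py"]
    print(tests_for_shard(all_tests, args.shard, num_shards))
```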