Commit df01b39

Merge pull request #654 from NVIDIA/branch-24.04
release 24.04 [skip ci]
2 parents e0f644d + ac4785c commit df01b39

44 files changed (+3893, -291 lines)

.github/workflows/auto-merge.yml

Lines changed: 5 additions & 5 deletions

@@ -18,7 +18,7 @@ name: auto-merge HEAD to BASE
 on:
   pull_request_target:
     branches:
-      - branch-24.02
+      - branch-24.04
     types: [closed]
 
 jobs:
@@ -27,16 +27,16 @@ jobs:
     runs-on: ubuntu-latest
 
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
         with:
-          ref: branch-24.02 # force to fetch from latest upstream instead of PR ref
+          ref: branch-24.04 # force to fetch from latest upstream instead of PR ref
 
       - name: auto-merge job
         uses: ./.github/workflows/auto-merge
         env:
          OWNER: NVIDIA
          REPO_NAME: spark-rapids-ml
-          HEAD: branch-24.02
-          BASE: branch-24.04
+          HEAD: branch-24.04
+          BASE: branch-24.06
          AUTOMERGE_TOKEN: ${{ secrets.AUTOMERGE_TOKEN }} # use to merge PR
 

.github/workflows/blossom-ci.yml

Lines changed: 0 additions & 2 deletions

@@ -44,8 +44,6 @@ jobs:
       GaryShen2008,\
       NvTimLiu,\
       YanxuanLiu,\
-      zhanga5,\
-      Er1cCheng,\
       ', format('{0},', github.actor)) && github.event.comment.body == 'build'
     steps:
       - name: Check if comment is issued by authorized person

.github/workflows/gcs-benchmark.yml

Lines changed: 1 addition & 1 deletion

@@ -39,7 +39,7 @@ jobs:
       SERVICE_ACCOUNT: ${{ secrets.GCLOUD_SERVICE_ACCOUNT }}
       CLUSTER_NAME: github-spark-rapids-ml-${{github.run_number}}
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
 
       - name: run benchmark
         shell: bash

.github/workflows/signoff-check.yml

Lines changed: 1 addition & 1 deletion

@@ -23,7 +23,7 @@ jobs:
   signoff-check:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v4
 
       - name: sigoff-check job
         uses: ./.github/workflows/signoff-check

README.md

Lines changed: 3 additions & 2 deletions

@@ -35,16 +35,17 @@ The following table shows the currently supported algorithms. The goal is to ex
 | Supported Algorithms    | Python | Scala |
 | :---------------------- | :----: | :---: |
 | CrossValidator          |   √    |       |
+| DBSCAN (*)              |   √    |       |
 | KMeans                  |   √    |       |
-| k-NN (*)                |   √    |       |
+| approx/exact k-NN (*)   |   √    |       |
 | LinearRegression        |   √    |       |
 | LogisticRegression      |   √    |       |
 | PCA                     |   √    |   √   |
 | RandomForestClassifier  |   √    |       |
 | RandomForestRegressor   |   √    |       |
 | UMAP (*)                |   √    |       |
 
-Note: Spark does not provide a k-Nearest Neighbors (k-NN) implementation, but it does have an [LSH-based Approximate Nearest Neighbor](https://spark.apache.org/docs/latest/ml-features.html#approximate-nearest-neighbor-search) implementation. As an alternative to PCA, we also provide a Spark API for GPU accelerated Uniform Manifold Approximation and Projection (UMAP), a non-linear dimensionality reduction algorithm in the RAPIDS cuML library.
+Note: Spark does not provide a k-Nearest Neighbors (k-NN) implementation, but it does have an [LSH-based Approximate Nearest Neighbor](https://spark.apache.org/docs/latest/ml-features.html#approximate-nearest-neighbor-search) implementation. As an alternative to PCA, we also provide a Spark API for GPU accelerated Uniform Manifold Approximation and Projection (UMAP), a non-linear dimensionality reduction algorithm in the RAPIDS cuML library. As an alternative to KMeans, we also provide a Spark API for GPU accelerated Density-Based Spatial Clustering of Applications with Noise (DBSCAN), a density based clustering algorithm in the RAPIDS cuML library.
 
 ## Getting started
 

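For the DBSCAN row added above, spark_rapids_ml estimators follow the usual Spark ML fit/transform pattern, so usage would look roughly like the sketch below. This is a minimal sketch and not part of the commit: the import path and the cuML-style parameter names (eps, min_samples) are assumptions, so check the released package documentation for the exact API.

# Hedged sketch of the GPU-accelerated DBSCAN listed in the table above.
# Assumed: import path spark_rapids_ml.clustering.DBSCAN and cuML-style
# parameter names (eps, min_samples); verify against the released docs.
from pyspark.ml.linalg import Vectors
from pyspark.sql import SparkSession
from spark_rapids_ml.clustering import DBSCAN  # assumed import path

spark = SparkSession.builder.getOrCreate()

# Tiny toy dataset: two dense points near the origin and one far-away outlier.
df = spark.createDataFrame(
    [
        (Vectors.dense([0.0, 0.0]),),
        (Vectors.dense([0.1, 0.1]),),
        (Vectors.dense([9.0, 9.0]),),
    ],
    ["features"],
)

dbscan = DBSCAN(eps=0.5, min_samples=2)  # assumed parameter names
model = dbscan.fit(df)            # clustering runs on the GPU via cuML
model.transform(df).show()        # appends a cluster-id prediction column
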
ci/Dockerfile

Lines changed: 1 addition & 1 deletion

@@ -37,6 +37,6 @@ RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86
     && conda config --set solver libmamba
 
 # install cuML
-ARG CUML_VER=24.02
+ARG CUML_VER=24.04
 RUN conda install -y -c rapidsai -c conda-forge -c nvidia cuml=$CUML_VER python=3.9 cuda-version=11.8 \
     && conda clean --all -f -y

ci/Jenkinsfile.premerge

Lines changed: 11 additions & 10 deletions

@@ -1,6 +1,6 @@
 #!/usr/local/env groovy
 /*
- * Copyright (c) 2023, NVIDIA CORPORATION.
+ * Copyright (c) 2023-2024, NVIDIA CORPORATION.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -40,7 +40,7 @@ pipeline {
     agent {
         kubernetes {
             label "premerge-init-${BUILD_TAG}"
-            cloud 'sc-ipp-blossom-prod'
+            cloud "${common.CLOUD_NAME}"
             yaml cpuImage
         }
     }
@@ -87,7 +87,7 @@ pipeline {
 
                     def title = githubHelper.getIssue().title
                     if (title ==~ /.*\[skip ci\].*/) {
-                        githubHelper.updateCommitStatus("$BUILD_URL", "Skipped", GitHubCommitState.SUCCESS)
+                        githubHelper.updateCommitStatus("", "Skipped", GitHubCommitState.SUCCESS)
                         currentBuild.result == "SUCCESS"
                         skipped = true
                         return
@@ -107,7 +107,7 @@ pipeline {
             agent {
                 kubernetes {
                     label "premerge-docker-${BUILD_TAG}"
-                    cloud 'sc-ipp-blossom-prod'
+                    cloud "${common.CLOUD_NAME}"
                     yaml pod.getDockerBuildYAML()
                     workspaceVolume persistentVolumeClaimWorkspaceVolume(claimName: "${PVC}", readOnly: false)
                     customWorkspace "${CUSTOM_WORKSPACE}"
@@ -116,7 +116,7 @@ pipeline {
 
             steps {
                 script {
-                    githubHelper.updateCommitStatus("$BUILD_URL", "Running - preparing", GitHubCommitState.PENDING)
+                    githubHelper.updateCommitStatus("", "Running - preparing", GitHubCommitState.PENDING)
                     checkout(
                         changelog: false,
                         poll: true,
@@ -169,7 +169,7 @@ pipeline {
             agent {
                 kubernetes {
                     label "premerge-ci-${BUILD_TAG}"
-                    cloud 'sc-ipp-blossom-prod'
+                    cloud "${common.CLOUD_NAME}"
                     yaml pod.getGPUYAML("${IMAGE_PREMERGE}", "${env.GPU_RESOURCE}", '8', '32Gi')
                     workspaceVolume persistentVolumeClaimWorkspaceVolume(claimName: "${PVC}", readOnly: false)
                     customWorkspace "${CUSTOM_WORKSPACE}"
@@ -178,7 +178,7 @@ pipeline {
 
             steps {
                 script {
-                    githubHelper.updateCommitStatus("$BUILD_URL", "Running - tests", GitHubCommitState.PENDING)
+                    githubHelper.updateCommitStatus("", "Running - tests", GitHubCommitState.PENDING)
                     container('gpu') {
                         timeout(time: 2, unit: 'HOURS') { // step only timeout for test run
                             common.resolveIncompatibleDriverIssue(this)
@@ -198,14 +198,15 @@ pipeline {
                 }
 
                 if (currentBuild.currentResult == "SUCCESS") {
-                    githubHelper.updateCommitStatus("$BUILD_URL", "Success", GitHubCommitState.SUCCESS)
+                    githubHelper.updateCommitStatus("", "Success", GitHubCommitState.SUCCESS)
                 } else {
                     // upload log only in case of build failure
-                    def guardWords = ["gitlab.*?\\.com", "urm.*?\\.com"]
+                    def guardWords = ["gitlab.*?\\.com", "urm.*?\\.com", "sc-ipp-*"]
                     guardWords.add("nvidia-smi(?s)(.*?)(?=git)") // hide GPU info
+                    guardWords.add("sc-ipp*") // hide cloud info
                     githubHelper.uploadLogs(this, env.JOB_NAME, env.BUILD_NUMBER, null, guardWords)
 
-                    githubHelper.updateCommitStatus("$BUILD_URL", "Fail", GitHubCommitState.FAILURE)
+                    githubHelper.updateCommitStatus("", "Fail", GitHubCommitState.FAILURE)
                 }
 
                 if (TEMP_IMAGE_BUILD) {

docker/Dockerfile.pip

Lines changed: 1 addition & 1 deletion

@@ -18,7 +18,7 @@ ARG CUDA_VERSION=11.8.0
 FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu22.04
 
 ARG PYSPARK_VERSION=3.3.1
-ARG RAPIDS_VERSION=24.2.0
+ARG RAPIDS_VERSION=24.4.0
 ARG ARCH=amd64
 #ARG ARCH=arm64
 # Install packages to build spark-rapids-ml

docker/Dockerfile.python

Lines changed: 1 addition & 1 deletion

@@ -17,7 +17,7 @@
 ARG CUDA_VERSION=11.8.0
 FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu20.04
 
-ARG CUML_VERSION=24.02
+ARG CUML_VERSION=24.04
 
 # Install packages to build spark-rapids-ml
 RUN apt update -y \

docs/site/compatibility.md

Lines changed: 3 additions & 2 deletions

@@ -11,16 +11,17 @@ The following table shows the currently supported algorithms. The goal is to ex
 | Supported Algorithms    | Python | Scala |
 | :---------------------- | :----: | :---: |
 | CrossValidator          |   √    |       |
+| DBSCAN (*)              |   √    |       |
 | KMeans                  |   √    |       |
-| k-NN (*)                |   √    |       |
+| approx/exact k-NN (*)   |   √    |       |
 | LinearRegression        |   √    |       |
 | LogisticRegression      |   √    |       |
 | PCA                     |   √    |   √   |
 | RandomForestClassifier  |   √    |       |
 | RandomForestRegressor   |   √    |       |
 | UMAP (*)                |   √    |       |
 
-Note: Spark does not provide a k-Nearest Neighbors (k-NN) implementation, but it does have an [LSH-based Approximate Nearest Neighbor](https://spark.apache.org/docs/latest/ml-features.html#approximate-nearest-neighbor-search) implementation. As an alternative to PCA, we also provide a Spark API for GPU accelerated Uniform Manifold Approximation and Projection (UMAP), a non-linear dimensionality reduction algorithm in the RAPIDS cuML library.
+Note: Spark does not provide a k-Nearest Neighbors (k-NN) implementation, but it does have an [LSH-based Approximate Nearest Neighbor](https://spark.apache.org/docs/latest/ml-features.html#approximate-nearest-neighbor-search) implementation. As an alternative to PCA, we also provide a Spark API for GPU accelerated Uniform Manifold Approximation and Projection (UMAP), a non-linear dimensionality reduction algorithm in the RAPIDS cuML library. As an alternative to KMeans, we also provide a Spark API for GPU accelerated Density-Based Spatial Clustering of Applications with Noise (DBSCAN), a density based clustering algorithm in the RAPIDS cuML library.
 
 
 ## Supported Versions

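The same note describes UMAP as a PCA alternative; a rough fit/transform sketch is shown below. It is not part of this commit: the spark_rapids_ml.umap import path, the cuML-style parameters (n_neighbors, n_components), and the input path are assumptions for illustration, so confirm them against the released documentation.

# Hedged sketch of GPU-accelerated UMAP used as a PCA alternative.
# Assumed: import path spark_rapids_ml.umap.UMAP and cuML-style parameters
# (n_neighbors, n_components); the parquet path is hypothetical.
from pyspark.sql import SparkSession
from spark_rapids_ml.umap import UMAP  # assumed import path

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("/path/to/feature_vectors")  # hypothetical input with a "features" column

umap = UMAP(n_neighbors=15, n_components=2)  # assumed parameter names
model = umap.fit(df)            # the embedding is learned on the GPU via cuML
model.transform(df).show(5)     # appends the 2-D embedding for each row
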