# Spark Rapids ML (Scala)

**NOTE**: The Scala algorithms are deprecated as of v25.04.

### PCA

Compared with the original Spark ML PCA training API:

```scala
val pca = new org.apache.spark.ml.feature.PCA()
  .setInputCol("feature_vector_type")
  .setOutputCol("feature_value_3d")
  .setK(3)
  .fit(vectorDf)
```

We provide a customized class, and users need essentially no code change to enjoy the GPU acceleration:

```scala
val pca = new com.nvidia.spark.ml.feature.PCA()
  .setInputCol("feature_array_type") // accepts an ArrayType column; no need to convert it to Vector type
  .setOutputCol("feature_value_3d")
  .setK(3)
  .fit(vectorDf)
...
```

Note: In the `CPU` version, `setInputCol` targets an input column of `Vector` type for the training
process. In the `GPU` version, users don't need the extra preprocessing step of converting an
`ArrayType` column to `Vector` type: `setInputCol` accepts the raw `ArrayType` column directly.
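
For illustration, here is a minimal sketch of the difference, written for `spark-shell` (where the
`spark` session is predefined); the data and column names are illustrative, not part of the API:

```scala
import spark.implicits._
import org.apache.spark.sql.functions.col

// Toy data: a raw ArrayType column.
val arrayDf = Seq(
  Array(1.0, 2.0, 3.0, 4.0, 5.0),
  Array(2.0, 3.0, 4.0, 5.0, 6.0),
  Array(3.0, 4.0, 5.0, 6.0, 7.0),
  Array(4.0, 5.0, 6.0, 7.0, 8.0)
).toDF("feature_array_type")

// CPU version: the ArrayType column must first be converted to Vector
// (array_to_vector is available in Spark 3.1+).
import org.apache.spark.ml.functions.array_to_vector
val vectorDf = arrayDf.withColumn("feature_vector_type", array_to_vector(col("feature_array_type")))

// GPU version: fit on the ArrayType column as-is.
val gpuPca = new com.nvidia.spark.ml.feature.PCA()
  .setInputCol("feature_array_type")
  .setOutputCol("feature_value_3d")
  .setK(3)
  .fit(arrayDf)
```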

## Build

### Build in Docker:

We provide a Dockerfile to build the project in a container. See [docker](../docker/README.md) for more instructions.

### Prerequisites:

1. Essential build tools:
   - [cmake (>= 3.23.1)](https://cmake.org/download/)
   - [ninja (>= 1.10)](https://github.com/ninja-build/ninja/releases)
   - [gcc (>= 9.3)](https://gcc.gnu.org/releases.html)
2. [CUDA Toolkit (>= 11.5)](https://developer.nvidia.com/cuda-toolkit)
3. conda: use [miniconda](https://docs.conda.io/en/latest/miniconda.html) to maintain the header files
   and cmake dependencies
4. [cuDF](https://github.com/rapidsai/cudf):
   - install the cuDF shared library via conda:
     ```bash
     conda install -c rapidsai -c conda-forge cudf=22.04 python=3.8 -y
     ```
5. [RAFT (22.12)](https://github.com/rapidsai/raft):
   - RAFT provides only header files, so there is nothing to build. Note that we pin the version to
     22.12 to avoid potential API compatibility issues in the future.
     ```bash
     git clone -b branch-22.12 https://github.com/rapidsai/raft.git
     ```
6. Export `RAFT_PATH`:
   ```bash
   export RAFT_PATH=ABSOLUTE_PATH_TO_YOUR_RAFT_FOLDER
   ```

Note: For GPUs that do not have CUDA forward compatibility (for example, GeForce), CUDA 11.5 or later is required.

### Build target jar

Spark-rapids-ml uses the [spark-rapids](https://github.com/NVIDIA/spark-rapids) plugin as a dependency.
To build the _SNAPSHOT_ jar, users need to build and install the dependency jar _rapids-4-spark_ first,
because there is no snapshot jar for the spark-rapids plugin in public maven repositories.
See the [build instructions](https://github.com/NVIDIA/spark-rapids/blob/branch-23.04/CONTRIBUTING.md#building-a-distribution-for-multiple-versions-of-spark) to get the dependency jar installed.

Make sure _rapids-4-spark_ is installed in your local maven repository; then you can build the jar
directly in the _project root path_ with:
```bash
cd jvm
mvn clean package
```
Then `rapids-4-spark-ml_2.12-24.04.1-SNAPSHOT.jar` will be generated under the `target` folder.

Users can also use the _release_ version of the spark-rapids plugin as the dependency if it has already
been released in public maven repositories; see the [rapids-4-spark maven repository](https://mvnrepository.com/artifact/com.nvidia/rapids-4-spark)
for release versions. In this case, users don't need to manually build and install the spark-rapids
plugin jar themselves. Remember to replace the [dependency](https://github.com/NVIDIA/spark-rapids-ml/blob/branch-23.04/pom.xml#L94-L96)
in the pom file.

_Note_: This module contains both native and Java/Scala code. The native library build instructions
have been added to the pom.xml file, so the maven build command builds the native library along the
way. Make sure the prerequisites are all met, or the build will fail with error messages such as
"cmake not found" or "ninja not found".

## How to use

After the build steps above, the spark-rapids plugin jar will have been installed to your local maven
repository, usually under `~/.m2/repository`.

Add the artifact jars to Spark, for example:
```bash
ML_JAR="target/rapids-4-spark-ml_2.12-24.04.1-SNAPSHOT.jar"
PLUGIN_JAR="~/.m2/repository/com/nvidia/rapids-4-spark_2.12/24.04.1/rapids-4-spark_2.12-24.04.1.jar"

$SPARK_HOME/bin/spark-shell --master $SPARK_MASTER \
  --driver-memory 20G \
  --executor-memory 30G \
  --conf spark.driver.maxResultSize=8G \
  --jars ${ML_JAR},${PLUGIN_JAR} \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  --conf spark.task.resource.gpu.amount=0.08 \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh \
  --files ${SPARK_HOME}/examples/src/main/scripts/getGpusResources.sh
```
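
Once the shell is up, you can exercise the GPU PCA directly. A minimal sketch, assuming the jars
above are on the classpath; the data and column names are illustrative:

```scala
import spark.implicits._

// Toy data: an ArrayType column, which the GPU PCA accepts directly.
val df = (0 until 100)
  .map(i => Array(i.toDouble, i * 2.0, math.sin(i.toDouble)))
  .toDF("features")

val model = new com.nvidia.spark.ml.feature.PCA()
  .setInputCol("features")
  .setOutputCol("pca_features")
  .setK(2)
  .fit(df)

// Project the input onto the top-2 principal components.
model.transform(df).select("pca_features").show(5, truncate = false)
```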

### PCA examples

Please refer to the
[PCA examples](https://github.com/NVIDIA/spark-rapids-examples/blob/branch-23.04/examples/ML+DL-Examples/Spark-cuML/pca/) for
more details about the example code. We provide both
[notebook](https://github.com/NVIDIA/spark-rapids-examples/blob/branch-23.04/examples/ML+DL-Examples/Spark-cuML/pca/notebooks/Spark_PCA_End_to_End.ipynb)
and [jar](https://github.com/NVIDIA/spark-rapids-examples/blob/branch-23.04/examples/ML+DL-Examples/Spark-cuML/pca/scala/src/com/nvidia/spark/examples/pca/Main.scala)
versions there. Instructions to run these examples are described in the
[README](https://github.com/NVIDIA/spark-rapids-examples/blob/branch-23.04/examples/ML+DL-Examples/Spark-cuML/pca/README.md).