Commit ec369a1

Authored by: pxLi, YanxuanLiu, nvauto, revans2, nvliyuan
Merge branch-25.02 to main [skip ci] (#12219)
to include latest doc updates

NOTE: this must be merged as `create a merge commit`

---------

Signed-off-by: Yanxuan Liu <yanxuanl@nvidia.com>
Signed-off-by: nvauto <70000568+nvauto@users.noreply.github.com>
Signed-off-by: Peixin Li <pxLi@nyu.edu>
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
Signed-off-by: liyuan <yuali@nvidia.com>

Co-authored-by: YanxuanLiu <yanxuanl@nvidia.com>
Co-authored-by: Jenkins Automation <70000568+nvauto@users.noreply.github.com>
Co-authored-by: Robert (Bobby) Evans <bobby@apache.org>
Co-authored-by: liyuan <84758614+nvliyuan@users.noreply.github.com>
2 parents ea4455e + 0aec1e0 commit ec369a1

8 files changed (+42 −19 lines changed)


.github/workflows/mvn-verify-check/get-deps-sha1.sh

Lines changed: 11 additions & 1 deletion

```diff
@@ -20,9 +20,11 @@ scala_ver=${1:-"2.12"}
 base_URL="https://oss.sonatype.org/service/local/artifact/maven/resolve"
 project_jni="spark-rapids-jni"
 project_private="rapids-4-spark-private_${scala_ver}"
+project_hybrid="rapids-4-spark-hybrid_${scala_ver}"

 jni_ver=$(mvn help:evaluate -q -pl dist -Dexpression=spark-rapids-jni.version -DforceStdout)
 private_ver=$(mvn help:evaluate -q -pl dist -Dexpression=spark-rapids-private.version -DforceStdout)
+hybrid_ver=$(mvn help:evaluate -q -pl dist -Dexpression=spark-rapids-hybrid.version -DforceStdout)

 if [[ $jni_ver == *SNAPSHOT* ]]; then
   jni_sha1=$(curl -s -H "Accept: application/json" \
@@ -40,6 +42,14 @@ else
   private_sha1=$private_ver
 fi

-sha1md5=$(echo -n "${jni_sha1}_${private_sha1}" | md5sum | awk '{print $1}')
+if [[ $hybrid_ver == *SNAPSHOT* ]]; then
+  hybrid_sha1=$(curl -s -H "Accept: application/json" \
+    "${base_URL}?r=snapshots&g=com.nvidia&a=${project_hybrid}&v=${hybrid_ver}&c=&e=jar&wt=json" \
+    | jq .data.sha1) || $(date +'%Y-%m-%d')
+else
+  hybrid_sha1=$hybrid_ver
+fi
+
+sha1md5=$(echo -n "${jni_sha1}_${private_sha1}_${hybrid_sha1}" | md5sum | awk '{print $1}')

 echo $sha1md5
```
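The script change above widens the dependency cache key: the resolved SHA1s (or plain version strings for non-SNAPSHOT artifacts) are joined with underscores and MD5-hashed into one fingerprint. As a hedged sketch of that key construction only (the class and method names here are illustrative, not part of the PR):

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DepsCacheKey {
    // Mirrors `echo -n "${jni_sha1}_${private_sha1}_${hybrid_sha1}" | md5sum`:
    // join the three resolved identifiers with underscores and MD5-hash
    // the result into a 32-character hex cache key.
    static String cacheKey(String jniSha1, String privateSha1, String hybridSha1) {
        String joined = jniSha1 + "_" + privateSha1 + "_" + hybridSha1;
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(joined.getBytes(StandardCharsets.UTF_8));
            // %032x zero-pads to 32 hex chars, matching md5sum's output width.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("MD5 is a mandatory JDK algorithm", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(cacheKey("abc123", "25.02.0", "25.02.0"));
    }
}
```

Because the key covers all three artifacts, adding the hybrid jar to the hash forces a cache miss (and thus fresh dependency resolution) whenever any one of them changes.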

docs/dev/hybrid-execution.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -53,5 +53,5 @@ the Rapids hybrid jar) in the classpath by specifying:
 ## Limitations
 - Only supports V1 Parquet data source.
 - Only supports Scala 2.12, do not support Scala 2.13.
-- Support Spark 3.2.2, 3.3.1, 3.4.2, and 3.5.1 like [Gluten supports](https://github.com/apache/incubator-gluten/releases/tag/v1.2.0),
-  other Spark versions 32x, 33x, 34x, 35x also work, but are not fully tested.
+- Support Spark 3.2.2, 3.3.1, 3.4.2, and 3.5.1, matching [Gluten](https://github.com/apache/incubator-gluten/releases/tag/v1.2.0).
+  Other Spark versions 32x, 33x, 34x, 35x may work, but are not fully tested.
```

docs/download.md

Lines changed: 3 additions & 3 deletions

```diff
@@ -94,12 +94,12 @@ The output of signature verify:
 gpg: Good signature from "NVIDIA Spark (For the signature of spark-rapids release jars) <sw-spark@nvidia.com>"

 ### Release Notes
-* Support Spark function Bin
-* Improve spark metrics: Print the batch size information to executor log
+* Support the Spark functions Bin and TruncDate
+* Support group-limit optimization for ROW_NUMBER
+* Improve Spark metrics: Print the batch size information to executor log
 * Refine filter push down to avoid double evaluation
 * Grab the GPU Semaphore when reading cached batch data with the GPU to avoid a GPU OOM case
 * Add an option to disable measuring buffer copy to improve large shuffle large partition serialization
-* Support group-limit optimization for ROW_NUMBER
 * For updates on RAPIDS Accelerator Tools, please visit [this link](https://github.com/NVIDIA/spark-rapids-tools/releases)

 Note: There is a known issue in the 25.02.0 release when decompressing gzip files on H100 GPUs.
```

pom.xml

Lines changed: 1 addition & 1 deletion

```diff
@@ -840,7 +840,7 @@
         <spark.version.classifier>spark${buildver}</spark.version.classifier>
         <cuda.version>cuda11</cuda.version>
         <jni.classifier>${cuda.version}</jni.classifier>
-        <spark-rapids-jni.version>25.02.0</spark-rapids-jni.version>
+        <spark-rapids-jni.version>25.02.1-SNAPSHOT</spark-rapids-jni.version>
         <spark-rapids-private.version>25.02.0</spark-rapids-private.version>
         <spark-rapids-hybrid.version>25.02.0</spark-rapids-hybrid.version>
         <scala.binary.version>2.12</scala.binary.version>
```

scala2.13/pom.xml

Lines changed: 1 addition & 1 deletion

```diff
@@ -840,7 +840,7 @@
         <spark.version.classifier>spark${buildver}</spark.version.classifier>
         <cuda.version>cuda11</cuda.version>
         <jni.classifier>${cuda.version}</jni.classifier>
-        <spark-rapids-jni.version>25.02.0</spark-rapids-jni.version>
+        <spark-rapids-jni.version>25.02.1-SNAPSHOT</spark-rapids-jni.version>
         <spark-rapids-private.version>25.02.0</spark-rapids-private.version>
         <spark-rapids-hybrid.version>25.02.0</spark-rapids-hybrid.version>
         <scala.binary.version>2.13</scala.binary.version>
```

sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala

Lines changed: 4 additions & 4 deletions

```diff
@@ -2302,7 +2302,7 @@ val SHUFFLE_COMPRESSION_LZ4_CHUNK_SIZE = conf("spark.rapids.shuffle.compression.
   val CHUNKED_PACK_BOUNCE_BUFFER_COUNT = conf("spark.rapids.sql.chunkedPack.bounceBuffers")
     .doc("Number of chunked pack bounce buffers, needed during spill from GPU to host memory. ")
     .internal()
-    .longConf
+    .integerConf
     .checkValue(v => v >= 1,
       "The chunked pack bounce buffer count must be at least 1")
     .createWithDefault(4)
@@ -2321,7 +2321,7 @@ val SHUFFLE_COMPRESSION_LZ4_CHUNK_SIZE = conf("spark.rapids.shuffle.compression.
     conf("spark.rapids.memory.host.spillToDiskBounceBuffers")
       .doc("Number of bounce buffers used for gpu to disk spill that bypasses the host store.")
       .internal()
-      .longConf
+      .integerConf
       .checkValue(v => v >= 1,
         "The gpu to disk spill bounce buffer count must be positive")
       .createWithDefault(4)
@@ -3273,11 +3273,11 @@ class RapidsConf(conf: Map[String, String]) extends Logging {

   lazy val chunkedPackBounceBufferSize: Long = get(CHUNKED_PACK_BOUNCE_BUFFER_SIZE)

-  lazy val chunkedPackBounceBufferCount: Long = get(CHUNKED_PACK_BOUNCE_BUFFER_COUNT)
+  lazy val chunkedPackBounceBufferCount: Int = get(CHUNKED_PACK_BOUNCE_BUFFER_COUNT)

   lazy val spillToDiskBounceBufferSize: Long = get(SPILL_TO_DISK_BOUNCE_BUFFER_SIZE)

-  lazy val spillToDiskBounceBufferCount: Long = get(SPILL_TO_DISK_BOUNCE_BUFFER_COUNT)
+  lazy val spillToDiskBounceBufferCount: Int = get(SPILL_TO_DISK_BOUNCE_BUFFER_COUNT)

   lazy val splitUntilSizeOverride: Option[Long] = get(SPLIT_UNTIL_SIZE_OVERRIDE)
```
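The `longConf` to `integerConf` switch is not cosmetic: the bounce buffer count now feeds a bounded queue, and `java.util.concurrent.ArrayBlockingQueue`'s constructor takes an `int` capacity, so reading the config as `Long` would force a narrowing cast at every call site. A minimal illustration (not project code; the method name is made up for the demo):

```java
import java.util.concurrent.ArrayBlockingQueue;

public class CapacityDemo {
    // An Int-typed config value can be passed straight to
    // ArrayBlockingQueue(int capacity), whereas a Long would need a
    // lossy (int) cast. Returns the free slots left after pre-filling
    // one element, just to exercise the bounded queue.
    static int freeSlotsAfterOneOffer(int configuredCount) {
        ArrayBlockingQueue<String> pool = new ArrayBlockingQueue<>(configuredCount);
        pool.offer("buffer-0");
        return pool.remainingCapacity();
    }

    public static void main(String[] args) {
        System.out.println(freeSlotsAfterOneOffer(4)); // prints 3
    }
}
```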

sql-plugin/src/main/scala/com/nvidia/spark/rapids/spill/SpillFramework.scala

Lines changed: 16 additions & 5 deletions

```diff
@@ -22,7 +22,7 @@ import java.nio.channels.{Channels, FileChannel, WritableByteChannel}
 import java.nio.file.StandardOpenOption
 import java.util
 import java.util.UUID
-import java.util.concurrent.{ConcurrentHashMap, LinkedBlockingQueue}
+import java.util.concurrent.{ArrayBlockingQueue, ConcurrentHashMap}

 import scala.collection.mutable

@@ -1805,21 +1805,28 @@ private[spill] class BounceBuffer[T <: AutoCloseable](
  * Callers should synchronize before calling close on their `DeviceMemoryBuffer`s.
  */
 class BounceBufferPool[T <: AutoCloseable](private val bufSize: Long,
-                                           private val bbCount: Long,
+                                           private val bbCount: Int,
                                            private val allocator: Long => T)
     extends AutoCloseable with Logging {

-  private val pool = new LinkedBlockingQueue[BounceBuffer[T]]
-  for (_ <- 1L to bbCount) {
+  private val pool = new ArrayBlockingQueue[BounceBuffer[T]](bbCount)
+  for (_ <- 1 to bbCount) {
     pool.offer(new BounceBuffer[T](allocator(bufSize), this))
   }

   def bufferSize: Long = bufSize
   def nextBuffer(): BounceBuffer[T] = synchronized {
     if (closed) {
-      logError("tried to acquire a bounce buffer after the" +
+      throw new IllegalStateException("tried to acquire a bounce buffer after the" +
         "pool has been closed!")
     }
+    while (pool.size() <= 0) {
+      wait()
+      if (closed) {
+        throw new IllegalStateException("tried to acquire a bounce buffer after the" +
+          "pool has been closed!")
+      }
+    }
     pool.take()
   }

@@ -1828,6 +1835,8 @@ class BounceBufferPool[T <: AutoCloseable](private val bufSize: Long,
       buffer.release()
     } else {
       pool.offer(buffer)
+      // Wake up one thread to take the next bounce buffer
+      notify()
     }
   }

@@ -1842,6 +1851,8 @@ class BounceBufferPool[T <: AutoCloseable](private val bufSize: Long,

     pool.forEach(_.release())
     pool.clear()
+    // Wake up any threads that might be waiting still...
+    notifyAll()
   }
 }
```

sql-plugin/src/test/scala/com/nvidia/spark/rapids/HostAllocSuite.scala

Lines changed: 4 additions & 2 deletions

```diff
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2023-2024, NVIDIA CORPORATION.
+ * Copyright (c) 2023-2025, NVIDIA CORPORATION.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
@@ -23,7 +23,7 @@ import com.nvidia.spark.rapids.Arm.{closeOnExcept, withResource}
 import com.nvidia.spark.rapids.jni.{RmmSpark, RmmSparkThreadState}
 import com.nvidia.spark.rapids.spill._
 import org.mockito.Mockito.when
-import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach, Ignore}
 import org.scalatest.concurrent.{Signaler, TimeLimits}
 import org.scalatest.funsuite.AnyFunSuite
 import org.scalatest.time._
@@ -34,6 +34,8 @@ import org.apache.spark.sql.SparkSession
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.rapids.execution.TrampolineUtil

+// Waiting for the fix of https://github.com/NVIDIA/spark-rapids/issues/12194
+@Ignore
 class HostAllocSuite extends AnyFunSuite with BeforeAndAfterEach with
     BeforeAndAfterAll with TimeLimits {
   private val sqlConf = new SQLConf()
```
