
Add Databricks-17.3 support [databricks] #14360

Open
nartal1 wants to merge 52 commits into NVIDIA:main from nartal1:databricks_173_support

Conversation


@nartal1 nartal1 commented Mar 3, 2026

Contributes to #14015

Description

This PR adds support for Databricks-17.3. DBR-17.3 now compiles without any build failures. Several integration tests are still failing; those are tracked in a separate issue.
This PR:

  • Adds a new spark400db173 shim to support Databricks Runtime 17.3, which is based on Spark 4.0.0
  • Introduces build profile, shim dependencies, and version-specific source directories for DB-17.3

Key Changes

Build Infrastructure

  • New Maven profile release400db173 with spark.version=4.0.0-databricks-173
  • Updated Scala 2.13 enforcer regex to allow Databricks vendor builds (db suffix)
  • Made orc-format dependency conditional (DB-17.3 only) in the Databricks BOM — this artifact does not exist in earlier Databricks runtimes
  • Updated Jenkins CI scripts (build.sh, deploy.sh, install_deps.py, common_vars.sh) for DB-17.3 cluster support
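
The profile wiring might look roughly like this (a minimal sketch: the profile id `release400db173` and the `spark.version` value come from this PR's description; every other element and property name below is illustrative, not the actual pom.xml contents):

```xml
<!-- Hypothetical sketch only; property names besides spark.version are assumed. -->
<profile>
  <id>release400db173</id>
  <properties>
    <buildver>400db173</buildver>
    <spark.version>4.0.0-databricks-173</spark.version>
  </properties>
</profile>
```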

Shuffle API Changes

  • DB-17.3 adds a prismMapStatusEnabled parameter to ShuffleManager.getReader() (8-param vs 7-param signature)
  • New GpuShuffleExchangeExec for DB-17.3 with additional metrics (skew, spill-fallback, adaptive repartitioning) and new repartition() / adaptiveRepartitioningStatus() methods
  • New RapidsShuffleReaderShim and ShuffleManagerShims for the updated shuffle interfaces
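
The 7- vs 8-parameter split can be sketched with a version-dispatching shim (a minimal, self-contained sketch: the `ShuffleManagerShims`/`getReaderImpl` names follow the PR text, but all types and signatures below are stand-ins, not the real Databricks or plugin classes):

```scala
// Stand-ins for the two runtime APIs (illustrative, not Databricks' classes).
case class Reader(api: String)

object PreDb173Runtime {
  // 7-parameter signature found in earlier runtimes.
  def getReader(handle: AnyRef, startMap: Int, endMap: Int,
      startPart: Int, endPart: Int, ctx: AnyRef, metrics: AnyRef): Reader =
    Reader("7-param")
}

object Db173Runtime {
  // DB-17.3 adds prismMapStatusEnabled as an eighth parameter.
  def getReader(handle: AnyRef, startMap: Int, endMap: Int,
      startPart: Int, endPart: Int, ctx: AnyRef, metrics: AnyRef,
      prismMapStatusEnabled: Boolean): Reader =
    Reader("8-param")
}

// One stable signature for shared code; each per-version shim absorbs the difference.
trait ShuffleManagerShims {
  def getReaderImpl(handle: AnyRef, startMap: Int, endMap: Int,
      startPart: Int, endPart: Int, ctx: AnyRef, metrics: AnyRef): Reader
}

object PreDb173Shims extends ShuffleManagerShims {
  def getReaderImpl(h: AnyRef, sm: Int, em: Int, sp: Int, ep: Int,
      ctx: AnyRef, m: AnyRef): Reader =
    PreDb173Runtime.getReader(h, sm, em, sp, ep, ctx, m)
}

object Db173Shims extends ShuffleManagerShims {
  def getReaderImpl(h: AnyRef, sm: Int, em: Int, sp: Int, ep: Int,
      ctx: AnyRef, m: AnyRef): Reader =
    // Default the new flag here; a real shim would plumb the actual value through.
    Db173Runtime.getReader(h, sm, em, sp, ep, ctx, m, prismMapStatusEnabled = false)
}
```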

Adaptive Query Execution

  • ShuffleQueryStageExec and BroadcastQueryStageExec constructors now require an implicit adaptiveContext parameter in DB-17.3
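
The implicit-parameter pattern involved can be sketched generically (self-contained stand-ins below; DB-17.3's real adaptive context type and constructor shapes are internal and may differ):

```scala
// Stand-in for the implicit context the DB-17.3 constructors require (illustrative).
case class AdaptiveContext(id: Int)

// A DB-17.3-style class whose constructor takes an implicit parameter.
class ShuffleStageLike(plan: String)(implicit ctx: AdaptiveContext) {
  def describe: String = s"$plan@${ctx.id}"
}

object AqeShimSketch {
  // Shim code must have an implicit in scope (or pass one explicitly)
  // wherever such nodes are constructed.
  def build(): String = {
    implicit val ctx: AdaptiveContext = AdaptiveContext(1)
    new ShuffleStageLike("shuffle").describe
  }
}
```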

Expression and Subquery Changes

  • DynamicPruningExpression has a second parameter (dynamicPruningInfo) in DB-17.3
  • GpuScalarSubquery must implement resultUpdated() added to ExecSubqueryExpression
  • Expression shims updated for DB-17.3's nodePatternsInternal tree pattern matching (ShimGetArrayStructFields, ShimGetArrayItem, ShimGetStructField, GpuDeterministicFirstLastCollectShim)
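
The `resultUpdated()` requirement can be sketched the same way (stand-in trait below; Spark's real ExecSubqueryExpression is considerably more involved):

```scala
// Stand-in for the DB-17.3 base class that now declares resultUpdated()
// (illustrative; not Spark's actual ExecSubqueryExpression).
trait ExecSubqueryExpressionLike {
  def resultUpdated(): Boolean
}

// A GpuScalarSubquery-style implementation satisfying the new contract.
class GpuScalarSubquerySketch extends ExecSubqueryExpressionLike {
  private var updated = false
  def markUpdated(): Unit = { updated = true }
  override def resultUpdated(): Boolean = updated
}
```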

Streaming and Package Reorganization

  • FileStreamSink moved to org.apache.spark.sql.execution.streaming.sinks
  • MetadataLogFileIndex moved to org.apache.spark.sql.execution.streaming.runtime

Build

Log in to a Databricks-17.3 ML cluster.
Check out this branch (nartal1/databricks_173_support).
Run: ./jenkins/databricks/build.sh

Checklists

  • This PR has added documentation for new or modified features or behaviors.
  • This PR has added new tests or modified existing tests to cover new code paths.
    (Please explain in the PR description how the new code paths are tested, such as names of the new/existing tests that cover them.)
  • Performance testing has been performed and its results are added in the PR description. Or, an issue has been filed with a link in the PR description.

Signed-off-by: Niranjan Artal <nartal@nvidia.com>
- Share TryModeShim.scala (evalContext.evalMode handling)
- Share TimeAddShims.scala (TimeAdd->TimestampAddInterval rename)
- Both files moved to spark400db173 with updated metadata to support both versions
- Add spark400db173 to RoundShims and SparkStringUtilsShims metadata
- Fix ShowNamespacesExecShims API mismatch
- Share AggregateInPandasExecShims between spark400db173 and spark411
- Share FileStreamSinkShims between spark400db173 and spark411

Refactor getReader to getReaderImpl and use ShuffleManagerShims
to handle version-specific shuffle reader signatures.


nartal1 commented Mar 4, 2026

build


nartal1 commented Mar 6, 2026

build


nartal1 commented Mar 6, 2026

build


nartal1 commented Mar 6, 2026

build

@nartal1 nartal1 marked this pull request as ready for review March 7, 2026 01:36
@nartal1 nartal1 requested review from a team and gerashegalov March 7, 2026 01:36

nartal1 commented Mar 10, 2026

build

@nartal1 nartal1 requested a review from jihoonson March 10, 2026 17:41
jihoonson
jihoonson previously approved these changes Mar 10, 2026

@jihoonson jihoonson left a comment


LGTM. Thanks @nartal1

```shell
SERVER_ID='snapshots'
SERVER_URL="$URM_URL-local"
SCALA_VERSION=`mvn help:evaluate -q -pl dist -Dexpression=scala.binary.version -DforceStdout`
SCALA_VERSION=`mvn help:evaluate -q -f $POM_FILE -pl dist -Dexpression=scala.binary.version -DforceStdout`
```
Collaborator

Define a standard way of detecting versions and source this across scripts. Why is this different from build.sh? Can we get it from some build info file in the DBR image?

Collaborator Author

Updated it to be similar to build.sh. I couldn't figure this out from any files in the DBR image.

Comment on lines +95 to +97
```python
# Spark 3.x versions
deps += [Artifact('org.apache.hive', 'hive-metastore-client-patched',
                  f'{spark_prefix}--patched-hive-with-glue--hive-*-patch-{spark_suffix}_deploy.jar')]
```
Collaborator


Make the indentation consistent at least in the code you are adding

Collaborator Author


Updated.


```scala
/**
 * Databricks 17.3 version where getRuntimeStatistics has compile-time access restrictions.
 * The method exists and is public at runtime, but compile-time metadata shows it as protected,
```
Collaborator


Sounds like Scala protected compiles to JVM public. Can we not then make it a Java class and avoid reflection?

Collaborator Author


Thanks for the pointer. Updated it.

```scala
def repartition(numPartitions: Int, updatedRepartitioningStatus: AdaptiveRepartitioningStatus):
    ShuffleExchangeLike = {
  val newCpuPartitioning = cpuOutputPartitioning.withNewNumPartitions(numPartitions)
  copy(gpuOutputPartitioning, child, shuffleOrigin)(newCpuPartitioning)
```
Collaborator


Do we need to pass updatedRepartitioningStatus around here somewhere?

Collaborator Author


Updated it to pass updatedRepartitioningStatus. Earlier it was always the default value.

Comment on lines +28 to +29
```scala
// Databricks 17.3: Use jackson.Serialization instead of JsonMethods
import org.json4s.jackson.Serialization
```
Collaborator


Can we shim just this aspect instead of copy-and-pasting 650 lines ?

Collaborator Author


Added a shim for this. I missed it earlier since it is in the tests module: I initially updated it just to compile, but then forgot to shim it.

```
{"spark": "358"}
{"spark": "400"}
{"spark": "401"}
{"spark": "402"}
```
Collaborator


Getting lost in the tags. Which StreamingShims will cover 410,411?

Collaborator Author


This is dead code at this point; it is handled in FileStreamSinkShims (411/FileStreamSinkShims). Added a 400db173 shim to it.

nartal1 added 5 commits March 11, 2026 18:24

nartal1 commented Mar 11, 2026

build

```shell
# and Databricks 15.4 are both based on spark version 3.5.0
BUILDVER="$BUILDVER$DBR_VER"
SPARK_VERSION_TO_INSTALL_DATABRICKS_JARS="$SPARK_VERSION_TO_INSTALL_DATABRICKS_JARS-$DBR_VER"
elif [ $DBR_VER == '17.3' ]; then
```
Collaborator


NIT: better to merge 14.3 and 17.3 together, e.g. `if [ $DBR_VER == '14.3' ] || [ $DBR_VER == '17.3' ]; then ...`
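
The suggested merge could look like this (a hedged sketch: `DBR_VER` and `BUILDVER` come from the script excerpt above, but the base value and suffix mapping here are illustrative, not the script's actual logic):

```shell
#!/usr/bin/env bash
# Illustrative only: derive a buildver suffix for both supported DBR versions
# in a single branch, instead of separate if/elif arms.
DBR_VER='17.3'
BASE_BUILDVER='400'   # assumed base for the 17.3 case; 14.3 would use a different base
if [ "$DBR_VER" == '14.3' ] || [ "$DBR_VER" == '17.3' ]; then
  # Strip the dot: 17.3 -> 173, giving e.g. buildver 400db173
  BUILDVER="${BASE_BUILDVER}db${DBR_VER//./}"
fi
echo "$BUILDVER"
```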

```shell
SCALA_VERSION=`mvn help:evaluate -q -pl dist -Dexpression=scala.binary.version -DforceStdout`
# Determine Scala version from Spark version: Spark 4.x uses Scala 2.13, earlier uses 2.12
if [[ "$BASE_SPARK_VERSION_TO_INSTALL_DATABRICKS_JARS" == 4.* ]]; then
    SCALA_VERSION="2.13"
```
Collaborator


We should also define POM_FILE here for deploy.sh, like jenkins/databricks/build.sh does.

