
Conversation

@BesikiML (Contributor) commented Jan 7, 2026

🐛 Problem

Reconciliation fails on Databricks serverless compute with:

    [NOT_SUPPORTED_WITH_SERVERLESS] PERSIST TABLE is not supported on serverless compute

🔍 Root Cause

The reconciliation process uses .cache() as a performance optimization, but serverless compute does not support DataFrame caching operations.

✅ Solution

Implemented serverless detection and a conditional caching strategy (see the sketch below):

Changes Made

  1. Added a serverless detection method
    • New _is_serverless() method checks for the clusterNodeType config.
    • Classic clusters: the config exists → returns False.
    • Serverless: the lookup throws CONFIG_NOT_AVAILABLE → returns True.
  2. Conditional caching logic
    • Classic clusters: use .cache() for performance (existing behavior).
    • Serverless: skip caching to avoid runtime errors.

Technical Details

Detection method:

    node_type = self._spark.conf.get("spark.databricks.clusterUsageTags.clusterNodeType")

✅ Classic: returns the node type (e.g., i3.2xlarge)
❌ Serverless: throws AnalysisException with CONFIG_NOT_AVAILABLE
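
As a rough illustration of the approach described above (a minimal sketch, not the merged code; the exception handling is simplified here, since later commits in this PR narrow it to specific exception types for the linter):

```python
from pyspark.errors import AnalysisException  # PySpark >= 3.4
from pyspark.sql import DataFrame, SparkSession


def _is_serverless(spark: SparkSession) -> bool:
    """True when the clusterNodeType config is unavailable (serverless)."""
    try:
        spark.conf.get("spark.databricks.clusterUsageTags.clusterNodeType")
        return False  # classic cluster: the config exists
    except AnalysisException as e:
        # Serverless raises CONFIG_NOT_AVAILABLE for this config key.
        return "CONFIG_NOT_AVAILABLE" in str(e)


def _cache_if_supported(spark: SparkSession, df: DataFrame) -> DataFrame:
    """Cache on classic clusters; skip on serverless, where PERSIST fails."""
    return df if _is_serverless(spark) else df.cache()
```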

Fixed issue: #1438

Tests

  • manually tested
  • added unit tests
  • added integration tests

Use Unity Catalog volumes instead of .cache() for serverless; auto-detects the compute type. Fixes [NOT_SUPPORTED_WITH_SERVERLESS].
@BesikiML requested a review from m-abulazm January 7, 2026 03:16
@BesikiML requested a review from a team as a code owner January 7, 2026 03:16
@BesikiML linked an issue Jan 7, 2026 that may be closed by this pull request
@BesikiML self-assigned this Jan 7, 2026
github-actions bot commented Jan 7, 2026

✅ 130/130 passed, 8 flaky, 5 skipped, 9m40s total

Flaky tests:

  • 🤪 test_installs_and_runs_local_bladebridge (20.228s)
  • 🤪 test_installs_and_runs_pypi_bladebridge (25.318s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[True] (16.263s)
  • 🤪 test_transpiles_informatica_to_sparksql (16.575s)
  • 🤪 test_transpile_teradata_sql (18.15s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[False] (3.719s)
  • 🤪 test_transpile_teradata_sql_non_interactive[False] (5.424s)
  • 🤪 test_transpile_teradata_sql_non_interactive[True] (13.173s)

Running from acceptance #3493

codecov bot commented Jan 7, 2026

Codecov Report

❌ Patch coverage is 0% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.78%. Comparing base (d212ba7) to head (df8ce7b).

Files with missing lines                                Patch %   Lines
...abricks/labs/lakebridge/reconcile/recon_capture.py     0.00%   21 Missing ⚠️
...bricks/labs/lakebridge/reconcile/reconciliation.py     0.00%    3 Missing ⚠️
...rc/databricks/labs/lakebridge/reconcile/compare.py     0.00%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2218      +/-   ##
==========================================
- Coverage   63.95%   63.78%   -0.17%     
==========================================
  Files          99       99              
  Lines        8644     8666      +22     
  Branches      890      893       +3     
==========================================
  Hits         5528     5528              
- Misses       2944     2966      +22     
  Partials      172      172              

☔ View full report in Codecov by Sentry.

• Use specific exception types instead of a broad Exception catch to satisfy CI linter rules; add a cluster ID check for improved detection.
• Avoid CONFIG_NOT_AVAILABLE exceptions by fetching all configs at once; this distinguishes serverless (CONFIG_NOT_AVAILABLE) from classic clusters. Passes all linter checks.
@m-abulazm (Contributor) left a comment

Good job. We should always add type annotations to any new code, especially to a public method. Once this comment is addressed we can merge this down.

raise ReadAndWriteWithVolumeException(message) from e


def classify_spark_runtime(spark):
@m-abulazm (Contributor)

Please add a type annotation.

Suggested change:

- def classify_spark_runtime(spark):
+ SparkRuntimeType = Literal["DATABRICKS_SERVERLESS", "CLASSIC", .....]
+ def classify_spark_runtime(spark: SparkSession) -> SparkRuntimeType:

@BesikiML (Author)

done



def test_classify_spark_runtime(spark):
    assert classify_spark_runtime(spark) != "DATABRICKS_SERVERLESS"
@m-abulazm (Contributor)

Please add a comment noting that our sandbox environment uses Spark Connect, so if this changes in the future the comment explains why the test was originally written this way.

@BesikiML (Author)

done
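
The resulting test presumably looks something like this (a sketch; the exact comment wording is illustrative):

```python
def test_classify_spark_runtime(spark):
    # Our sandbox environment uses Spark Connect, so the session here is
    # never classified as serverless. If the sandbox setup changes in the
    # future, this explains why the assertion was originally written this way.
    assert classify_spark_runtime(spark) != "DATABRICKS_SERVERLESS"
```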

  # Write the joined df to volume path
- joined_volume_df = ReconIntermediatePersist(spark, path).write_and_read_unmatched_df_with_volumes(joined_df).cache()
+ joined_volume_df = ReconIntermediatePersist(spark, path).write_and_read_unmatched_df_with_volumes(joined_df)
+ cluster_type = classify_spark_runtime(spark)
@m-abulazm (Contributor)

This method is used inside a loop and we should not recalculate it every time. I would either cache this value or calculate it at the caller before any loops and pass it in.

@BesikiML (Author)

We can create ReconIntermediatePersist(spark, path) before the loop and pass it in, but since joined_df changes with the loop parameters, how can we cache it ahead of time?

@m-abulazm (Contributor)

I do think a ReconIntermediatePersist#cache_if_supported() method is a good idea.
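
A hypothetical sketch of the suggested method, assuming classify_spark_runtime from this PR is in scope and that ReconIntermediatePersist already receives the SparkSession (only the relevant parts are shown):

```python
from pyspark.sql import DataFrame, SparkSession


class ReconIntermediatePersist:
    def __init__(self, spark: SparkSession, path: str) -> None:
        self._spark = spark
        self._path = path
        # Classify once at construction so calls inside loops stay cheap.
        self._runtime = classify_spark_runtime(spark)

    def cache_if_supported(self, df: DataFrame) -> DataFrame:
        # No-op on serverless, where .cache() raises NOT_SUPPORTED_WITH_SERVERLESS.
        return df if self._runtime == "DATABRICKS_SERVERLESS" else df.cache()
```

This would also address the per-loop recalculation concern: the runtime is classified once, and each iteration only pays for a cheap string comparison.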

@m-abulazm (Contributor) commented Jan 21, 2026

Another question, though: why are we caching here after writing to storage? What is the improvement? Otherwise we can just remove it from here.

@BesikiML (Author)

This (cache) approach was already there; I didn't add it.

@BesikiML (Author)

done

SparkRuntimeType = Literal["DATABRICKS_SERVERLESS", "CLASSIC", "SPARK_CONNECT", "NO_JVM_UNKNOWN"]


def classify_spark_runtime(spark) -> SparkRuntimeType:
@m-abulazm (Contributor)

Suggested change:

- def classify_spark_runtime(spark) -> SparkRuntimeType:
+ def classify_spark_runtime(spark: SparkSession) -> SparkRuntimeType:

@BesikiML (Author)

done
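
Putting the thread together, the final classifier plausibly looks something like this. This is a sketch under stated assumptions, not the merged code: the clusterId config key and the exact branch mapping are guesses based on the commit messages above ("cluster ID check", "fetching all configs at once"):

```python
from typing import Literal

from pyspark.sql import SparkSession

SparkRuntimeType = Literal["DATABRICKS_SERVERLESS", "CLASSIC", "SPARK_CONNECT", "NO_JVM_UNKNOWN"]

_NODE_TYPE_KEY = "spark.databricks.clusterUsageTags.clusterNodeType"
_CLUSTER_ID_KEY = "spark.databricks.clusterUsageTags.clusterId"  # assumed key


def classify_spark_runtime(spark: SparkSession) -> SparkRuntimeType:
    try:
        # Classic clusters have a JVM-backed SparkContext; fetching all
        # configs in one call avoids per-key CONFIG_NOT_AVAILABLE errors.
        confs = dict(spark.sparkContext.getConf().getAll())
        if _NODE_TYPE_KEY in confs:
            return "CLASSIC"
        return "NO_JVM_UNKNOWN"  # fallback placement is a guess
    except Exception:  # the PR narrows this to specific exception types
        pass
    # No JVM gateway: Databricks serverless or a plain Spark Connect session;
    # the cluster ID check is assumed to tell them apart.
    try:
        spark.conf.get(_CLUSTER_ID_KEY)
        return "DATABRICKS_SERVERLESS"
    except Exception:
        return "SPARK_CONNECT"
```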


Development

Successfully merging this pull request may close these issues:

  • [Feature]: Remorph Reconcile fails to run on serverless compute

3 participants