Skip to content

Conversation

@m-abulazm
Copy link
Contributor

@m-abulazm m-abulazm commented Jan 15, 2026

Changes

What does this PR do?

  • Clean checkpoint volume after reconcile runs
  • Remove usage of overwrite write mode
  • use delta volumes if running on databricks instead of parquet
  • Refactor interface to hide implementation details and be more generic
  • add TODO to mark where we need to persist to delta instead of hitting source system

Caveats/things to watch out for when reviewing:

On Serverless we cannot use cache/persist, so we use Delta writes acting as materialization boundaries

Linked issues

Resolves #1056
Advances #1905, #1438

Functionality

  • added relevant user documentation
  • added new CLI command
  • modified existing command: databricks labs lakebridge ...
  • ... +add your own

Tests

  • manually tested
  • added unit tests
  • added integration tests

@codecov
Copy link

codecov bot commented Jan 15, 2026

Codecov Report

❌ Patch coverage is 0% with 77 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.91%. Comparing base (a678bfe) to head (0ebf789).

Files with missing lines Patch % Lines
...abricks/labs/lakebridge/reconcile/recon_capture.py 0.00% 42 Missing ⚠️
...ridge/reconcile/trigger_recon_aggregate_service.py 0.00% 21 Missing ⚠️
...labs/lakebridge/reconcile/trigger_recon_service.py 0.00% 7 Missing ⚠️
...rc/databricks/labs/lakebridge/reconcile/compare.py 0.00% 4 Missing ⚠️
...bricks/labs/lakebridge/reconcile/reconciliation.py 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2230      +/-   ##
==========================================
- Coverage   63.96%   63.91%   -0.06%     
==========================================
  Files          99       99              
  Lines        8626     8634       +8     
  Branches      888      888              
==========================================
  Hits         5518     5518              
- Misses       2936     2944       +8     
  Partials      172      172              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions
Copy link

github-actions bot commented Jan 15, 2026

✅ 132/132 passed, 9 flaky, 5 skipped, 14m21s total

Flaky tests:

  • 🤪 test_installs_and_runs_local_bladebridge (19.555s)
  • 🤪 test_installs_and_runs_pypi_bladebridge (21.419s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[True] (17.139s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[False] (18.009s)
  • 🤪 test_transpiles_informatica_to_sparksql (19.242s)
  • 🤪 test_transpile_teradata_sql_non_interactive[False] (5.513s)
  • 🤪 test_transpile_teradata_sql (6.575s)
  • 🤪 test_transpile_teradata_sql_non_interactive[True] (8.44s)
  • 🤪 test_recon_databricks_job_succeeds (1m27.223s)

Running from acceptance #3462

@m-abulazm m-abulazm force-pushed the refactor/improve-recon-persistence branch from f112109 to fb290c2 Compare January 15, 2026 15:23
@m-abulazm m-abulazm changed the title remove intermediate persistence and caching all together in reconcile Refactor recon intermediate persistence Jan 19, 2026
@m-abulazm m-abulazm marked this pull request as ready for review January 21, 2026 12:25
@m-abulazm m-abulazm requested a review from a team as a code owner January 21, 2026 12:25
@m-abulazm m-abulazm added feat/recon making sure that remorphed query produces the same results as original internal technical pr's not end user facing labels Jan 21, 2026
Comment on lines +61 to +65
@cached_property
def _is_databricks(self) -> bool:
is_db = any(k.startswith("spark.databricks") for k in self._spark.conf.getAll.keys())
logger.info(f"Running on Databricks check completed with result: {is_db}")
return is_db
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would create this method outside the class.

Suggested change
@cached_property
def _is_databricks(self) -> bool:
is_db = any(k.startswith("spark.databricks") for k in self._spark.conf.getAll.keys())
logger.info(f"Running on Databricks check completed with result: {is_db}")
return is_db
@lru_cache(maxsize=1)
def is_databricks(spark: SparkSession) -> bool:
is_db = any(k.startswith("spark.databricks") for k in spark.conf.getAll.keys())
logger.info(f"Running on Databricks check completed with result: {is_db}")
return is_db

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feat/recon making sure that remorphed query produces the same results as original internal technical pr's not end user facing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[TODO] for now we are overwriting the intermediate cache path. We should delete the volume in future

3 participants