Releases: databrickslabs/lakebridge
v0.10.3
Converter Improvements
General:
- Updated CLI argument handling for transpile (See 1637): The transpile command now has improved argument validation, clearer error handling, and more flexible configuration options.
- Workaround for an issue loading transpiler configuration with Python ≥ 3.12.4 (See 1802): Fixed transpiler configuration loading on Python 3.12.4+ by updating type hints and removing problematic imports.
BladeBridge Converter
Teradata
- Enhanced handling of the TRUNC function and improved date part translation logic for more accurate Teradata conversions.
Synapse
- Fixed datatype conversion issues and removed unnecessary parentheses in DDL statements for Synapse.
- Improved header cleaning, removed unsupported N-prefixed string literals (`N'...'`), and fixed several DDL issues including datatype and null literal handling (see the sketch after this list).
- Merged Synapse and MS SQL config files, fixed code loss and datatype wrapping issues, improved handling of ALTER TABLE and view definitions, and added new datatype mappings and regex patterns.
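As a rough illustration of the Synapse DDL items above (the table, columns, and types here are made up for the example, and the converter's actual output may differ in detail), the kind of rewrite involved drops the unsupported `N'...'` literal prefix and maps T-SQL character types onto Databricks types:

```sql
-- Synapse source (illustrative)
CREATE TABLE dbo.customers (
    id   INT NOT NULL,
    name NVARCHAR(100),
    note NVARCHAR(MAX)
);
INSERT INTO dbo.customers VALUES (1, N'Alice', N'VIP');

-- Databricks-style target (illustrative)
CREATE TABLE customers (
    id   INT NOT NULL,
    name STRING,
    note STRING
);
INSERT INTO customers VALUES (1, 'Alice', 'VIP');
```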
MS SQL
- Fixed datatype conversion issues and removed unnecessary parentheses in DDL statements for MS SQL.
- Fixed issues with object_id handling and resolved transpiler errors with IF conditions in SQL code.
- Unified configuration with Synapse, addressed code loss, improved datatype and view handling, and cleaned up redundant SQL commands.
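The object_id and IF-condition fixes above relate to a very common T-SQL existence-check pattern; a hedged sketch of that pattern and its usual Databricks equivalent (names are illustrative, and the converter's literal output may differ):

```sql
-- T-SQL source (illustrative)
IF OBJECT_ID('dbo.stg_orders', 'U') IS NOT NULL
    DROP TABLE dbo.stg_orders;

-- Typical Databricks SQL equivalent
DROP TABLE IF EXISTS stg_orders;
```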
Datastage
- Added support for Datastage functions such as DateFromComponents, ALNUM, and SURROGATEKEYGEN, and enhanced function substitution.
- Fixed expression and filter handling, improved function substitution, and enhanced literal wrapping and SQL expression handling for Datastage to Pyspark conversions.
Datastage and Informatica PySpark target:
- Fixed issues with AGGREGATE node handling and improved column/expression wrapping in aggregate nodes.
- Enhanced import handling, removed unnecessary aliases, and improved pre- and post-SQL expression processing for PySpark.
General / Multi-Dialect
- Fixed issues with generating single output files in nested folders, improving output file handling for XML, Python, and JSON formats.
Reconcile improvements
- Enabled TSQL Recon (See 1798): Added support for TSQL-based reconciliation, allowing TSQL scripts as input and updating the SQL Server adapter and tests for TSQL compatibility.
Documentation updates
- Banner for Informatica Cloud (See 1797): Informatica Cloud is temporarily unsupported; a warning banner and updated docs now advise users to contact Databricks for alternatives while a fix is in progress.
- Documentation for Reconcile Automation (See 1793): New documentation and utilities streamline table reconciliation, including example notebooks, validation rules, and a static web interface for Snowflake transformations.
- Update python requirements (See 1766): The library now supports Python 3.10 and above, with updated installation instructions and emphasis on Java 11+ requirements.
General
- Improve diagnostics if the Java check fails prior to installing morpheus (See 1784): Enhanced Java version checks provide clearer error messages and better logging if Java is missing or incompatible during installation.
- Updated blueprint dependency, to ensure login URLs are accepted as host (See 1760): Blueprint dependency updated to allow login URLs as workspace hosts, resolving previous issues with host profile settings.
Contributors: @asnare, @sundarshankar89, @gueniai, @IK-Databricks
v0.10.2
Analyzer Improvements
- Enabled BODS as a source
Converter Improvements
- Better Handling of Unicode in SQL Files: No more weird characters! Lakebridge now automatically detects and removes Unicode BOMs from SQL files, ensuring your files load cleanly—no matter the encoding. (See #1733)
- Cleaner Output Files: Header comments that sometimes caused formatting issues in Python or JSON files are now gone. Your output files will only contain the code you need—no extra comments at the top. (See #1751)
- Bug fix: Fixed PyArmor issue affecting Windows installations.
- Enhancement: Python version 3.10 and above is now supported
- Morpheus converter:
- Databricks Tuple Support: You can now use multi-column (tuple) comparisons like WHERE (A, B, C) NOT IN (SELECT X, Y, Z...)—improving compatibility with Snowflake and Databricks SQL and making complex queries work as expected (see the example after this list).
- TRUNCATE Function Transformation: Added support for converting the TRUNCATE function and new related keywords, expanding the range of SQL statements you can process.
- ALL and ANY Subquery Expressions: The converter now understands and supports ALL and ANY subquery expressions, so you can handle more complex SQL logic with ease.
- Improved Snowflake LET Command: The LET command for Snowflake now works even if you don’t provide an assignment or default value.
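To make the tuple-comparison item above concrete, here is the kind of query that is now handled end to end (table and column names are illustrative):

```sql
-- Multi-column (tuple) comparison against a subquery
SELECT *
FROM orders
WHERE (customer_id, region, channel) NOT IN (
    SELECT customer_id, region, channel
    FROM excluded_customers
);
```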
- BladeBridge converter:
- Datastage: Improved support for duplicate link names
- Datastage: Fixed filter for EXPRESSION in PySpark target
- Informatica: Fixed flat-file output for the SparkSQL target; source post-SQL is now placed after the target writes
- Improved data type conversions from Oracle
Documentation Refresh
- Clearer instructions for installation, setup, and requirements, with updated examples (See #1738)
- Updated the converter supported-dialects matrix to clarify supported inputs and outputs (See #1764)
- Improved Docs Sidebar Navigation: The documentation sidebar is now smarter and more interactive, making it easier to find what you need quickly. (See #1754)
Contributors: @gueniai, @IK-Databricks, @sundarshankar89, @asnare, @vil1
v0.10.1
Analyzer Improvements
- Debug Mode for Analyzer (#1727): Run the Analyzer in debug mode by setting your logging level to DEBUG for more detailed diagnostics.
- Supported Sources Table (#1709, #1708): The docs now clearly list all supported source platforms and dialects, so you can quickly check compatibility.
Converter Improvements
- Encoding Support (#1719): Lakebridge now handles quoted-printable encoding in ETL sources.
- Java Version Handling (#1730, #1731): The system now detects if Java isn’t installed and gives clear error messages. Java version parsing is also improved.
- Cleaner Output (#1684): Transpiled code output no longer includes unnecessary line number comments.
- BladeBridge converter inserts `FIXME` comments in lines of code we couldn't automatically convert
- BladeBridge converter enabled Informatica Cloud migrations
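Where BladeBridge cannot convert a line, it flags it rather than failing silently; a hypothetical snippet of what flagged output can look like (the exact marker wording is BladeBridge's own and may differ):

```sql
SELECT
    customer_id,
    -- FIXME: could not automatically convert the source expression for this column
    NULL AS legacy_score
FROM customers;
```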
Installation and Configuration Updates
- Smarter Install Process (#1691): The installer now avoids errors if you choose not to overwrite existing configurations.
- Configure Reconcile Patch (#1690): Deployment of reconciliation jobs, tables, and dashboards now works as expected, targeting the correct files.
Logging and Error Reporting
- Cleaner Logging (#1704): Log messages are less noisy and more consistent, with important info easier to spot.
- Compact Error Reporting (#1693): Errors are grouped and summarized, making it easier to review issues.
- Severity-based Logging (#1685): Diagnostic messages are logged with the right severity (error, warning, info).
General Documentation and Template Updates
- Clearer, Friendlier Docs (#1701, #1688): Installation and usage guides are now easier to follow, with new flowcharts, step-by-step instructions, and improved formatting.
- New Issue Templates (#1721, #1687, #1682): Submitting documentation or bug issues is easier with new, interactive templates.
- Supported Sources and Dialects (#1709, #1708): Docs now include clear tables outlining supported platforms and SQL dialects, including experimental dbt repointing.
v0.10.0
🚀 Lakebridge v0.10.0 – The Bridge to Databricks Awaits! 🌉
Welcome to the inaugural release of Lakebridge, your all-in-one, open-source toolkit for seamless SQL migration to Databricks! Whether you're staring down a mountain of legacy SQL or just want to make sure your data lands safely on the other side, Lakebridge is here to make your journey smooth, insightful, and even a little bit fun.
✨ What's Inside Lakebridge v0.10.0?
🕵️ Pre-migration Assessment: Know Before You Go
- Profiler: Connects to your existing SQL environment and delivers a detailed report on workload size, complexity, and features used.
- Analyzer: Scans your SQL code and orchestration, highlighting potential migration challenges and estimating the effort required.
🔄 SQL Conversion: Dialect Dilemmas, Solved
- Transpilers Galore: Choose between the battle-tested BladeBridge or the next-gen Morpheus (with experimental dbt support!) to convert your SQL and ETL code from a variety of platforms, including:
- DataStage
- Informatica (Cloud, PC)
- Netezza
- Oracle (incl. ADS & Exadata)
- Snowflake
- SQL Server (incl. Synapse)
- Teradata
- SQL & ETL/Orchestration Translation: Move more than just queries—bring your workflows, too!
- Error Highlighting & Compatibility Warnings: Because nobody likes a silent failure.
🧮 Post-migration Reconciliation: Trust, but Verify
- Automated Data Reconciliation: Compare source and Databricks tables to ensure your data made the leap intact.
- Supports Multiple Sources: Snowflake, Oracle, and Databricks—more to come!
- Discrepancy Detection: Find mismatches before your users do.
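As a minimal sketch of the kind of check reconciliation performs (these are not Lakebridge's actual queries, and the table names are hypothetical), row-level discrepancies between a source extract and the migrated table can be surfaced with set differences in both directions:

```sql
-- Rows in the source extract that are missing or different on the Databricks side
SELECT * FROM source_stage.orders_snapshot
EXCEPT
SELECT * FROM main.sales.orders;

-- Rows that exist only on the Databricks side
SELECT * FROM main.sales.orders
EXCEPT
SELECT * FROM source_stage.orders_snapshot;
```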
🛠️ Installation: As Easy as Copy-Paste
Install with `databricks labs install lakebridge`. Python 3.10+, Java 11, and the Databricks CLI are your only prerequisites. Windows, Mac, or Linux—Lakebridge welcomes all!
🧑💻 Why Lakebridge?
- Comprehensive: Handles every phase of migration, from assessment to reconciliation.
- Flexible: Supports multiple SQL dialects and ETL platforms.
- Open Source: Built by Databricks Labs, improved by the community.
- Witty Documentation: Because migration shouldn't be boring.
💬 Get Involved!
Spotted a bug? Have a feature idea? Want to contribute? Open an issue or pull request—let's build the future of SQL migration together!
Thank you for joining us at the start of this journey.
Happy Migration! 🚀
— The Lakebridge Team
v0.9.1
v0.9.0
- Added support for format_datetime function in presto to Databricks (#1250). A new `format_datetime` function has been added to the `Parser` class in the `presto.py` file to provide support for formatting datetime values in Presto on Databricks. This function utilizes the `DateFormat.from_arg_list` method from the `local_expression` module to format datetime values according to a specified format string. To ensure compatibility and consistency between Presto and Databricks, a new test file `test_format_datetime_1.sql` has been added, containing SQL queries that demonstrate the usage of the `format_datetime` function in Presto and its equivalent in Databricks, `DATE_FORMAT`. This standalone change adds new functionality without modifying any existing code.
- Added support for SnowFlake `SUBSTR` (#1238). This commit enhances the library's SnowFlake support by adding the `SUBSTR` function, which was previously unsupported and existed only as an alternative to `SUBSTRING`. The project now fully supports both functions, and the `SUBSTRING` function can be used interchangeably with `SUBSTR` via the new `withConversionStrategy(SynonymOf("SUBSTR"))` method. Additionally, this commit supersedes a previous pull request that lacked a GPG signature and includes a test for the `SUBSTR` function. The `ARRAY_SLICE` function has also been updated to match SnowFlake's behavior, and the project now supports a more comprehensive list of SQL functions with their corresponding arity.
- Added support for json_size function in presto (#1236). A new `json_size` function for Presto has been added, which determines the size of a JSON object or array and returns an integer. Two new methods, `_build_json_size` and `get_json_object`, have been implemented to handle JSON objects and arrays differently, and the Parser and Tokenizer classes of the Presto class have been updated to include the new json_size function. An alternative implementation for Databricks using SQL functions is provided, and a test case is added to cover a fixed `is not null` error for json_extract in the Databricks generator. Additionally, a new test file for Presto has been added to test the functionality of the `json_extract` function in Presto, and a new method `GetJsonObject` is introduced to extract a JSON object from a given path. The `json_extract` function has also been updated to extract the value associated with a specified key from JSON data in both Presto and Databricks.
- Enclosed subqueries in parenthesis (#1232). This PR introduces changes to the ExpressionGenerator and LogicalPlanGenerator classes to ensure that subqueries are correctly enclosed in parentheses during code generation. Previously, subqueries were not always enclosed in parentheses, leading to incorrect code. This issue has been addressed by enclosing subqueries in parentheses in the `in` and `scalarSubquery` methods, and by adding new match cases for `ir.Filter` in the `LogicalPlanGenerator` class. The changes also take care to avoid doubling enclosing parentheses in the `.. IN(SELECT...)` pattern. New methods have not been added, and existing functionality has been modified to ensure that subqueries are correctly enclosed in parentheses, leading to the generation of correct SQL code. Test cases have been included in a separate PR. These changes improve the correctness of the generated code, avoiding issues such as `SELECT * FROM SELECT * FROM t WHERE a > a WHERE a > 'b'` and ensuring that the generated code includes parentheses around subqueries.
- Fixed serialization of MultipleErrors (#1177). In the latest release, the encoding of errors in the `com.databricks.labs.remorph.coverage` package has been improved with an update to the `encoders.scala` file. The change involves a fix for serializing `MultipleErrors` instances using the `asJson` method on each error instead of just the message. This modification ensures that all relevant information about each error is included in the encoded output, improving the accuracy of serialization for the `MultipleErrors` class. Users who handle multiple errors and require precise serialization representation will benefit from this enhancement, as it guarantees comprehensive information encoding for each error instance.
- Fixed presto strpos and array_average functions (#1196). This PR introduces new classes `Locate` and `NamedStruct` in the `local_expression.py` file to handle the `STRPOS` and `ARRAY_AVERAGE` functions in a Databricks environment, ensuring compatibility with Presto SQL. The `STRPOS` function, used to locate the position of a substring within a string, now uses the `Locate` class and emits a warning regarding differences in implementation between Presto and Databricks SQL. A new method `_build_array_average` has been added to handle the `ARRAY_AVERAGE` function in Databricks, which calculates the average of an array, accommodating nulls, integers, and doubles. Two SQL test cases have been added to demonstrate the use of the `ARRAY_AVERAGE` function with arrays containing integers and doubles. These changes promote compatibility and consistent behavior between Presto and Databricks when dealing with `STRPOS` and `ARRAY_AVERAGE` functions, enhancing the ability to migrate between the systems smoothly.
- Handled presto Unnest cross join to Databricks lateral view (#1209). This release introduces new features and updates for handling Presto UNNEST cross joins in Databricks, utilizing the lateral view feature. New methods have been added to improve efficiency and robustness when handling UNNEST cross joins. Additionally, new test cases have been implemented for Presto and Databricks to ensure compatibility and consistency between the two systems in handling UNNEST cross joins, array construction and flattening, and parsing JSON data. Some limitations and issues remain, which will be addressed in future work. The acceptance tests have also been updated, with certain tests now expected to pass, while others may still fail. This release aims to improve the functionality and compatibility of Presto and Databricks when handling UNNEST cross joins and JSON data.
- Implemented remaining TSQL set operations (#1227). This pull request enhances the TSql parser by adding support for parsing and converting the set operations `UNION [ALL]`, `EXCEPT`, and `INTERSECT` to the Intermediate Representation (IR). Initially, the grammar recognized these operations, but they were not being converted to the IR. This change resolves issues #1126 and #1102 and includes new unit, transpiler, and functional tests, ensuring the correct behavior of these set operations, including precedence rules. The commit also introduces a new test file, `union-all.sql`, demonstrating the correct handling of simple `UNION ALL` operations, ensuring consistent output across TSQL and Databricks SQL platforms.
- Supported multiple columns in the ORDER BY clause for ARRAYAGG (#1228). This commit enhances the ARRAYAGG and LISTAGG functions by adding support for multiple columns in the order by clause and sorting in both ascending and descending order. A new method, sortArray, has been introduced to handle multiple sort orders. The changes also improve the functionality of the ARRAYAGG function in the Snowflake dialect by supporting multiple columns in the ORDER BY clause, with an optional DESC keyword for each column. The `WithinGroupParams` dataclass has been updated in the local expression module to include a list of tuples for the order columns and their sorting direction. These changes provide increased flexibility and control over the output of the ARRAYAGG and LISTAGG functions.
- Added TSQL parser support for `(LHS) UNION RHS` queries (#1211). In this release, we have implemented support for a new form of UNION in the TSQL parser, specifically for queries formatted as `(SELECT a from b) UNION [ALL] SELECT x from y`. This allows the union of two SELECT queries with an optional ALL keyword to include duplicate rows. The implementation includes a new case statement in the `TSqlRelationBuilder` class that handles this form of UNION, creating a `SetOperation` object with the left-hand side and right-hand side of the union, and an `is_all` flag based on the presence of the ALL keyword. Additionally, we have added support for parsing right-associative UNION clauses in TSQL queries, enhancing the flexibility and expressiveness of the TSQL parser for more complex and nuanced queries. The commit also includes new test cases to verify the correct translation of TSQL set operations to Databricks SQL, resolving issue #1127. This enhancement allows for more accurate parsing of TSQL queries that use the UNION operator in various formats.
- Added support for inline columns in CTEs (#1184). In this release, we have added support for inline columns in Common Table Expressions (CTEs) in Snowflake across various components of our open-source library. This includes updates to the AST (Abstract Syntax Tree) for better TSQL translation and the introduction of the new case class `KnownInterval` for handling intervals. We have also implemented a new method, `DealiasInlineColumnExpressions`, in the `SnowflakePlanParser`...
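Two of the Presto items above lend themselves to small before/after sketches (identifiers are illustrative; the transpiler's exact formatting may differ):

```sql
-- Presto: format_datetime (#1250)
SELECT format_datetime(order_ts, 'yyyy-MM-dd') FROM orders;
-- Databricks equivalent
SELECT DATE_FORMAT(order_ts, 'yyyy-MM-dd') FROM orders;

-- Presto: UNNEST cross join (#1209)
SELECT o.id, t.item
FROM orders o
CROSS JOIN UNNEST(o.items) AS t (item);
-- Databricks equivalent using a lateral view
SELECT o.id, t.item
FROM orders o
LATERAL VIEW explode(o.items) t AS item;
```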
v0.8.0
- Added IR for stored procedures (#1161). In this release, we have made significant enhancements to the project by adding support for stored procedures. We have introduced a new `CreateVariable` case class to manage variable creation within the intermediate representation (IR), and removed the `SetVariable` case class as it is now redundant. A new `CaseStatement` class has been added to represent SQL case statements with value match, and a `CompoundStatement` class has been implemented to enable encapsulation of a sequence of logical plans within a single compound statement. The `DeclareCondition`, `DeclareContinueHandler`, and `DeclareExitHandler` case classes have been introduced to handle conditional logic and exit handlers in stored procedures. New classes `DeclareVariable`, `ElseIf`, `ForStatement`, `If`, `Iterate`, `Leave`, `Loop`, `RepeatUntil`, `Return`, `SetVariable`, and `Signal` have been added to the project to provide more comprehensive support for procedural language features and control flow management in stored procedures. We have also included SnowflakeCommandBuilder support for stored procedures and updated the `visitExecuteTask` method to handle stored procedure calls using the `SetVariable` method.
- Added Variant Support (#998). In this commit, support for the Variant datatype has been added to the create table functionality, enhancing the system's compatibility with Snowflake's datatypes. A new VariantType has been introduced, which allows for more comprehensive handling of data during create table operations. Additionally, a `remarks VARIANT` line is added in the CREATE TABLE statement and the corresponding spec test has been updated. The Variant datatype is a flexible datatype that can store different types of data, such as arrays, objects, and strings, offering increased functionality for users working with variant data. Furthermore, this change will enable the use of the Variant datatype in Snowflake tables and improves the data modeling capabilities of the system.
- Added `PySpark` generator (#1026). The engineering team has developed a new `PySpark` generator for the `com.databricks.labs.remorph.generators` package. This addition introduces a new parameter, `logical`, of type `Generator[ir.LogicalPlan, String]`, in the `SQLGenerator` for SQL queries. A new abstract class `BasePythonGenerator` has been added, which extends the `Generator` class and generates Python code. An `ExpressionGenerator` class has also been added, which extends `BasePythonGenerator` and is responsible for generating Python code for `ir.Expression` objects. A new `LogicalPlanGenerator` class has been added, which extends `BasePythonGenerator` and is responsible for generating Python code for a given `ir.LogicalPlan`. A new `StatementGenerator` class has been implemented, which converts `Statement` objects into Python code. A new Python-generating class, `PythonGenerator`, has been added, which includes the implementation of an abstract syntax tree (AST) for Python in Scala. This AST includes classes for various Python language constructs. Additionally, new implicit classes for `PythonInterpolator`, `PythonOps`, and `PythonSeqOps` have been added to allow for the creation of PySpark code using the Remorph framework. The `AndOrToBitwise` rule has been implemented to convert `And` and `Or` expressions to their bitwise equivalents. The `DotToFCol` rule has been implemented to transform code that references columns using dot notation in a DataFrame to use the `col` function with a string literal of the column name instead. A new `PySparkStatements` object and `PySparkExpressions` class have been added, which provide functionality for transforming expressions in a data processing pipeline to PySpark equivalents. The `SnowflakeToPySparkTranspiler` class has been added to transpile Snowflake queries to PySpark code. A new `PySpark` generator has been added to the `Transpiler` class, which is implemented as an instance of the `SqlGenerator` class. This change enhances the `Transpiler` class with a new `PySpark` generator and improves serialization efficiency.
- Added `debug-bundle` command for folder-to-folder translation (#1045). In this release, we have introduced a `debug-bundle` command to the remorph project's CLI, specifically added to the `proxy_command` function, which already includes `debug-script`, `debug-me`, and `debug-coverage` commands. This new command enhances the tool's debugging capabilities, allowing developers to generate a bundle of translated queries for folder-to-folder translation tasks. The `debug-bundle` command accepts three flags: `dialect`, `src`, and `dst`, specifying the SQL dialect, source directory, and destination directory, respectively. Furthermore, the update includes refactoring the `FileSetGenerator` class in the `orchestration` package of the `com.databricks.labs.remorph.generators` package, adding a `debug-bundle` command to the `Main` object, and updating the `FileQueryHistoryProvider` method in the `ApplicationContext` trait. These improvements focus on providing a convenient way to convert folder-based SQL scripts to other formats like SQL and PySpark, enhancing the translation capabilities of the project.
- Added `ruff` Python formatter proxy (#1038). In this release, we have added support for the `ruff` Python formatter in our project's continuous integration and development workflow. We have also introduced a new `FORMAT` stage in the `WorkflowStage` object in the `Result` Scala object to include formatting as a separate step in the workflow. A new `RuffFormatter` class has been added to format Python code using the `ruff` tool, and a `StandardInputPythonSubprocess` class has been included to run a Python subprocess and capture its output and errors. Additionally, we have added a proxy for the `ruff` formatter to the SnowflakeToPySparkTranspilerTest for Scala to improve the readability of the transpiled Python code generated by the SnowflakeToPySparkTranspiler. Lastly, we have introduced a new `ruff` formatter proxy in the test code for the transpiler library to enforce format and style conventions in Python code. These changes aim to improve the development and testing experience for the project and ensure that the code follows the desired formatting and style standards.
- Added baseline for translating workflows (#1042). In this release, several new features have been added to the open-source library to improve the translation of workflows. A new dependency for the Jackson YAML data format library, version 2.14.0, has been added to the pom.xml file to enable processing YAML files and converting them to Java objects. A new `FileSet` class has been introduced, which provides an in-memory data structure to manage a set of files, allowing users to add, retrieve, and remove files by name and persist the contents of the files to the file system. A new `FileSetGenerator` class has been added that generates a `FileSet` object from a `JobNode` object, enabling the translation of workflows by generating all necessary files for a workspace. A new `DefineJob` class has been developed to define a new rule for processing `JobNode` objects in the Remorph system, converting instances of `SuccessPy` and `SuccessSQL` into `PythonNotebookTask` and `SqlNotebookTask` objects, respectively. Additionally, various new classes, such as `GenerateBundleFile`, `QueryHistoryToQueryNodes`, `ReformatCode`, `TryGeneratePythonNotebook`, `TryGenerateSQL`, `TrySummarizeFailures`, `InformationFile`, `SuccessPy`, `SuccessSQL`, `FailedQuery`, `Migration`, `PartialQuery`, `QueryPlan`, `RawMigration`, `Comment`, and `PlanComment`, have been introduced to provide a more comprehensive and nuanced job orchestration framework. The `Library` case class has been updated to better separate concerns between library configuration and code assets. These changes address issue #1042 and provide a more robust and flexible workflow translation solution.
- Added correct generation of `databricks.yml` for `QueryHistory` (#1044). The FileSet class in the FileSet.scala file has been updated to include a new method that correctly generates the `databricks.yml` file for the `QueryHistory` feature. This file is used for orchestrating cross-compiled queries, creating three files in total - two SQL notebooks with translated and formatted queries and a `databricks.yml` file to define an asset bundle for the queries. The new method in the FileSet class writes the content to the file using the `Files.write` method from the `java.nio.file` package instead of the previously used `PrintWriter`. The FileSetGenerator class has been updated to include the new `databricks.yml` file generation, and new rules and methods have been added to improve the accuracy and consistency of schema definitions in the generated orchestration files. Additionally, the DefineJob and DefineSchemas classes have been introduced to simplify the orchestration generation process.
- Added documentation around Transformation (#1043). In this release, the Transformation class in our open-source library has been enhanced with detailed documentation, type parameters, and new methods. The class represents a stateful computation that produces an output of type Out while managing a state of type State. The new methods include map and flatMap for modifying the output and chaining transformations, as well as run and runAndDiscardState for executing the computation with ...
v0.1.6
- Added serverless validation using lsql library (#176). Workspaceclient object is used with `product` name and `product_version` along with corresponding `cluster_id` or `warehouse_id` as `sdk_config` in `MorphConfig` object.
- Enhanced install script to enforce usage of a warehouse or cluster when `skip-validation` is set to `False` (#213). In this release, the installation process has been enhanced to mandate the use of a warehouse or cluster when the `skip-validation` parameter is set to `False`. This change has been implemented across various components, including the install script, `transpile` function, and `get_sql_backend` function. Additionally, new pytest fixtures and methods have been added to improve test configuration and resource management during testing. Unit tests have been updated to enforce usage of a warehouse or cluster when the `skip-validation` flag is set to `False`, ensuring proper resource allocation and validation process improvement. This development focuses on promoting a proper setup and usage of the system, guiding new users towards a correct configuration and improving the overall reliability of the tool.
- Patch subquery with json column access (#190). The open-source library has been updated with new functionality to modify how subqueries with JSON column access are handled in the `snowflake.py` file. This change includes the addition of a check for an opening parenthesis after the `FROM` keyword to detect and break loops when a subquery is found, as opposed to a table name. This improvement enhances the handling of complex subqueries and JSON column access, making the code more robust and adaptable to different query structures. Additionally, a new test method, `test_nested_query_with_json`, has been introduced to the `tests/unit/snow/test_databricks.py` file to test the behavior of nested queries involving JSON column access when using a Snowflake dialect. This new method validates the expected output of a specific nested query when it is transpiled to Snowflake's SQL dialect, allowing for more comprehensive testing of JSON column access and type casting in Snowflake dialects. The existing `test_delete_from_keyword` method remains unchanged.
- Snowflake `UPDATE FROM` to Databricks `MERGE INTO` implementation (#198).
- Use Runtime SQL backend in Notebooks (#211). In this update, the `db_sql.py` file in the `databricks/labs/remorph/helpers` directory has been modified to support the use of the Runtime SQL backend in Notebooks. This change includes the addition of a new `RuntimeBackend` class in the `backends` module and an import statement for `os`. The `get_sql_backend` function now returns a `RuntimeBackend` instance when the `DATABRICKS_RUNTIME_VERSION` environment variable is present, allowing for more efficient and secure SQL statement execution in Databricks notebooks. Additionally, a new test case for the `get_sql_backend` function has been added to ensure the correct behavior of the function in various runtime environments. These enhancements improve SQL execution performance and security in Databricks notebooks and increase the project's versatility for different use cases.
- Added Issue Templates for bugs, feature and config (#194). Two new issue templates have been added to the project's GitHub repository to improve issue creation and management. The first template, located in `.github/ISSUE_TEMPLATE/bug.yml`, is for reporting bugs and prompts users to provide detailed information about the issue, including the current and expected behavior, steps to reproduce, relevant log output, and sample query. The second template, added under the path `.github/ISSUE_TEMPLATE/config.yml`, is for configuration-related issues and includes support contact links for general Databricks questions and Remorph documentation, as well as fields for specifying the operating system and software version. A new issue template for feature requests, named "Feature Request", has also been added, providing a structured format for users to submit requests for new functionality for the Remorph project. These templates will help streamline the issue creation process, improve the quality of information provided, and make it easier for the development team to quickly identify and address bugs and feature requests.
- Added Databricks Source Adapter (#185). In this release, the project has been enhanced with several new features for the Databricks Source Adapter. A new `engine` parameter has been added to the `DataSource` class, replacing the original `source` parameter. The `_get_secrets` and `_get_table_or_query` methods have been updated to use the `engine` parameter for key naming and handling queries with a `select` statement differently, respectively. A Databricks Source Adapter for Oracle databases has been introduced, which includes a new `OracleDataSource` class that provides functionality to connect to an Oracle database using JDBC. A Databricks Source Adapter for Snowflake has also been added, featuring the `SnowflakeDataSource` class that handles data reading and schema retrieval from Snowflake. The `DatabricksDataSource` class has been updated to handle data reading and schema retrieval from Databricks, including a new `get_schema_query` method that generates the query to fetch the schema based on the provided catalog and table name. Exception handling for reading data and fetching schema has been implemented for all new classes. These changes provide increased flexibility for working with various data sources, improved code maintainability, and better support for different use cases.
- Added Threshold Query Builder (#188). In this release, the open-source library has added a Threshold Query Builder feature, which includes several changes to the existing functionality in the data source connector. A new import statement adds the `re` module for regular expressions, and new parameters have been added to the `read_data` and `get_schema` abstract methods. The `_get_jdbc_reader_options` method has been updated to accept an `options` parameter of type `JdbcReaderOptions`, and a new static method, `_get_table_or_query`, has been added to construct the table or query string based on provided parameters. Additionally, a new class, `QueryConfig`, has been introduced in the `databricks.labs.remorph.reconcile` package to configure queries for data reconciliation tasks. A new abstract base class QueryBuilder has been added to the query_builder.py file, along with HashQueryBuilder and ThresholdQueryBuilder classes to construct SQL queries for generating hash values and selecting columns based on threshold values, transformation rules, and filtering conditions. These changes aim to enhance the functionality of the data source connector, add modularity, customizability, and reusability to the query builder, and improve data reconciliation tasks.
- Added snowflake connector code (#177). In this release, the open-source library has been updated to add a Snowflake connector for data extraction and schema manipulation. The changes include the addition of the SnowflakeDataSource class, which is used to read data from Snowflake using PySpark, and has methods for getting the JDBC URL, reading data with and without JDBC reader options, getting the schema, and handling exceptions. These changes were completed by Ravikumar Thangaraj and SundarShankar89.
- `remorph reconcile` baseline for Query Builder and Source Adapter for oracle as source (#150).
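The Snowflake `UPDATE FROM` to Databricks `MERGE INTO` item above (#198) can be pictured as follows; the table and column names are illustrative and the generated SQL may be shaped differently:

```sql
-- Snowflake source
UPDATE orders
SET status = s.status
FROM order_updates s
WHERE orders.order_id = s.order_id;

-- Databricks-style target
MERGE INTO orders AS o
USING order_updates AS s
ON o.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET o.status = s.status;
```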
Dependency updates:
- Bump sqlglot from 22.4.0 to 22.5.0 (#175).
- Updated databricks-sdk requirement from <0.22,>=0.18 to >=0.18,<0.23 (#178).
- Updated databricks-sdk requirement from <0.23,>=0.18 to >=0.18,<0.24 (#189).
- Bump actions/checkout from 3 to 4 (#203).
- Bump actions/setup-python from 4 to 5 (#201).
- Bump codecov/codecov-action from 1 to 4 (#202).
- Bump softprops/action-gh-release from 1 to 2 (#204).
Contributors: @dependabot[bot], @sundarshankar89, @ganeshdogiparthi-db, @vijaypavann-db, @bishwajit-db, @ravit-db, @nfx
v0.1.5
- Added Pylint Checker (#149). This diff adds a Pylint checker to the project, which is used to enforce a consistent code style, identify potential bugs, and check for errors in the Python code. The configuration for Pylint includes various settings, such as a line length limit, the maximum number of arguments for a function, and the maximum number of lines in a module. Additionally, several plugins have been specified to load, which add additional checks and features to Pylint. The configuration also includes settings that customize the behavior of Pylint's naming conventions checks and handle various types of code constructs, such as exceptions, logging statements, and import statements. By using Pylint, the project can help ensure that its code is of high quality, easy to understand, and free of bugs. This diff includes changes to various files, such as cli.py, morph_status.py, validate.py, and several SQL-related files, to ensure that they adhere to the desired Pylint configuration and best practices for code quality and organization.
- Fixed edge case where column name is same as alias name (#164). A recent commit has introduced fixes for edge cases related to conflicts between column names and alias names in SQL queries, addressing issues #164 and #130. The `check_for_unsupported_lca` function has been updated with two helper functions `_find_aliases_in_select` and `_find_invalid_lca_in_window` to detect aliases with the same name as a column in a SELECT expression and identify invalid lateral column aliases (LCAs) in window functions, respectively. The `find_windows_in_select` function has been refactored and renamed to `_find_windows_in_select` for improved code readability. The `transpile` and `parse` functions in the `sql_transpiler.py` file have been updated with try-except blocks to handle cases where a column name matches the alias name, preventing errors or exceptions such as `ParseError`, `TokenError`, and `UnsupportedError`. A new unit test, "test_query_with_same_alias_and_column_name", has been added to verify the fix, passing a SQL query with a subquery having a column alias `ca_zip` which is also used as a column name in the same query, confirming that the function correctly handles the scenario where a column name conflicts with an alias name.
- `TO_NUMBER` without `format` edge case (#172). The `TO_NUMBER without format edge case` commit introduces changes to address an unsupported usage of the `TO_NUMBER` function in the Databricks SQL dialect when the `format` parameter is not provided. The new implementation introduces constants `PRECISION_CONST` and `SCALE_CONST` (set to 38 and 0 respectively) as default values for `precision` and `scale` parameters. These changes ensure Databricks SQL dialect requirements are met by modifying the `_to_number` method to incorporate these constants. An `UnsupportedError` will now be raised when `TO_NUMBER` is called without a `format` parameter, improving error handling and ensuring users are aware of the required `format` parameter. Test cases have been added for `TO_DECIMAL`, `TO_NUMERIC`, and `TO_NUMBER` functions with format strings, covering cases where the format is taken from table columns. The commit also ensures that an error is raised when `TO_DECIMAL` is called without a format parameter.
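For the `TO_NUMBER` item above, the documented constants mean that when precision and scale are not supplied the Databricks side falls back to `DECIMAL(38, 0)`; a rough sketch of that shape, assuming a column `amount_str` (the transpiler's literal output may differ):

```sql
-- Snowflake source: format supplied, precision and scale omitted
SELECT TO_NUMBER(amount_str, '99999') FROM payments;

-- Databricks-side shape using the documented defaults of precision 38 and scale 0
SELECT CAST(TO_NUMBER(amount_str, '99999') AS DECIMAL(38, 0)) FROM payments;
```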
Dependency updates:
- Bump sqlglot from 21.2.1 to 22.0.1 (#152).
- Bump sqlglot from 22.0.1 to 22.1.1 (#159).
- Updated databricks-labs-blueprint[yaml] requirement from ~=0.2.3 to >=0.2.3,<0.4.0 (#162).
- Bump sqlglot from 22.1.1 to 22.2.0 (#161).
- Bump sqlglot from 22.2.0 to 22.2.1 (#163).
- Updated databricks-sdk requirement from <0.21,>=0.18 to >=0.18,<0.22 (#168).
- Bump sqlglot from 22.2.1 to 22.3.1 (#170).
- Updated databricks-labs-blueprint[yaml] requirement from <0.4.0,>=0.2.3 to >=0.2.3,<0.5.0 (#171).
- Bump sqlglot from 22.3.1 to 22.4.0 (#173).
Contributors: @dependabot[bot], @sundarshankar89, @bishwajit-db
v0.1.4
- Added conversion logic for Try_to_Decimal without format (#142).
- Identify Root Table for folder containing SQLs (#124).
- Install Script (#106).
- Integration Test Suite (#145).
Dependency updates:
- Updated databricks-sdk requirement from <0.20,>=0.18 to >=0.18,<0.21 (#143).
- Bump sqlglot from 21.0.0 to 21.1.2 (#137).
- Bump sqlglot from 21.1.2 to 21.2.0 (#147).
- Bump sqlglot from 21.2.0 to 21.2.1 (#148).
Contributors: @dependabot[bot], @nfx, @sundarshankar89, @vijaypavann-db, @derekyidd, @bishwajit-db