Releases: databrickslabs/lakebridge

v0.10.13

28 Oct 04:02
51b2a05

Analyzer

  • Prevented analyzer crashes on DataStage files with empty array references - Added defensive checks so the DataStage analyzer no longer crashes when it encounters an empty array reference

Converters

Morpheus

General

  • Enhanced name representation consistency - Major refactoring that replaces String representations with Expression types for table names, column names, and constraints across IR nodes, improving SQL/PySpark code generation accuracy

  • Fixed DBT parsing issues - Resolved template parsing problems by changing template markers to !#Jinja0001#! format and improving whitespace handling for proper tokenization
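The note names only the placeholder format. Here is a minimal sketch of the idea. It assumes Jinja blocks are masked with numbered !#JinjaNNNN#! markers before SQL tokenization and restored afterwards. mask_jinja, unmask_jinja, and the block-matching regex are hypothetical, not Lakebridge code:

```python
import re

# Assumption: dbt templates use {{ expr }}, {% stmt %} and {# comment #} blocks.
_JINJA_BLOCK = re.compile(r"\{\{.*?\}\}|\{%.*?%\}|\{#.*?#\}", re.DOTALL)


def mask_jinja(sql: str) -> tuple[str, dict[str, str]]:
    """Hypothetical helper: swap each Jinja block for a numbered !#JinjaNNNN#! marker."""
    saved: dict[str, str] = {}

    def _swap(match: re.Match) -> str:
        marker = f"!#Jinja{len(saved) + 1:04d}#!"
        saved[marker] = match.group(0)  # remember the original template text
        return marker

    return _JINJA_BLOCK.sub(_swap, sql), saved


def unmask_jinja(sql: str, saved: dict[str, str]) -> str:
    """Hypothetical helper: put the original Jinja text back after processing."""
    for marker, original in saved.items():
        sql = sql.replace(marker, original)
    return sql


masked, saved = mask_jinja("select * from {{ ref('orders') }} {% if x %}where 1=1{% endif %}")
print(masked)  # select * from !#Jinja0001#! !#Jinja0002#!where 1=1!#Jinja0003#!
```

Numbering each marker keeps the restore step a plain string replacement. The closing #! also prevents one marker from being a prefix of another.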

TSQL (Synapse/SQL Server)

  • Support for dual OUTPUT clauses in TSQL INSERT/DELETE/UPDATE statements - Enhanced T-SQL parser to handle complex statements with multiple OUTPUT clauses (OUTPUT ... INTO ... OUTPUT ...) with comprehensive test coverage

  • Fixed TSQL DECLARE statement handling - Refactored DECLARE statement processing by moving logic to dedicated visitor methods and properly marking unsupported statements for future implementation

  • Improved BLOCK structure parsing for BEGIN and BEGIN TRY statements - Updated parser grammar to support flexible scripting blocks and transaction handling, allowing zero or more statements in control flow constructs

  • Added comprehensive USE statement support - Introduced new IR representations (UseCatalog, UseSchema) with dialect-specific AST building logic and proper SQL generation

Snowflake

  • Fixed Snowflake connection tests - Internal improvements for database connection test reliability

  • Added comprehensive USE statement support - Introduced new IR representations (UseCatalog, UseSchema) with dialect-specific AST building logic and proper SQL generation

BladeBridge

General

  • Automatic temporary folders for embedded SQL conversion in wrapper scripts - Temporary folders are now created implicitly and cleaned up once conversion is complete, simplifying workflow management

MSSQL (SQL Server)

  • Enhanced table variable and temporary table conversion - Added support for table variable conversion to temporary tables and improved string handling with logic to convert double single quotes to double quotes

  • Fixed semicolon placement in nested select statements - Resolved issue where semicolons appeared before comments in nested select statements

  • Improved MS SQL procedure handling - Added LIMIT 1 for Set in select statements, enhanced function mappings, fixed string concatenation, and removed unsupported constraints

Reconcile

No updates in this release.

Documentation

No updates in this release.

Contributors: @gueniai, @sundarshankar89

v0.10.12

16 Oct 21:11
2bf8292

Analyzer

  • New installation verification command - Introduced a new command to verify successful installation of the Lakebridge Analyzer, displaying usage and available flags for report file paths, source directories, and source technologies

Converters

General

  • Enhanced transpile command - Updated transpile command to support --overrides-path and --target-technology arguments for greater flexibility and customization

  • Improved error handling - Enhanced handling of parsing errors during code transpilation to output transpiled code instead of original input, providing clearer outcomes when issues arise

  • Refactored naming conventions - Renamed transpiler product_name to transpiler_id throughout the codebase for improved consistency and clarity

Morpheus

TSQL

  • Enhanced TSQL support - Added support for DENY statements, EXEC statement syntax improvements, COLLATION in CREATE TABLE column definitions, and WINDOW clause functionality

  • Improved ALTER DATABASE support - Enhanced support for all options on ALTER DATABASE SET statements and multiple LOG file specifications in ALTER DATABASE ADD LOG

  • Better JOIN functionality - Added support for all join hints (MERGE, HASH, LOOP, REDUCE, REPLICATE, REDISTRIBUTE) in JOIN constructs

  • Enhanced COPY INTO support - Fixed syntax for COPY INTO commands and added extended column definitions support in TSQL mode

  • Improved DELETE operations - Added transformation rule to translate IN to EXISTS when needed in DELETE statement WHERE clauses
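The sketch below shows why such a rewrite is safe for a positive IN predicate by running both forms against the same data in SQLite. The tables, rows, and use of SQLite are assumptions for illustration. This is not Morpheus output, and it says nothing about when the rule fires:

```python
import sqlite3

SETUP = """
CREATE TABLE orders (id INTEGER, customer_id INTEGER);
CREATE TABLE blocked (customer_id INTEGER);
INSERT INTO orders VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO blocked VALUES (20), (NULL);
"""

# Predicate written with an IN subquery.
DELETE_IN = "DELETE FROM orders WHERE customer_id IN (SELECT customer_id FROM blocked)"

# The same predicate written as a correlated EXISTS subquery.
DELETE_EXISTS = (
    "DELETE FROM orders WHERE EXISTS ("
    "SELECT 1 FROM blocked b WHERE b.customer_id = orders.customer_id)"
)


def surviving_ids(delete_sql: str) -> list[int]:
    """Run one DELETE against a fresh in-memory database and return the remaining order ids."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        con.execute(delete_sql)
        return [row[0] for row in con.execute("SELECT id FROM orders ORDER BY id")]
    finally:
        con.close()


print(surviving_ids(DELETE_IN), surviving_ids(DELETE_EXISTS))  # [1, 3] [1, 3]
```

The NULL in blocked is deliberate. A NULL in the subquery never produces a match for IN or for the correlated equality, so both forms delete the same rows. The negated forms (NOT IN versus NOT EXISTS) do not agree once NULLs are present, so a rewrite of that shape needs extra care.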

Snowflake

  • COPY INTO improvements - Refactored and standardized grammar rules for COPY INTO commands, consolidating stage location handling

  • UPDATE FROM enhancements - Added tests for UPDATE FROM statements to verify correct transpilation to MERGE INTO statements

General

  • Enhanced permission handling - Added support for column-specific privileges and improved handling of column-specific permissions

  • Improved parser functionality - Allowed the SCHEMAS keyword to be used as an identifier and clarified warning messages for unrecognized functions

BladeBridge

MSSQL

  • Fixed update_to_merge functionality - Improved WITH clause handling and script variable ordering for MSSQL dialects

  • Table variable support - Implemented table variable conversion support for MSSQL dialects

  • DDL operation fixes - Fixed and removed unsupported DDL operations including alter index, switch partitions, and drop constraints

Informatica

  • PowerCenter improvements - Fixed a hanging issue on Linux during Informatica PowerCenter conversion by improving block_subst patterns and output flushing

  • Dataframe implementation fixes - Fixed the dataframe implementation for pulling data from flat-file unconnected lookups in Informatica PowerCenter

DataStage

  • TRUNCATE TABLE support - Added spark.sql_template to resolve TRUNCATE TABLE statement generation when TRUNCATE flag is enabled in DataStage

Reconcile

  • Enhanced Databricks schema queries - Fixed Databricks schema query to improve accuracy and reliability of schema reconciliation, with better column name consistency and filtering

Documentation

  • Updated CLI documentation - Refreshed documentation to reflect latest changes in Command Line Interface menus, including new commands and flags such as transpile, reconcile, and install-transpile subcommands

  • Enhanced command documentation - Added detailed documentation for transpile command usage and flags, including optional flags for catalog name, error file path, and source dialect

  • Updated installation guides - Modified installation documentation to include verification examples and updated help flags for new command options

Dependency updates:

  • Updated cryptography requirement from <45.1.0,>=44.0.2 to >=44.0.2,<46.1.0 (#2028).

  • Bump databrickslabs/sandbox/acceptance@acceptance/v0.4.2 from 0.4.2 to 0.4.4 (#1833).

Contributors: @asnare, @sundarshankar89, @m-abulazm, @dependabot[bot], @gueniai

v0.10.11

03 Oct 21:53
b8259a7

Analyzer

No updates in this release

Converters

General

  • Fixed special character handling in filenames by introducing from_uri() helper function for safer URI handling
  • Ensured SQL converter returns UTF-8 encoded files for proper character encoding
  • Fixed filename to correctly output databricks_conversion_supplements.py supplemental file
  • Fixed broken splitter URL by updating directory naming conventions from "Downloads" to "downloads"
  • Improved handling of encoding-related errors by catching UnicodeDecodeError and LookupError exceptions during file processing, creating TranspileError with specific encoding-error codes instead of stopping
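Here is a minimal sketch of the fallback described above. It assumes each file's bytes are decoded with a configured encoding name. EncodingFailure is a hypothetical stand-in for TranspileError, and the code strings are placeholders, not the actual encoding-error codes:

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class EncodingFailure:
    """Hypothetical stand-in for a TranspileError that records an encoding problem."""
    path: str
    code: str
    message: str


def decode_source(raw: bytes, path: str, encoding: str) -> Union[str, EncodingFailure]:
    """Decode one file's bytes; report encoding problems instead of raising."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as err:
        # The bytes are not valid in the requested encoding.
        return EncodingFailure(path, "UNICODE_DECODE_ERROR", str(err))
    except LookupError as err:
        # The encoding name itself is unknown to Python's codec registry.
        return EncodingFailure(path, "UNKNOWN_ENCODING", str(err))


results = [
    decode_source(b"SELECT 1", "ok.sql", "utf-8"),
    decode_source(b"SELECT '\xff'", "bad_bytes.sql", "utf-8"),
    decode_source(b"SELECT 1", "bad_codec.sql", "no-such-codec"),
]
for result in results:
    print(result)
```

UnicodeDecodeError covers bytes that are invalid in the requested encoding. LookupError covers an encoding name that Python does not recognize. Returning a value instead of raising lets a batch continue past one bad file.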

Morpheus

Snowflake

  • Added support for TRUNCATE TABLE statements with proper IR and translation support
  • Correctly support $IDENTITY and $ROWGUID system variables
  • Refactored and extended grammar and AST support for SQL procedure creation with improved handling of raw string literals
  • Enhanced schema reconciliation functionality to support Snowflake arrays, addressing the corner case where Databricks arrays are typed and Snowflake arrays are untyped

TSQL

  • Added support for TRUNCATE TABLE statements with proper IR and translation support
  • Parse full CREATE INDEX and ALTER INDEX statements in TSQL, rejecting them gracefully as unsupported instead of raising syntax errors
  • Fixed the implementation of IF scripting blocks, with parser and scripting-grammar improvements for more robust handling of block statements and conditional branches
  • Allow CLUSTERED to be used as an identifier, improving CREATE TABLE parsing where it appears as a CONSTRAINT qualifier
  • Support percentage expressions in TSQL options (e.g., OPT = 42%) instead of raising parsing errors
  • Added support for REVOKE statements, similar to existing GRANT statement implementation
  • Ensure that ROWS and OBJECTS can be used as identifiers even with Jinja templates
  • Correctly support $IDENTITY and $ROWGUID system variables

General (Multiple Dialects)

  • Support comments on column declarations when generating SQL and renamed legacy builders for consistency
  • Refactored IR around CREATE FUNCTION and CREATE PROCEDURE, unifying all ways to create stored procedures under a single CreateStoredProcedure IR node and all ways to create user defined functions under a single CreateUDF IR node
  • Implemented grammar and IR placeholders for named windows, introducing initial support for the SQL standard WINDOW clause in parser grammar

BladeBridge

Oracle

  • Removed unsupported Oracle DDL constraints (add/create constraint unique) and extraneous TBLPROPERTIES from converted output

MSSQL

  • Added handle_xml_nodes function for MS SQL processing
  • Fixed multiple MSSQL issues including CTEs in views/stored procedures, ADD CONSTRAINT problems, DEFAULT value handling, and parameter data types

Synapse

  • Fixed multiple Synapse issues including CTEs in views/stored procedures, ADD CONSTRAINT problems, DEFAULT value handling, parameter data types, error handling in stored procedures, and Synapse-specific features (e.g., table distribution)

Teradata

  • Added Teradata function mappings including ZEROIFNULL, TEMPORAL_TIMESTAMP, TRYCAST, ANY, FIRST, NULLIFZERO, DECODE with different parameter counts, and HASHAMP
  • Removed COLLECT STATISTICS and LOCK TABLE statements

DataStage

  • Implemented DataStage Checksum component translation to its SparkSQL equivalent and fixed PySpark checksum translation to use MD5() instead of SHA2()

Reconcile

  • Added handling for special characters in reconcile aggregate, enhancing the library to handle special characters in column names by properly delimiting identifiers in SQL queries
  • Fixed deploy reconcile jobs by updating wheel file handling, simplifying deployment process to use single wheel path, and fixing broken documentation links
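The note does not show the quoting code. Here is a minimal sketch for the Databricks side. It assumes backtick-delimited identifiers, where a literal backtick inside a name is escaped by doubling it. quote_identifier and sum_query are hypothetical names. Other source systems use other delimiters; Snowflake, for example, uses double quotes:

```python
def quote_identifier(name: str) -> str:
    """Wrap a column name in backticks, doubling any backtick inside it."""
    return "`" + name.replace("`", "``") + "`"


def sum_query(table: str, column: str) -> str:
    """Hypothetical aggregate query builder that delimits the column name."""
    return f"SELECT SUM({quote_identifier(column)}) AS total FROM {table}"


print(sum_query("sales", "order amount ($)"))  # SELECT SUM(`order amount ($)`) AS total FROM sales
```

The table name is left unquoted here because it may be a qualified catalog.schema.table name, and each of its parts would need quoting separately.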

Documentation

  • Fixed download link in docs (reconcile automation) by replacing broken markdown link with JSX link utilizing useBaseUrl hook

General

  • Implemented new describe-transpile CLI subcommand that describes installed transpilers, including their versions, configuration paths, and supported source dialects
  • Switched from urllib to requests library for making HTTP calls to PyPI and Maven Central, with default 60-second timeout and improved error handling
  • Worked around a DATABRICKS_HOST normalization issue during install and uninstall by introducing a new Lakebridge subclass with an appropriate workspace client

Dependency updates

Special thanks to @BrianDeacon for his contribution to fix #1858

Contributors: @asnare, @m-abulazm, @ihor-ki, @goodwillpunning, @sundarshankar89, @dependabot[bot]

v0.10.10

25 Sep 03:56
3678301

Analyzer

  • Large XML file chunking optimization: The analyzer can now handle large XML files (up to 1 TB)
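The note does not describe the chunking mechanism. The sketch below is a swapped-in illustration, not the analyzer's implementation. It uses incremental parsing with xml.etree.ElementTree.iterparse, a common standard-library way to keep memory bounded on very large XML files. count_tags_streaming is a hypothetical name:

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter
from typing import BinaryIO


def count_tags_streaming(source: BinaryIO) -> Counter:
    """Count element tags without keeping the whole tree in memory."""
    counts: Counter = Counter()
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end":
            counts[elem.tag] += 1
            # Detach everything parsed so far from the root so it can be garbage-collected.
            # Note: this also clears the root's own attributes, so read them first if needed.
            root.clear()
    return counts


sample = io.BytesIO(b"<job><stage name='a'/><stage name='b'/><link/></job>")
print(count_tags_streaming(sample))
```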

Converters

General

  • Non-interactive transpiler installation: Introduced a non-interactive installation mode through a new interactive option that can detect the environment context, enabling automated installations without user input while preserving existing configurations. Resolves #2013

Morpheus

  • Enhanced GRANT statement support: Implemented comprehensive GRANT statement support by creating dedicated permission.g4 grammar file with IR definitions and translation rules for permission-related statements

  • Improved error handling: Rewrote print function to properly handle newlines and added extensive unit tests for error annotation, including block and FIXME comments. Resolves #2030

  • Enhanced LSP server behavior: Improved LSP server to append original text to error messages when transpilation fails, eliminating need for client-side response manipulation

  • Standardized dialect options: Aligned dialect options to present synapse and mssql to users, consistent with BladeBridge

  • Fixed Lateral Column Alias handling: Enhanced dealiasing for Lateral Column Aliases (LCAs) in WHERE clauses under CASE...WHEN expressions. Resolves #1767

  • Enhanced GROUP BY/aggregation function dealiasing: Implemented dealiasing for Lateral Column Aliases in GROUP BY clauses and aggregation functions where LCA references are unsupported. Resolves #956 and #954

  • Optimized Snowflake transformations: Reordered transformation rules to ensure TransformWithinGroup processes all cases before the call mapper. Resolves #1231

BladeBridge

  • Enhanced merge statement handlers: Improved merge statement processing to fix backtick handling, update operations without WHERE clauses, procedure conversions, IF-THEN-SET blocks, and various delimiter and mapping issues

  • Fixed view creation with WITH clauses: Corrected CREATE VIEW functionality to properly handle WITH clause statements

  • Oracle script improvements: Resolved variable declaration issues in Oracle scripts containing exception handling blocks

  • SQL Server function mapping: Added function mappings for Microsoft SQL Server functions including GETUTCDATE, IS_MEMBER, SERVERPROPERTY variants, and QUOTENAME with one or two arguments

Reconcile

  • Improved logging for aggregate reconciliation: Enhanced logging functionality to provide more accurate messages by replacing warning logs with informational messages when aggregate details rules are empty, indicating successful reconciliation with no details to store. Resolves #2040

  • Refactored aggregate query building: Simplified code using AggregateQueryBuilder class to generate queries for both source and target in a more concise and efficient manner

Documentation

No updates in this release

Dependency updates:

  • Bump actions/setup-python from 5 to 6 (#1988).

Contributors: @m-abulazm, @asnare, @dependabot[bot], @sundarshankar89

v0.10.9

12 Sep 20:58
9332501

Analyzer

  • Fixed bug where Analyzer would crash with large DDL files
  • Adjusted calculation of complexity for TSQL queries to make it more accurate

Transpilers

Morpheus

  • T-SQL Updates

    • Advanced Statement Support: Added parsing for CREATE CERTIFICATE, CREATE LOGIN, and PRINT commands, and EXECUTE AS LOGIN statements
    • SET Command Enhancements: Support for compound assignment operators (+=, -=, *=, /=, %=, &=, ^=, |=) commonly used in T-SQL scripts
    • CREATE EXTERNAL TABLE: Improved parsing with flexible syntax for external table definitions and location specifications
    • GRANT/REVOKE Statements: Comprehensive support for T-SQL security statements with clear Unity Catalog migration guidance
    • DROP Commands: Enhanced handling of DROP SENSITIVITY and other specialized DROP variants
    • Improved Error Reporting: SQL output now includes FIXME comments with detailed error messages for unsupported constructs
  • Snowflake Updates

    • Analytics Functions: Full parsing support for MATCH_RECOGNIZE clause with pattern analysis capabilities for complex analytical queries
    • Time Travel Queries: Enhanced handling of CHANGES, AT, and BEFORE clauses for historical data access patterns
    • REGEXP_INSTR Function: Complete implementation supporting all 7 parameters (vs Databricks' 2), providing accurate behavioral translation
    • Table-Valued Functions: Support for parsing inline table-valued functions commonly used in Snowflake
    • GRANT/REVOKE Statements: Full support for Snowflake's complex privilege management syntax including roles and shares
    • DROP Commands: Enhanced parsing for DROP SENSITIVITY and related data governance statements
    • Improved Error Reporting: SQL output now includes FIXME comments with detailed error messages for unsupported constructs

Dependency updates:

  • Bump actions/checkout from 4 to 5 (#1928).
  • Bump actions/upload-pages-artifact from 3 to 4 (#1964).
  • Bump mermaid from 11.6.0 to 11.10.1 in /docs/lakebridge (#1956).

Contributors: @dependabot[bot], @asnare, @m-abulazm

v0.10.8

09 Sep 03:54
4c377cd

Transpilers

General

  • SQL Validation Enhancement: Improved SQL validator to check only SQL outputs with enhanced error handling and support for various transpile results (#1949)
  • Error Handling Improvements: Added static error lookups for specific cases like unresolved routines and columns, with more readable exception messages
  • MIME Support: New functionality to support both MIME and non-MIME transpile results, including validation and output file management
  • LSP Server Integration: Log level now passed to Language Server Protocol (LSP) server via environment variable for greater flexibility (#1967)
  • Transpiler Auto-Upgrade: Enhanced installer to automatically upgrade existing Lakebridge transpilers during CLI upgrade process (#1978)
  • Source Dialect Handling: Fixed missing transpile source dialect handling to ensure correct assignment in configuration objects (#1985)

Morpheus

  • Enhanced Snowflake Conversion support:

    • Support for parsing ILIKE, EXCLUDE, REPLACE, and RENAME qualifiers applied to * in SELECT lists
    • Full support for EXCLUDE and RENAME clauses and all combinations
    • Fixed REPLACE function with optional third argument
    • Enhanced OBJECT_DELETE to accept 2 or more arguments
    • Accurate translation of Snowflake's REGEXP_REPLACE
  • Parser Improvements:

    • Allow lists of generic options with optional commas
    • EXTERNAL can now be used as an identifier despite being documented as reserved
    • Support for DROP RULE syntax in TSQL
    • Allow DBT Jinja macros within JSON literals
    • Fixed bugs around DBT elseif and comment nodes
  • Error Handling: Upgraded SimpleError with support status and simplified user-facing parse error messages

  • Integration Alignment: Updated error handling to align with BladeBridge, now returning UNRESOLVED_ROUTINE errors consistently (#1998)

BladeBridge

  • XML Source Processing:

    • Automatic detection of XML sources with proper encoding preservation
    • Maintains UTF-8 encoding while respecting XML-specific encoding declarations
    • Prevents XML parser failures from encoding mismatches
  • SQL Scripting Enhancements:

    • Fixed nested comment handling in SQL scripts
    • Improved custom configuration handling for first-match processing
    • Removed unnecessary begin/end enclosures in pre/post SQL blocks
  • Teradata Updates: Enhanced convert_update_to_merge functionality

  • Oracle Updates:

    • Replaced list partitioning with CLUSTER BY statements
    • Removed unsupported CREATE INDEX and ALTER INDEX statements
    • Fixed CREATE PROCEDURE signature generation with proper exception handling
  • DataStage Updates:

    • Added support for TRUNCATE TABLE specifications (#1903)
    • Fixed column name handling when dataframe columns match job parameters
    • Enabled single-pass processing of shared containers
    • Resolved dataset component path issues for proper PySpark code generation

Reconcile

  • Schema Normalization: Added feature flag for identifier normalization with optional normalize parameter in get_schema method for flexible handling of different data source configurations (#1953)

Enhanced Connection Support

  • Snowflake Security: Added support for encrypted PEM private keys with pem_private_key_password field for secure authentication (#1869)
  • JDBC URL Handling: Improved JDBC URL arguments handling with enhanced error handling and logging
  • Connection Properties: Enhanced SecretsMixin class with new _get_secret_or_none method for better secret value retrieval
  • Error Handling: Introduced new exceptions like InvalidSnowflakePemPrivateKey for better error management

Documentation

Comprehensive Documentation Updates

  • MS SQL and Synapse: Enhanced documentation for reconcile connections including default secret naming conventions and required connection properties (#1954)
  • Connection Configuration: Added clear YAML format examples for MS SQL connection properties covering user, password, host, port, database, encryption, and trust server certificate
  • BladeBridge Updates: Minor naming correction from "Microsoft MS SQL Server" to "Microsoft SQL Server" while maintaining support for Oracle, Teradata, Netezza, Informatica, and DataStage
  • SQL Splitter: Updated documentation to remove RCT references, relocated to main menu with revised terminology using "Lakebridge" consistently (#1952)
  • Transpiler Discovery: Updated documentation for pluggable transpiler discovery and execution, introducing Morpheus and BladeBridge as Databricks-provided transpilers
  • Installation Process: Updated installation processes from Maven Central and PyPI with a new directory structure for manual installations

General

Installation and Maintenance Improvements

  • Automated Upgrades: Streamlined installation process with automatic transpiler upgrades during CLI upgrade, eliminating need for separate upgrade commands
  • Plugin Management: Improved installation process for plugins like BladeBridge and Morpheus
  • Testing Enhancement: Added comprehensive test functions to validate SQL file transpilation with various scenarios including table creation and error handling

Contributors: @m-abulazm, @asnare, @sundarshankar89, @goodwillpunning, @gueniai

v0.10.7

21 Aug 19:02
9ea1d2c

Analyzer

  • Improved CLI argument handling and validation - The analyzer's execution process has been significantly enhanced to improve flexibility and user experience. The analyzer now accepts a folder path and source technology type as inputs, generating an Excel report with analysis results for all files and subfolders. The command-line interface has been updated with optional arguments for source directory, report file, and source technology, with interactive prompts to guide users through the analysis process. Enhanced validation includes checks for input folder existence and write access validation for output locations, with better handling of files from cloud-sync folders (#1901).
  • Enhanced Informatica analyzer - Improvements have been made to the Informatica analyzer as part of broader project enhancements.

Transpilers

General

  • Consistent terminology updates - Updated to use mssql and synapse consistently throughout the codebase and documentation. The ReconcileConfig class and ReconSourceType class have been updated to reflect consistent terminology, with supported data sources now including "mssql", "synapse", "snowflake", "teradata", "oracle", and "databricks" (#1950).
  • Enhanced transpiler detection during installation - The installation process now detects existing transpilers and notifies users if upgrades are needed, providing appropriate commands and guidance. Enhanced logging and user agent configuration improve the overall installation experience (#1917).
  • Fixed transpiler backup handling - Improved install-transpile process to handle cases where transpiler backups already exist, introducing a new context manager for preserving and restoring paths with better error handling and reliability (#1893).

Morpheus

  • Fixed parsing issues with row access policies - Resolved parsing problems with row access policies containing dot-qualified names like AMC_TEC.RAP_CONT_AREA while maintaining proper error handling for unsupported Databricks SQL features.
  • Added comprehensive test coverage for CREATE VIEW statements - Enhanced test coverage for CREATE VIEW statements with ROW ACCESS POLICY clauses to ensure proper validation and error handling.
  • Fixed randomization function translation - Corrected translation between Snowflake's RANDOM() (64-bit integer) and Databricks' RAND() (double) with proper seed handling and deterministic behavior.
  • Enhanced temporal format translation - Improved TO_CHAR/TO_VARCHAR functions with automatic format conversion and TO_CHAR as TO_VARCHAR synonym support.

BladeBridge

Oracle

  • Outer Join Conversion: Disabled the call to the subroutine responsible for Oracle outer join conversion due to invalid UNION SELECT syntax in PySpark outputs.
  • Procedure Call Handling: New logic was added for generating ETL procedure calls, aligning with Oracle transformation run controls.

Informatica PowerCenter

  • PySpark Output Improvements:
    • Added handling for pre/post source/target stored procedure calls.
    • Removed the pyspark_data_action column from target writing.
    • Improved mapping script generation, now automatically generating mapplets alongside mappings. Mapplet implementation files are placed in a dedicated shared_functions subfolder, and mapping scripts incorporate correct import statements for mapplet dependencies.
    • The converter now returns supplemental files (e.g., DatabricksConversionSupplements.py).
  • Notebook Header: When converting from DataStage or Informatica PowerCenter to PySpark or SparkSQL, the output now begins with # Databricks notebook source for compatibility with Databricks notebook import.

DataStage

  • Square Brackets Conversion: Square brackets in SQL statements are now replaced with backticks for Databricks compatibility.
  • Notebook Header: The PySpark/SparkSQL output now starts with the standard Databricks notebook header for DataStage-to-notebook conversion.

General SQL/Databricks Compatibility

  • Table/Column Name Sanitization: Configuration has been added to replace unsupported Databricks characters (,;{}()\n\t=) in table and column names with a valid character, defaulting to underscore.
  • DELETE Statement Conversion: Fixed an endless loop caused by previous DELETE conversion rules, and updated logic so DELETE operations are now properly converted to MERGE statements.
  • UPDATE Statement Enhancements:
    • Updates without a FROM clause are identified and safely converted to MERGE statements.
    • Improved handler logic now marks fragments with FROM/MERGE clauses as examined, adding programmatic safety checks rather than relying on regex alone.
    • Additional patterns added to accurately convert various updates into MERGE.
    • Enhanced support for nested IN clauses in WHERE conditions on UPDATE statements, converting them into joins and then merging where appropriate.
  • Sub-Selects in MERGE: Fixed handling for sub-selects within MERGE statements (e.g., EXISTS (select ...)), with temporary removal of comments for those cases.
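Here is a minimal sketch of the name sanitization described above. It uses the character set from the note and underscore as the default replacement. sanitize_name is a hypothetical helper, not the BladeBridge configuration itself:

```python
import re

# Characters the release note lists as unsupported in Databricks table/column names.
_UNSUPPORTED_CHARS = re.compile(r"[,;{}()\n\t=]")


def sanitize_name(name: str, replacement: str = "_") -> str:
    """Hypothetical helper: replace each unsupported character with `replacement`."""
    # A function replacement stops re.sub from interpreting backslashes in `replacement`.
    return _UNSUPPORTED_CHARS.sub(lambda _match: replacement, name)


print(sanitize_name("total(amount)"))  # total_amount_
print(sanitize_name("a=b;c", "-"))     # a-b-c
```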

Reconcile

  • Terminology standardization - Updated Lakebridge Recon tool documentation to replace remorph with lakebridge throughout, including catalog and schema names, table creation, links, and references to notebooks. Configuration documentation updated to reflect the config file location requirement in the .lakebridge directory within the Databricks Workspace (#1876).

Documentation

  • SQL Splitter utility documentation - Added comprehensive documentation for the SQL Splitter utility, which facilitates processing of large SQL files by splitting them into individual files (one object per file). The tool supports stored procedures, functions, tables, views, and Oracle packages, and is available as a downloadable ZIP file containing executables for Windows, Linux, and MacOS (#1926).
  • Consistent terminology updates - Updated documentation to use "MS SQL Server (incl. Synapse)" instead of "SQL Server (incl. Synapse)" and replaced TSQL with "MSSQL" for consistency (#1950).

Contributors: @asnare, @m-abulazm, @sundarshankar89, @andresgarciaf, @simone-dbx-labs

v0.10.6

01 Aug 21:55
91ee879

Analyzer

  • Informatica Workflow Variable Collection
    The Informatica analyzer now collects workflow variables, enhancing downstream conversion and mapping flows.

Converters improvements

Morpheus

  • Expanded SQL parser for Snowflake supports full IF...ELSEIF...ELSE and ELSE IF constructs, recognizes ELSEIF as a keyword, and strengthens test coverage.

  • Improvements in Snowflake CREATE PROCEDURE grammar, including: simplified syntax, handling of optional queries, result set variables, and better exception handling.

  • Support for TEMPORARY as an interchangeable keyword for temporary objects in Snowflake parsing.

BladeBridge

  • Enhanced SQL Scripting for Oracle Procedures
    Multiple fixes for procedure conversion, including quoted identifiers, Japanese character support, misplaced/duplicated keywords, improved SELECT INTO, and more.

  • DataStage PXPivot Conversion
    DataStage’s vertical pivot (PXPivot) can now be converted to Databricks SQL, broadening ETL migration.

  • Synapse and MS SQL Configuration Improvements

    • Enhanced fragment breaker for standalone SELECTs
    • Improved logic and ordering for variable declarations and set operators
    • Bugfixes for PROC_FINISH, WITH statement handling, and universal ETL+SQL testing
  • Overrides-file Prompt Update
    More descriptive and clear prompt for the ‘overrides-file’ option in CLI and documentation.

  • Bug Fixes & Minor Enhancements

    • DataStage IF/THEN/ELSE and header row handling improvements
    • TRY/CATCH and improved SELECT INTO #table conversion
    • Better handling of set operations in SELECT/WITH
    • Standardized JSON configuration naming: now uses base_<source>2databricks_<sql|sparksql|pyspark>.json
    • DELETE-to-MERGE conversion, more tests, correct semicolon placement, and expanded handling of SQL scripting features
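The naming convention can be expressed as a small helper. This is only a sketch. base_config_filename is hypothetical, and source identifiers such as "oracle" are assumed. Check the shipped configuration files for the exact source tokens:

```python
VALID_TARGETS = ("sql", "sparksql", "pyspark")


def base_config_filename(source: str, target: str) -> str:
    """Build a name following base_<source>2databricks_<sql|sparksql|pyspark>.json."""
    if target not in VALID_TARGETS:
        raise ValueError(f"target must be one of {VALID_TARGETS}, got {target!r}")
    return f"base_{source}2databricks_{target}.json"


print(base_config_filename("oracle", "pyspark"))  # base_oracle2databricks_pyspark.json
```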

Documentation

  • Significant Improvements:
    • Expanded BladeBridge and overall configuration docs, with clear instructions for extending logic, using overrides, managing outputs, and troubleshooting.
    • Updated guide on reconciling config and leveraging new CLI options.

General

  • Security & Infrastructure Enhancements

    • Addressed CVE-2025-7339 (HTTP header manipulation vulnerability) by updating the on-headers dependency.
    • Refined handling of output folders, error files, and configuration management for reliability.
    • Improved reconcile dashboard deployment reliability—folders without a dashboard.yml are no longer deployed.
    • Suppressed spurious warnings on initial installation; only debug messages are now logged for clean setups.
    • Improved encoding handling and end-to-end test coverage for non-UTF-8 files and edge-case encodings.

Contributors: @asnare, @sundarshankar89, @gueniai, @vijaypavann-db, @bishwajit-db, @simone-dbx-labs

v0.10.5

16 Jul 17:02
031d8fe

Converters improvements

General

  • XML Encoding Support: The _process_one_file function now detects and correctly handles XML files with internally-specified encoding (e.g., Windows-1252), ensuring successful parsing and conversion of non-UTF-8 files in transformation pipelines. [#1828]

  • Test Enhancements: Updates to test cases (test_transpiles_informatica_with_sparksql, test_transpiles_all_dbt_project_files) were made to increase reliability and provide better logging. [#1828]
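Here is a minimal sketch of honoring an encoding declared inside an XML file. It assumes the declaration is the first thing in the file; byte-order marks and UTF-16 files are not handled. decode_xml is a hypothetical helper, not the _process_one_file implementation:

```python
import re

# Matches the encoding pseudo-attribute in an XML declaration such as
# <?xml version="1.0" encoding="Windows-1252"?>
_XML_DECLARED_ENCODING = re.compile(
    rb"""^\s*<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z0-9._-]+)["']"""
)


def decode_xml(raw: bytes, default: str = "utf-8") -> str:
    """Decode XML bytes using the encoding named in the declaration, if any."""
    match = _XML_DECLARED_ENCODING.match(raw)
    encoding = match.group(1).decode("ascii") if match else default
    return raw.decode(encoding)


raw = b'<?xml version="1.0" encoding="Windows-1252"?><name>caf\xe9</name>'
print(decode_xml(raw))
```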

Morpheus transpiler

  • Temporary and Transient Table Support Across Dialects:

    • Adds parsing and SQL generation for TEMPORARY, TRANSIENT, VOLATILE, and other table types.
    • Databricks currently treats TRANSIENT tables as TEMPORARY (still in private preview); READ ONLY not yet supported.
  • Enhanced Support for T-SQL SET Statement Options:

    • Parsers now recognize SET OPTION ON|OFF and generate structured error messages for unsupported options.
    • Adds support for finer-grained parsing of T-SQL options like SET ANSI_NULLS, SET ARITHABORT, etc.
  • Fix: CTEs in Subqueries:

    • Fixes an issue where WITH clauses inside DDL statements (e.g., CREATE TABLE AS) were ignored because the correct visitor was not invoked.
  • IR Refinement for CREATE Commands:

    • Introduces a new CreateCommand node to better mirror SQL grammar, consolidating and simplifying previous IR structures (e.g., removing ReplaceTable and ReplaceTableAsSelect)
  • CREATE VIEW Implementation:

    • Implements the createView grammar and logic with visitor methods and meaningful error messages for unsupported options.

BladeBridge Transpiler

  • UPDATE to MERGE Logic:

    • Conversion logic for UPDATE...FROM to MERGE implemented
    • Post-processing Improvements: convert_update_to_merge function now ensures statement termination by checking for trailing semicolons.
  • Oracle Data Type Mapping Fixes:

    • NUMBER without precision now maps to DECIMAL(38,18) instead of DECIMAL(10,0).
    • Corrects Timestamp mapping and converts Char(length) to STRING.
    • SYSTIMESTAMP is now translated to CURRENT_TIMESTAMP()
  • DataStage SET VARIABLE Handling:

    • Updates SET VARIABLE component transformation to behave like standard column expressions and prepends SELECT as required.
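This sketch covers only the Oracle rules stated above. The corrected Timestamp mapping is not specified, so it is omitted. map_oracle_type and replace_systimestamp are hypothetical helpers, and every other type passes through unchanged:

```python
import re

# CHAR(n), optionally with Oracle's BYTE/CHAR length semantics, e.g. CHAR(10 BYTE).
_CHAR_WITH_LENGTH = re.compile(r"CHAR\s*\(\s*\d+(?:\s+(?:BYTE|CHAR))?\s*\)", re.IGNORECASE)
_SYSTIMESTAMP = re.compile(r"\bSYSTIMESTAMP\b", re.IGNORECASE)


def map_oracle_type(oracle_type: str) -> str:
    """Apply only the type rules stated in the release note; pass anything else through."""
    text = oracle_type.strip()
    if text.upper() == "NUMBER":  # NUMBER without precision
        return "DECIMAL(38,18)"
    if _CHAR_WITH_LENGTH.fullmatch(text):  # Char(length)
        return "STRING"
    return text


def replace_systimestamp(sql: str) -> str:
    """Rewrite SYSTIMESTAMP as CURRENT_TIMESTAMP() (naive: also rewrites inside string literals)."""
    return _SYSTIMESTAMP.sub("CURRENT_TIMESTAMP()", sql)


print(map_oracle_type("NUMBER"), map_oracle_type("char(10)"), map_oracle_type("NUMBER(10,2)"))
print(replace_systimestamp("SELECT systimestamp FROM dual"))
```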

Reconcile Improvements

  • Use of Existing Warehouse During Configure-Reconcile:
    • The reconcile configuration now checks for an existing warehouse_id in the user's Databricks config.
    • If present, it uses the existing SQL warehouse (with CAN_USE permission) instead of creating a new one.
    • Logs warehouse details and defers deletion for reusability. [#1825]

Documentation updates

  • Databricks Auth Profiles and --profile Option:

    • Users can now specify which Databricks workspace to use with the --profile flag during installation.
    • Adds command to list available profiles. [#1813]
  • Export Instructions for Microsoft SQL Server and Azure Synapse:

    • Step-by-step guides added for extracting view, table, and procedure DDLs using:
      • SQL Server Management Studio (SSMS),
      • Azure Synapse Studio,
      • PowerShell via Export-AzSynapseSqlScript for Synapse Serverless.
    • Screenshots and Microsoft documentation links included. [#1812]

Dependency Updates:

  • Updated the databricks-labs-blueprint version.
  • Added pytest-timeout for improved test reliability (#1828: https://github.com/databrickslabs/lakebridge/issues/1828).

Contributors: @eri-adepoju, @sundarshankar89, @asnare, @biswadeepupadhyay-db

v0.10.4

07 Jul 18:24
bc1a518

  • Added Source Tech Override for Analyzer (#1806). The Analyzer command has been enhanced with a source-tech flag, allowing users to specify the Source System Technology to analyze directly in the command line call.
  • Patch user agent for Infa (#1807). Improved user agent handling for dialects with spaces and added Informatica PC support.

Contributors: @sundarshankar89, @asnare