v0.56.0
- Added documentation to use Delta Live Tables migration (#3587). In this documentation update, we introduce a new section for migrating Delta Live Table pipelines to the Unity Catalog as part of the migration process. This workflow allows for the original and cloned pipelines to run independently after the cloned pipeline reaches the
`RUNNING` state. The update includes an example of stopping and renaming an existing HMS DLT pipeline and creating a new cloned pipeline. Known issues and limitations are outlined, such as supported streaming sources, maintenance pausing, and querying by timestamp. To streamline the migration process, the `migrate-dlt-pipelines` command is introduced with optional parameters for including or excluding specific pipeline IDs. This feature is intended for developers and administrators managing data pipelines and handling table aliasing issues. Relevant user documentation has been added and the changes have been manually tested.
- Added support for MSSQL and POSTGRESQL to HMS Federation (#3701). The Hive Metastore Federation (HMS Federation) feature now supports Microsoft SQL Server (MSSQL) and PostgreSQL databases. This update introduces classes for handling external Hive Metastore instances and their versions, and refactors a regex pattern for better support of various JDBC URL formats. A new
`supported_databases_port` class variable is added to map supported databases to default ports, allowing the code to handle SQL Server's distinct default port. Additionally, a `supported_hms_versions` class variable is created, outlining supported Hive Metastore versions. The `_external_hms` method is updated to extract HMS version information more accurately, and the `_split_jdbc_url` method is refactored for better URL format compatibility and parameter extraction. The test file `test_federation.py` has been updated with new unit tests for external catalog creation with MSSQL and PostgreSQL.
- Added the CLI command for migrating DLT pipelines (#3579). A new CLI command, `migrate-dlt-pipelines`, has been added for migrating DLT pipelines from HMS to UC using the DLT Migration API. This command allows users to include or exclude specific pipeline IDs during migration using the
`--include-pipeline-ids` and `--exclude-pipeline-ids` flags, respectively. The change impacts the `PipelinesMigrator` class, which has been updated to accept and use these new parameters. Currently, there is no information available about testing, but the changes are expected to be manually tested and accompanied by corresponding unit and integration tests in the future. The changes are isolated to the `PipelinesMigrator` class and related functionality, with no impact on existing methods or functionality.
- Addressed Bug with Dashboard migration (#3663). In this release, the
`_crawl` method in `dashboards.py` has been enhanced to exclude SDK dashboards that lack IDs during the dashboard migration process. This modification avoids unnecessary processing of incomplete dashboards. Additionally, the `_list_dashboards` method now includes a check for dashboards with no IDs while iterating through the `dashboards_iterator`. If a dashboard with no ID is found, the method fetches the dashboard details using the `_get_dashboard` method and adds them to the `dashboards` list, ensuring proper processing. Furthermore, a bug fix for issue #3663 has been implemented for the `RedashDashboardCrawler` class in `assessment/test_dashboards.py`. The `get` method has been added as a side effect to the `WorkspaceClient` mock's `dashboards` attribute, enabling the retrieval of individual dashboard objects by their IDs. This modification ensures that the `RedashDashboardCrawler` can correctly retrieve and process dashboard objects from the `WorkspaceClient` mock, preventing errors due to missing dashboard objects.
- Broaden safe read text caught exception scope (#3705). In this release, the
`safe_read_text` function has been enhanced to handle a broader range of exceptions that may occur while reading a text file, including `OSError` and `UnicodeError`, making it more robust and safe. The function previously caught specific exceptions such as `FileNotFoundError`, `UnicodeDecodeError`, and `PermissionError`. Additionally, the codebase has been improved with updated unit tests, ensuring that the new functionality works correctly. The linting parts of the code have also been updated, enhancing the readability and maintainability of the project for other software engineers. A new method, `safe_read_text`, has been added to the `source_code` module, with several new test cases designed to ensure that the method handles edge cases correctly, such as when the file does not exist, when the path is a directory, or when an `OSError` occurs. These changes make the open-source library more reliable and robust for various use cases.
- Case sensitive/insensitive table validation (#3580). In this release, the library has been updated to enable more flexible and customizable metadata comparison for tables. A case-sensitive flag has been introduced for metadata comparison, which controls whether column name case is considered or ignored during validation. The
`TableMetadataRetriever` abstract base class now includes a new parameter `column_name_transformer` in the `get_metadata` method, which is a callable that can be used to transform column names as needed for comparison. Additionally, a new `case_sensitive` parameter has been added to the `StandardSchemaComparator` constructor to determine whether column names should be compared case sensitively or not. A new parametrized test function `test_schema_comparison_case` has also been included to ensure that this functionality works as expected. These changes provide users with more control over the metadata comparison process and improve the library's handling of cases where column names in the source and target tables may have different cases.
- Catch
`AttributeError` in `InferredValue._safe_infer_internal` (#3684). In this release, we have implemented a change to the `_safe_infer_internal` method in the `InferredValue` class to catch `AttributeError`. This change addresses an issue in the Astroid library reported in their GitHub repository (pylint-dev/astroid#2683) and resolves issue #3659 in our project. By handling `AttributeError` during the inference process, we have made the code more robust and safer. When an exception occurs, an error message is logged with debug-level logging, and the method yields the `Uninferable` sentinel value to indicate that inference failed for the node. This enhancement strengthens the value inference used for source code linting in our open-source library.
- Document to run
`validate-groups-membership` before groups migration, not after (#3631). In this release, we have updated the order of executing the `validate-groups-membership` command in the group migration process. Previously, the command was recommended to be run after the groups migration, but it has been updated to be executed before the migration. This change ensures that the groups have the correct membership and the number of groups and users in the workspace and account are the same before migration, providing an extra level of safety. Additionally, we have updated the `remove-workspace-local-backup-groups` command to remove workspace-level backup groups and their permissions only after confirming the successful migration of all groups. We have also updated the spelling of the `validate-group-membership` command to `validate-groups-membership` in a documentation file. This release is aimed at software engineers who are adopting the project and looking to migrate their groups to the account level.
- Extend code migration progress documentation (#3588). In this documentation update, we have added two new sections,
`Code Migration` and `Final details`, to the open-source library's migration process documentation. The `Code Migration` section provides a detailed walkthrough of the steps to migrate code after completing table migration and data reconciliation, including using the linter to investigate compatibility issues and linted workspace resources. The "linter advices" provide codes and messages on detected issues and resolution methods. The migrated code can then be prioritized and tracked using the `migration-progress` dashboard, and migrated using the `migrate-` commands. The `Final details` section outlines the steps to take once code migration is complete, including running the `cluster-remap` command to remap clusters to be Unity Catalog compatible. This update resolves issue #2231 and includes updated user documentation, with new methods for linting and migrating local code, managing dashboard migrations, and syncing workspace information. Additional commands for creating and validating table mappings, migrating locations, and assigning metastores are also included, with the aim of improving the code migration process by providing more detailed documentation and new commands for managing the migration.
- Fixed Skip/Unskip schema functionality (#3567). In this release, we have addressed the improper handling of skip/unskip schema functionality in our open-source library. The
`skip_schema` and `unskip_schema` methods in the `mapping.py` file have been updated to include the `hive_metastore` schema prefix while setting or unsetting the database property that determines whether a schema should be skipped. Additionally, the `_get_database_in_scope_task` and `_get_table_in_scope_task` methods have been modified to parse table properties as a dictionary, allowing for more straightforward lookup of the skip property for a table. The `test_skip_with_schema` and `test_unskip_with_schema` methods in the `tests/unit/test_cli.py` file have also been updated. The `test_skip_with_schema` method now includes the catalog name `hive_metastore` in the `ALTER SCHEMA` statement, ensuring that the schema is properly skipped. The `test_unskip_with_schema` method has been modified to use the `SET DBPROPERTIES` statement to set the value of the `databricks.labs.ucx.skip` property to `false`, effectively unskipping the schema. Furthermore, the `execute` method in the `sbe` module and the queries in the `mock_backend` module have been updated to match the new commands. Together, these changes fix the improper skipping of schemas, so users can skip and unskip schemas as needed.
- Fixed
`Total Tables` widget in assessment to only show table counts (#3738). In this release, we have addressed the issue with the `Total Tables` widget in the assessment dashboard as part of resolving #3738 and in relation to #3252. The revised `00_3_count_total_tables.sql` query in the `src/databricks/labs/ucx/queries/assessment/main/` directory now includes a `WHERE` clause to filter out views from the table count query. By excluding views and only displaying table counts in the `Total Tables` widget, the scope of changes is limited to the SQL query itself. The diff reflects the addition of the `WHERE` clause and necessary indentation. The commit has been manually tested as part of our quality assurance process, and the successful test results are documented in the `Tests` section of the commit message.
- Fixed broken anchor for doc release (#3720). In this release, we have fixed issues with the Databricks workflows documentation used in the migration process. The previous version contained a broken anchor reference for the workflow process, which has now been corrected. This improvement includes the addition of a manual test to verify the fix. The revised documentation enables users to view the status of deployed workflows and rerun failed workflows using the
`workflows` and `repair-run` commands, respectively. These updates simplify the management and troubleshooting of workflows.
- Fixed broken anchors in documentation (#3712). In this release, we have improved the UCX process documentation, addressing issues related to broken anchors, outdated command names, and syntax. The commands
`enable_hms_federation` and `create_federated_catalog` have been renamed to `enable-hms-federation` and `create-federated-catalog`, respectively. These updates include corresponding changes to the command syntax and have been manually tested to ensure accuracy. Additionally, we have added a new command, `validate-groups-membership`, which can be executed prior to the group migration workflow for added confidence. In case of no matching account group in the UCX-installed workspace, the `create-account-groups` command is now available. This release also includes updates to the section titles and links to enhance clarity and reflect current functionality.
- Fixed notebook sources with
`NotebookLinter.apply` (#3693). A new `Github.py` file has been added to the `databricks/labs/ucx/` directory, providing functionality for working with GitHub issues. It includes an `IssueType` enum, a `construct_new_issue_url` function, and constants for constructing URLs to the documentation and GitHub repository. The `NotebookLinter` class has been updated to include notebook fixing functionality, and the `PythonLinter` class has been introduced to run `apply` on an Abstract Syntax Tree (AST). The `Notebook.apply` method has been implemented to apply changes to notebook sources and the legacy `NotebookMigrator` has been removed. These changes also include various unit and integration tests and modifications to the existing `databricks labs ucx migrate-local-code` command. The `DOCS_URL` method has been added to the `databricks.labs.ucx.github` module, and the error message for external metastore connectivity issues now includes a link to the UCX installation instruction in the documentation.
- Fixed the broken documentation links in dashboards (#3726). This revision updates documentation links in various dashboards to correct broken links. Specifically, it addresses issues #3725 and #3726 by updating links in the `Assessment Overview`, `Assessment Summary`, and
`Compute summary` dashboards, as well as the `group migration` and `table upgrade` documentation. The changes include replacing local Markdown file links with online documentation links and updating links to point to the correct documentation sections in the UCX GitHub repository. Although the changes have been manually tested, no unit or integration tests have been added, and staging environment verification has not been performed. Despite this, the revisions ensure accurate and up-to-date documentation links, improving the usability of the dashboards.
- Force
`MaybeDependency` to have a `Dependency` OR `list[Problem]`, not neither nor both (#3635). This commit enforces the `MaybeDependency` object to have either a `Dependency` or a `list[Problem]`, but not neither or both, in order to handle known libraries during import registration. It resolves issue #3585, breaks up issue #3626, and progresses issue #1527, while modifying code linting logic and updating unit tests to accommodate these changes. Specifically, new classes like `KnownLoader`, `KnownDependency`, and `KnownProblem` have been introduced, and the `_resolve_allow_list` method has been updated to reflect the new enforcement. Additionally, tests have been added and modified to ensure the correct behavior of the modified logic, with a focus on handling directories, resolving children in context, and detecting known problems in imported libraries.
- HMS Federation Documentation (#3688). The HMS Federation feature allows Hive Metastore (HMS) to be federated to a catalog, acting as a step towards migrating to Unity Catalog or as a hybrid solution where both HMS and UC access to the data is required. This feature provides an alternative to the table migration process, eliminating the need for table mapping, creating catalogs and schemas, and migrating Hive metastore data objects. The
`enable-hms-federation` command enables the Hive Metastore federation process, while the `create-federated-catalog` command creates a UC catalog that mirrors all the schemas and tables in the source Hive Metastore. The `migrate-glue-credentials` command, which is AWS-only, creates a UC Service Credential for GLUE. These new commands are documented in the HMS Federation Documentation section and are now part of the migration process documentation with the data reconciliation step following it. To enable HMS Federation, use the `enable-hms-federation` and `create-federated-catalog` commands.
- Make
`MaybeTree` the main Python AST entrypoint for constructing the syntax tree (#3550). In this release, the main entry point for constructing the Python AST syntax tree has been changed from `Tree` to `MaybeTree` in the open-source library. This change involves moving class methods and static methods that construct a `MaybeTree` from the `Tree` class to the `MaybeTree` class, and making the class method that normalizes the source code before parsing the only entry point. The `normalized_parse` method has been renamed to `from_source_code` to match the commonly used naming for class methods within UCX. The `walk` and `first_statement` methods have been removed from `MaybeTree` as they duplicated `Tree`'s methods. These changes aim to enforce normalization and improve code consistency. Additionally, unit tests have been added and the Python linting related code has been modified to work with the new `MaybeTree` class. This change resolves issues #3457 and #3213.
- Make fixer diagnostic codes unique (#3582). This commit modifies the
`databricks labs ucx migrate-local-code` command to make fixer diagnostic codes unique, ensuring accurate code migration and fixing. Two new methods have been added for modifying and adding unit and integration tests. Diagnostic codes for the `table-migrated-to-uc` issue are now unique depending on the context where the table is referenced: SQL, Python, or Python-SQL. This ensures the appropriate fixer is applied when addressing code migration issues. Additionally, the commit updates the documentation to include the new postfixes for the `table-migrated-to-uc` linter code and their descriptions, making it clearer for developers to diagnose and resolve issues related to table migration.
- Removed the linting false positive for missing table format warning when using
`spark.table` (#3589). In this release, linting false positives related to missing table format warnings when using `spark.table` have been addressed, resolving issue #3545. The linting logic and unit tests have been updated to handle changes in the default format for table references in Databricks Runtime 8.0, which now uses Delta as the default format. These changes improve the accuracy of the linting process and reduce unnecessary warnings. Additionally, the `test_linting_walker_populates_paths` unit test in the `test_jobs.py` file has been updated to use a different file path for testing.
- Removed tree from
`PythonSequentialLinter` (#3535). In this release, the `PythonSequentialLinter` has been refactored to no longer manipulate the code tree, and instead, the tree manipulation logic has been moved to `NotebookLinter`. This change improves the separation of concerns between the two components, resulting in a more modular and maintainable codebase. The `NotebookLinter` now handles early failure when resolving the code used by a notebook and attaches `%run` notebook trees as a child tree to the cell that calls the notebook. The code linting functionality has been modified, and the `databricks labs ucx lint-local-code` command has been updated. These changes resolve #3543 and progress #3514 and are dependent on PRs #3529 and #3550. The changes have been manually tested and include added and modified unit tests. Additionally, the `Advice` class has been updated to include a type variable `T`, which allows for more specific type hinting when creating instances of the class and its subclasses.
- Rename file language helper function (#3661). In this code change, the helper function for determining the file language and checking its support by the linter has been renamed and refactored. The function, previously called
`file_language`, has been updated and is now named `infer_file_language_if_supported`. This change clarifies the function's purpose as it not only infers the file language but also checks if the file is supported by the linter, acting as a filter. The function returns a `Language` object if the file is supported or `None` if it is not. The `infer_file_language_if_supported` function is used in other parts of the codebase, such as the `is_a_notebook` function. This change improves the codebase's readability and maintainability by making the helper function's purpose more explicit. The related code has been updated to use the new function accordingly.
- Scope crawled jobs in
`JobsCrawler` with `include_job_ids` (#3658). In this release, the `JobsCrawler` class in the `workflow_task.py` file has been updated to include a new optional parameter `include_job_ids` in the constructor. This parameter allows users to specify a list of job IDs to include in the crawling process, improving efficiency in large workspaces. Additionally, a check has been added to the `_assess_jobs` method to skip jobs whose IDs are not in the list of included IDs. Integration tests have been added to ensure the correct behavior of the new feature. This change resolves issue #3656, which requested the ability to crawl jobs based on a specific list of job IDs. It is recommended to add a comment to the code explaining the purpose and usage of the `include_job_ids` parameter and update the documentation accordingly.
- Support fixing
`LocalFile`'s with `FileLinter` (#3660). In this release, we have added new methods `write_text`, `safe_write_text`, `back_up_path`, and `revert_back_up_path` to the `base.py` file to support fixing files in `LocalFile` containers, along with unit tests and integration tests. The `LocalFile` class in the `files.py` file has been extended to include new methods and properties, such as `apply`, `migrated_code`, `back_up_path`, and `back_up_original_and_flush_migrated_code`, enabling fixing files using linters and writing changes back to the container. The `databricks labs ucx migrate-local-code` command has also been updated to utilize the new functionality. These changes address issue #3514, ensuring the proper handling of errors during file writing and providing automated fixing of code issues within `LocalFile`s.
- Updated
`migrate-local-code` to use latest linter functionality (#3700). In this update, the `migrate-local-code` command has been enhanced by incorporating the latest linter functionality. The `LocalFileMigrator` and `LocalCodeLinter` classes have been merged, and the interfaces of `.fix` and `.apply` methods have been aligned. A new `FixerWalker` has been introduced to address dependencies in the dependency graph, and the existing `databricks labs ucx migrate-local-code` command has been updated accordingly. Relevant unit tests and integration tests have been added and modified to ensure the correctness of the changes, which resolve issue #3514 and supersede issue #3520. The `lint-local-code` command has also been updated with a flag to specify the path for linting. The `migrate-local-code` command now lints local code and generates advice on how to make it compatible with Unity Catalog, and can also apply fixes to local code to make it compatible.
- Updated sqlglot requirement from <26.3,>=25.5.0 to >=25.5.0,<26.4 (#3572). In this pull request, we have updated the requirement for the
`sqlglot` library in the `pyproject.toml` file, changing it from being greater than or equal to version 25.5.0 and less than 26.3, to being greater than or equal to version 25.5.0 and less than 26.4. This change is part of issue #3572 and was made to allow for the use of the latest version of `sqlglot`. The pull request includes a changelog from the `sqlglot` repository, detailing the changes made in each version between 25.5.0 and 26.4. The commits relevant to this update include bumping the version of `sqlglotrs` to various versions between 0.3.7 and 0.3.14. This pull request was automatically generated by Dependabot, a tool that creates pull requests to update the dependencies in a project.
- Updated sqlglot requirement from <26.4,>=25.5.0 to >=25.5.0,<26.7 (#3677). In this release, we have updated the
`sqlglot` dependency from version `>=25.5.0,<26.4` to `>=25.5.0,<26.7`. This change allows us to leverage the latest version of `sqlglot`, which includes various bug fixes and improvements, such as avoiding redundant casts in FROM/TO_UTC_TIMESTAMP and enhancing UUID support. Although there are some breaking changes introduced in the latest version, they should not affect our project's functionality. Additionally, this update includes several bug fixes and improvements for specific dialects such as Redshift, BigQuery, and TSQL. Overall, this update enhances the performance and functionality of the `sqlglot` library, ensuring compatibility with the latest version.
- Use cached property for table migration index on local checkout context (#3711). In this release, we introduce a new cached property,
`_migration_index`, to the `LocalCheckoutContext` class, designed to store the table migration index for the local checkout context. This change aims to prevent multiple recrawling when the migration index is empty. The `linter_context_factory` method has been refactored to utilize the new `_migration_index` property, and the `CurrentSessionState` parameter is removed. Additionally, the `local_code_linter` method has been updated to leverage the new `LinterContext` instance with the `_migration_index` property, instead of using the `linter_context_factory` method. The `LocalCodeLinter` object now accepts a new callable lambda function, returning a `LinterContext` instance with the `_migration_index` property. These enhancements improve code performance by reducing the migration index crawls in the local checkout context and simplify the code by eliminating the `CurrentSessionState` parameter.
- [DOCS] Explain when to run
`remove-workspace-local-backup-groups` workflow (#3707). In this release, the UCX component of the application has been enhanced with new Databricks workflows for orchestrating the group migration process. The `workflows` command displays the status of the workflows, and the `repair-run` command allows for rerunning failed workflows. The group migration workflow is specifically designed to be executed after a successful assessment workflow, and running it is followed by an optional `remove-workspace-local-backup-groups` workflow. This final step removes unnecessary workspace-level backup groups and their associated permissions, keeping the workspace clean and organized. The `remove-workspace-local-backup-groups` workflow should only be executed after confirming the successful migration of all groups involved.
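
The broadened exception handling described for `safe_read_text` (#3705) can be pictured with the following minimal sketch. It is not the UCX implementation: the signature, the logging call, and the `None` return value are assumptions made for illustration. It relies only on the fact that `FileNotFoundError`, `PermissionError`, and `IsADirectoryError` are subclasses of `OSError`, and that `UnicodeDecodeError` is a subclass of `UnicodeError`.

```python
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def safe_read_text(path: Path, size: int = -1) -> Optional[str]:
    """Return the text of `path`, or None when it cannot be read or decoded.

    Illustrative sketch only; the real UCX function may differ.
    """
    try:
        with path.open(encoding="utf-8") as f:
            return f.read(size)
    except (OSError, UnicodeError) as e:
        # OSError covers missing files, permission problems and directories;
        # UnicodeError covers content that is not valid UTF-8.
        logger.warning("Could not read file: %s", path, exc_info=e)
        return None
```

Catching the two base classes instead of an explicit list means a caller gets `None` for any I/O or decoding failure, at the cost of no longer distinguishing between them.
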
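
The case-sensitive flag from #3580 pairs a `case_sensitive` setting with a column name transformer. A rough sketch of that idea follows; `identity`, `normalize`, and `schemas_match` are hypothetical helpers, and schemas are modeled as plain `name -> type` dictionaries, which is an assumption and not how `StandardSchemaComparator` or `TableMetadataRetriever` represent metadata.

```python
from typing import Callable, Dict


def identity(name: str) -> str:
    """Leave the column name unchanged (case-sensitive comparison)."""
    return name


def normalize(columns: Dict[str, str], transformer: Callable[[str], str]) -> Dict[str, str]:
    """Apply the column name transformer to every column name."""
    return {transformer(name): data_type for name, data_type in columns.items()}


def schemas_match(source: Dict[str, str], target: Dict[str, str], *, case_sensitive: bool) -> bool:
    """Compare two schemas, optionally ignoring column name case."""
    transformer = identity if case_sensitive else str.lower
    return normalize(source, transformer) == normalize(target, transformer)
```

With this sketch, `{"ID": "int"}` and `{"id": "int"}` match when `case_sensitive=False` and differ when `case_sensitive=True`.
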
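
The "either a `Dependency` or a `list[Problem]`, not neither nor both" rule from #3635 is an exclusive-or invariant. One way to enforce such an invariant is sketched below, assuming a dataclass with string placeholders for the dependency and the problems; `MaybeDependencySketch` is a hypothetical name and does not reproduce the real `MaybeDependency` class.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class MaybeDependencySketch:
    """Holds exactly one of: a dependency, or a non-empty list of problems."""

    dependency: Optional[str] = None  # placeholder for a Dependency object
    problems: List[str] = field(default_factory=list)  # placeholder for list[Problem]

    def __post_init__(self) -> None:
        has_dependency = self.dependency is not None
        has_problems = bool(self.problems)
        # Equal flags mean "neither" (both False) or "both" (both True).
        if has_dependency == has_problems:
            raise ValueError("Expected either a dependency or problems, not neither nor both")
```

Validating at construction time means code that receives such an object never has to handle the ambiguous states.
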
Dependency updates:
- Updated sqlglot requirement from <26.3,>=25.5.0 to >=25.5.0,<26.4 (#3572).
- Updated sqlglot requirement from <26.4,>=25.5.0 to >=25.5.0,<26.7 (#3677).
Contributors: @JCZuurmond, @pritishpai, @FastLee, @dependabot[bot], @mohanab-db