Releases: databrickslabs/ucx

v0.60.1

03 Oct 22:03
5ecd310

  • Added CI fix for hatch dependency (#4433). A temporary fix addresses a dependency issue with hatch caused by a bug in the Click library, tracked upstream in the Click project. Because the testing process relies on hatch, its installation now constrains Click to versions below 8.3.0. This workaround keeps the testing framework functional until a permanent fix is available upstream.
  • Enhanced Assessment Export (#4432). The export functionality now handles large assessments by wrapping the generated Excel file in a zip archive, preventing export failures due to file-size limits. The Excel workbook is written to a BytesIO buffer via an ExcelWriter object and then packaged with a ZipFile object (see the sketch after this list). Error handling for missing queries lets the export continue without interruption, and the functions that download and export assessment results have been updated for the new zip format, including changes to file names and download links. Test cases now verify that assessment results export successfully as zip files, confirming that the compression strategy transparently handles large exports.
  • Split assess_workflows task into its new Workflow (#4395). The workflow assessment functionality has been decoupled from the main Assessment workflow and introduced as a standalone task, providing users with greater control over when and how workflow assessment is executed. This task, which retrieves jobs to analyze their notebooks and files for migration problems, can be triggered manually using the new run-assess-workflows command, allowing users to optionally run the assessment for a collection of workspaces. The Assessment workflow has been modified to remove the workflow assessment task, reducing its complexity, and a new Workflow assessment section has been added to the documentation to describe the new task's functionality and usage. The introduction of this standalone task enables users to assess workflows that ran within the last 30 days, with the option to analyze all workflows, and provides more flexibility and control over the assessment process, which was previously bundled within the Assessment workflow and potentially prolonged the overall run time.
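
As referenced in the export item above, a minimal sketch of the zip-wrapped Excel export, assuming pandas with the openpyxl engine; export_results_as_zip and its arguments are hypothetical names, not the actual UCX API:

```python
import io
import zipfile

import pandas as pd


def export_results_as_zip(results: dict[str, pd.DataFrame], target: str) -> None:
    """Write each result set to an Excel sheet, then wrap the workbook in a zip archive."""
    buffer = io.BytesIO()
    with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
        for sheet_name, frame in results.items():
            # Excel caps sheet names at 31 characters
            frame.to_excel(writer, sheet_name=sheet_name[:31], index=False)
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("ucx_assessment_results.xlsx", buffer.getvalue())
```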

Contributors: @pritishpai, @andresgarciaf

v0.60.0

29 Aug 18:18
54c6636

  • Added handling for exception in create-missing-principals because there are no UC roles (#4373). The create-missing-principals functionality now handles the case where no Unity Catalog (UC) compatible roles are present: instead of failing, the command exits gracefully with a descriptive message. The changes catch NotFound exceptions when loading UC-compatible roles, log warning messages, and let the code continue even when the UC roles file is missing. Test fixtures have been extended to cover scenarios with no UC roles, enabling comprehensive testing of the enhanced exception handling.
  • Adds option to skip assess_workflows via install-time choice & config (#4380). The workflow assessment process has been made optional, allowing users to control whether workflow analysis is performed during installation. A new configuration flag, skip_assess_workflows, can be set to skip the assess_workflows task, which analyzes jobs to identify migration problems. If this flag is enabled, the assessment workflow will be bypassed, logging an informational message and skipping the workflow linter report refresh and subsequent filesystem access. This change provides users with more flexibility in configuring their workflows, potentially streamlining their workflow or improving performance in certain scenarios. Users can opt out of workflow assessment during installation through a prompt or by setting the skip_assess_workflows flag in their configuration, allowing for more control over the installation process.
  • Adds retries and uses listing to fetch groups (#4370). Group fetching has been enhanced to improve performance and handle the eventually consistent nature of group deletion and renaming. A new get_group function fetches a group from a GroupManager instance by name, asserting that the group exists and retrying on AssertionError for up to one minute via the retried decorator (see the sketch after this list). By combining retries with listing, the change achieves faster eventual consistency, and updated integration tests verify the behaviour.
  • Extends assert retries for table and schema grants (#4369). The retry mechanism for assertions related to grants on tables, schemas, and User-Defined Functions (UDFs) has been enhanced with the introduction of a new function, which extends the retry capability for grant-related assertions. This function allows for the retrieval of actual grants for a specified object, based on its type, and asserts that they match the expected grants, with the added capability of retrying the assertion for up to 30 seconds if it fails due to an assertion error. The function supports various parameters, including object type, migration group, and expected grants, among others, providing a flexible and robust way to verify grants. Additionally, existing test functions have been updated to utilize this new function, replacing previous retry logic for UDF grants and introducing retry logic for table and schema grants, thereby improving the overall reliability and accuracy of grant-related assertions.
  • Fixed create federated catalog to add all external locations to the allowed paths (#4387). The federated catalog creation functionality has been enhanced to include all external locations in the allowed paths, ensuring a more comprehensive catalog creation process. This is achieved by modifying the internal logic to scan all external locations, extract their URLs and names, and append these URLs to the list of authorized paths. The process also includes checks for locations with missing URLs or names, logging warnings and skipping such locations, as well as verifying the existence and username of the current user to add missing permissions as needed. This change aims to improve the accuracy and completeness of the federated catalog by properly incorporating all external locations, thus enhancing the overall catalog creation experience.
  • Fixed migrate-locations issues (#4379). The migrate-locations functionality is now more robust: generated external location names are sanitized by replacing dots with underscores, and BadRequest exceptions raised when creating an external location (for example, because it already exists) are caught, logged as warnings, and skipped, allowing the migration to continue uninterrupted. A minimal sketch of this pattern also follows this list.
  • Fixed service credential lookup for create-federated-catalog for GLUE external HMS (#4382). The list_glue method has been enhanced to correctly retrieve and parse credentials for Glue access, with improvements including the addition of a CredentialPurpose enum and modified logic for handling potential errors and empty responses. The method now catches NotFound exceptions, logs informative error messages, and directly accesses the credential_response dictionary to filter credentials based on the CredentialPurpose.SERVICE purpose. This ensures that only service credentials are included in the resulting dictionary. Furthermore, the method logs the number of distinct IAM roles used in UC service credentials, providing more insight into the retrieved credentials. The updated method also includes robust checks for the existence and type of the credential_response and its contents, preventing potential errors and ensuring consistent results. Additionally, the code now supports the lookup of service credentials for creating a federated catalog, specifically for GLUE external HMS, and returns a list of CredentialInfo objects representing storage and service credentials, which can be filtered by the list_glue method to return only those with a purpose of CredentialPurpose.SERVICE.
  • Improve STS Export to Assessment to Excel (#4375). The export assessment functionality has been enhanced to support exporting UCX assessment results in Excel format, in addition to the existing CSV format. The --export-format excel argument can be used to generate an Excel file, and users will be prompted to provide the destination path. The AssessmentExporter class has been refactored to improve compatibility, remove compute constraints, and encapsulate export logic, allowing for easier export of assessment results in various formats. New methods have been added to support exporting results to CSV, Excel, and web formats, and the code includes unit tests and integration tests to verify the functionality of these new methods. The export functionality can now be executed directly from the RuntimeContext, without relying on a personal compute environment, and the exported file can be generated in either CSV or Excel format, depending on the specified export format parameter.
  • Uses the new property id from RunAs in pytester (#4363). The library's code has been refactored to leverage the newly introduced id property from RunAs, replacing the previous protected access to the service principal's id attribute with a more straightforward and secure approach. This simplification enables direct access to the id value via make_run_as().id, improving the overall organization and maintainability of the code. By adopting this change, the need for protected access to the service principal's id attribute has been eliminated, resulting in more robust and easier-to-maintain code.
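
A minimal sketch of the retried group lookup from the retries item above, using the retried decorator from the Databricks SDK; get_group mirrors the name in the item, while get_by_name is a hypothetical accessor standing in for the listing logic:

```python
from datetime import timedelta

from databricks.sdk.retries import retried


@retried(on=[AssertionError], timeout=timedelta(minutes=1))
def get_group(group_manager, display_name: str):
    """Fetch a group via listing, retrying until eventual consistency catches up."""
    group = group_manager.get_by_name(display_name)  # hypothetical accessor over a listing
    assert group is not None, f"group {display_name} not (yet) visible"
    return group
```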
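
And a sketch of the hardened migrate-locations loop from the item above; create_location and its arguments are illustrative rather than UCX's actual function:

```python
import logging

from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import BadRequest

logger = logging.getLogger(__name__)


def create_location(ws: WorkspaceClient, name: str, url: str, credential_name: str) -> None:
    safe_name = name.replace(".", "_")  # dots in generated names caused failures
    try:
        ws.external_locations.create(safe_name, url, credential_name)
    except BadRequest as e:
        # e.g. the location already exists; warn and keep migrating the rest
        logger.warning(f"Skipping external location {url}: {e}")
```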

Contributors: @pritishpai, @FastLee, @andresgarciaf

v0.59.2

07 Aug 20:17
821de4b

  • Added force_refresh parameter for Assessment (#4183). The assessment workflow now accepts an optional force_refresh parameter (defaulting to false) that forcefully reruns the assessment and overwrites existing results, even if it was previously run. Previously, the workflow would not update the output of an assessment that had already completed; setting force_refresh to true now triggers a complete recalculation of the assessment data, which is useful when results need to be refreshed. Use this option with caution, as a full refresh can be time-consuming and resource-intensive for large workspaces.
  • Added SECURITY.md for vulnerability reporting policy (#4357). A security policy has been introduced to govern the security of UCX, outlining procedures for reporting and addressing vulnerabilities. According to this policy, security updates will only be applied to the latest version of UCX, with notable updates highlighted in release notes, and will not be backported to earlier versions. To report a vulnerability, users should email the details to a designated address or contact their Databricks representative, and can expect an acknowledgement of receipt within 48 hours, after which the reported vulnerabilities will be reviewed and addressed promptly. Users are also encouraged to follow security best practices, including using the latest released version of UCX and reviewing recommended configurations and operational security considerations in the UCX documentation, to ensure the secure use of UCX.
  • Added handling for PermissionDenied exception so that a new dashboard is created (#4209). Dashboard management is now more resilient in handling exceptions and invalid dashboard states. When a PermissionDenied exception occurs while accessing a dashboard, the system now creates a new dashboard instead of attempting to access the existing one, and it can recreate a dashboard that is trashed or whose reference is corrupted. To support these enhancements, new methods check whether a dashboard is a Redash dashboard, upgrade a Redash dashboard to Lakeview, check whether a dashboard is trashed, and recover an invalid dashboard by deleting any dangling files. These changes also improve the readability and maintainability of the existing functionality.
  • Convert WASBS to ABFSS experimental workflow (#4031). An experimental workflow converts Azure Blob Storage (WASBS) URLs to Azure Data Lake Storage Gen2 (ABFSS) URLs, modernising storage paths while preserving table metadata. The transformation changes the protocol and hostname suffix: tables that use WASBS are identified and their locations updated to the more performant ABFSS format. New methods such as wasbs_to_abfss and convert_wasbs_to_adls_gen2 use the urlparse function and the Spark session to crawl tables, filter those on WASBS, and update their locations (see the sketch after this list). Existing methods gain return types and docstrings for clarity and maintainability, and new test configurations validate the conversion, giving users a seamless migration path to Azure Data Lake Storage Gen2's improved features.
  • Fixed service principal not re-created while creating account groups (#4360). The account group creation functionality has been enhanced to support service principals as members, allowing for more comprehensive and flexible group management. The code now checks for member references starting with Users or ServicePrincipals and includes them in the list of members to add, effectively enabling service principals to be part of account groups. Additionally, the warning message for skipped members has been updated to accurately reflect the expected types of members, which can be users, service principals, or groups. This modification resolves the issue of service principals not being created during account group creation and ensures that account groups can include service principals as members. Furthermore, a new reusable function has been introduced to retrieve a group by its display name, retrying for up to two minutes if the group is not found, and the test suite has been updated to verify the correct creation of groups with both users and service principals as members.
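
A minimal sketch of the protocol-and-hostname rewrite at the core of the conversion; it is illustrative of the technique, not necessarily UCX's exact wasbs_to_abfss implementation:

```python
from urllib.parse import urlparse


def wasbs_to_abfss(location: str) -> str:
    """Rewrite a WASBS URL to its ABFSS (ADLS Gen2) equivalent; other URLs pass through."""
    parsed = urlparse(location)
    if parsed.scheme not in ("wasb", "wasbs"):
        return location
    # swap the protocol and the blob-storage hostname suffix for the ADLS Gen2 one
    host = parsed.netloc.replace(".blob.core.windows.net", ".dfs.core.windows.net")
    return f"abfss://{host}{parsed.path}"


assert (wasbs_to_abfss("wasbs://data@acct.blob.core.windows.net/tables/t")
        == "abfss://data@acct.dfs.core.windows.net/tables/t")
```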

Contributors: @pritishpai, @asnare, @FastLee

v0.59.1

03 Jul 20:18
66dafd8

  • Added documentation for #3963 (#4020). The workflow assessment functionality has been enhanced with an experimental task that analyzes recently executed workflows for migration problems, providing links to relevant documentation and recommendations for addressing identified issues. This task, labeled as experimental, now only runs for workflows executed within the last 30 days, but users can opt to analyze all workflows by running a specific workflow. The assessment findings, including any migration problems detected, can be viewed in the assessment dashboard, offering a centralized location for monitoring and addressing potential issues, and helping users to ensure a smoother migration process.
  • Enhance documentation for UCX (#4024). The UCX documentation has undergone significant enhancements to improve user experience and provide comprehensive guidance for contributors and users. The main page has been revamped with a simplified and accelerated Unity Catalog migration process, featuring a prominent call-to-action and key features such as comprehensive assessment, automated migrations, and detailed reporting. Additional pages, including a Getting Started section, have been added to guide users through installation, running, and operating the toolkit, with links to relevant sections such as installation, running, and reference materials. The contributing section has been updated for consistency, and a new How to Contribute section has been introduced, providing clear resources for submitting issues, pull requests, and contributing to the documentation. The documentation structure has been reorganized, with updated sidebar positions and revised descriptions to better reflect the content and purpose of each section, ultimately aiming to provide better user documentation, clarity, and a more intuitive interface for navigating and utilizing the UCX toolkit.
  • Fixes fmt and unit test failures from new blueprint release (#4048). The dependency on the databricks-labs-blueprint has been updated to a version range of 0.11.0 or higher but less than 0.12.0, incorporating new features and bug fixes from the latest blueprint release. To ensure compatibility with this updated version, the codebase has been updated to address breaking changes introduced in the recent blueprint release, including the addition of type hints to MockInstallation and the DEFAULT_CONFIG variable, which is now defined as a dictionary with string keys and RootJsonValue values. Furthermore, a previously failing unit test has been fixed, and the test_installation_recovers_invalid_dashboard function has been refactored into two separate test functions to verify the recovery of invalid dashboards due to InvalidParameterValue and NotFound exceptions, utilizing the MockInstallation class and caplog fixture to capture and verify log messages. These changes aim to resolve issues with the new blueprint release, enable previously failing acceptance tests, and improve the overall robustness of the installation process.
  • Updated for Databricks SDK 0.56+ (#4178). The project's dependencies have been updated to support Databricks SDK version 0.56 and above, with the upper bound set to less than 0.58.0, to ensure compatibility with the evolving SDK. This update includes breaking changes, and as a result, various modifications have been made to the code, such as adding type hints to functions to improve linting, replacing PermissionsList with GetPermissionsResponse, and accessing SecurableType enum values using the value attribute. Additionally, several test functions have been updated to reflect these changes, including the addition of return type hints and the use of create_autospec to create mock objects. These updates aim to maintain the project's functionality and ensure seamless compatibility with the latest Databricks SDK version, while also improving code quality and test coverage. The changes affect various aspects of the code, including grants management, permissions retrieval, and test cases for different scenarios, such as migrating managed tables, external tables, and tables in mounts.
  • Workaround for acceptance with dependabot PR (#4029). The dependency on the Databricks SDK has been broadened to versions from 0.44.0 up to but not including 0.54.0, allowing the incorporation of new features and bug fixes introduced in those versions. Additionally, the internal function that applies default catalog settings now handles NotFound exceptions more robustly: it checks for the presence of metadata before attempting to retrieve an etag, preventing errors when metadata is missing and improving the library's overall stability and reliability.

Contributors: @asnare, @pritishpai

v0.59.0

06 May 04:45
355b45a

  • Adds requirement for matching account groups to be created before assessment to the docs (#4017). The account group setup requirements have been clarified to ensure successful assessment and group migration workflows, mandating that account groups matching workspace local groups are created beforehand, which can be achieved manually or programmatically via various methods. The assessment workflow has been enhanced to retrieve workspace assets and securable objects from the Hive metastore for compatibility assessment with UC, storing the results in the inventory database for further analysis. Additionally, the documentation now stresses the necessity of running the validate-groups-membership command prior to initiating the group migration workflow, and recommends running the create-account-groups command beforehand if the required account groups do not already exist, to guarantee a seamless execution of the assessment and migration processes.
  • Fixed Service Principal instructions for installation (#3967). The installation requirements for UCX have been updated to reflect changes in Service Principal support, where it is no longer supported for workspace installations, but may be supported for account-level installations. As a result, account-level identity setup now requires connection via Service Principal with Account Admin and Workspace Admin privileges in all workspaces. All other installation requirements remain unchanged, including the need for a Databricks Premium or Enterprise workspace, network access to the Databricks Workspace and the Internet, a created Unity Catalog Metastore, and a PRO or Serverless SQL Warehouse for rendering reports. Additionally, users with external Hive Metastores, such as AWS Glue, must consult the relevant guide for specific instructions to ensure proper setup.
  • Fixed migrate tables when default catalog is set (#4012). The handling of the default catalog in the Hive metastore has been enhanced to ensure correct behavior when the default catalog is set. Specifically, the DESCRIBE SCHEMA EXTENDED and SHOW TBLPROPERTIES queries have been updated to include the hive_metastore prefix when fetching database descriptions and constructing table identifiers, respectively, unless the table is located in a mount point, in which case the delta prefix is used. This change addresses a previously reported issue with migrating tables when the default catalog is set, ensuring that table properties are correctly fetched and tables are properly identified. The update has been applied to multiple test cases, including those for skipping tables, upgraded tables, and mapping tables, to guarantee correct execution of queries with the default catalog name, which is essential when the default catalog is set to hive_metastore.
  • Limit crawl workflows task in assessment to workflows that ran in the last 30 days (#3963). The JobInfo class gains a last_run attribute storing the timestamp of the most recent job run, initialised consistently by the from_job method. The assess_workflows method now filters workflows to those that ran within the last 30 days via a new last_run_days parameter on the refresh_report method; an inner function, lint_job_limited, handles the filtering logic, and lint_job accepts the same parameter to check whether a job ran within the window (a sketch of the time-based filter follows this list). A new test method, test_workflow_linter_refresh_report_time_bound, verifies that the WorkflowLinter produces the expected results and writes to the correct tables when limited to recent workflow runs.
  • Pause migration progress workflow schedule (#3995). The migration progress workflow schedule is now paused by default, with its pause_status set to PAUSED, to prevent automatic execution and potential failures due to missing prerequisites. This change is driven by the experimental nature of the workflow, which may fail if a UCX catalog has not been created by the customer. To ensure successful execution, users are advised to unpause the workflow after running the create-ucx-catalog command, allowing them to control when the workflow runs and verify that necessary prerequisites are in place.
  • Warns instead of an error while finding an acc group in workspace (#4016). The behavior of the account group reflection functionality has been updated to handle duplicate groups more robustly. When encountering a group that already exists in the workspace, the function now logs a warning instead of an error, allowing it to continue executing uninterrupted. This change accommodates the introduction of nested account groups from workspace local groups, which can lead to groups being present in the workspace that are also being migrated. The warning message clearly indicates that the group is being skipped due to its existing presence in the workspace, providing transparency into the reflection process.
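
A minimal sketch of the 30-day filter described above; runs_within is a hypothetical helper, and the assumption that last_run stores epoch-millisecond timestamps is mine, not the release notes':

```python
import datetime as dt


def runs_within(jobs, last_run_days: int = 30):
    """Keep only jobs whose most recent run falls inside the look-back window."""
    cutoff = dt.datetime.now(dt.timezone.utc) - dt.timedelta(days=last_run_days)
    cutoff_ms = int(cutoff.timestamp() * 1000)  # assuming epoch-millisecond timestamps
    return [job for job in jobs if job.last_run is not None and job.last_run >= cutoff_ms]
```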

Contributors: @pritishpai, @FastLee

v0.58.0

16 Apr 19:10
bb89679

  • Added ability to create account groups from nested ws-local groups (#3818). The new create_account_level_groups method creates account-level groups from workspace groups: it retrieves valid workspace groups and recursively creates an account-level group for each, handling nested groups by checking whether they already exist and creating them first if necessary (see the sketch after this list). The AccountGroupCreationContext dataclass tracks created, preexisting, and renamed groups. A new test function, test_create_account_level_groups_nested_groups, has been added to test_account.py to verify that account-level groups are created with the same members and membership as the corresponding workspace-local groups. The ComplexValue class has been modified to include the ref field, which references user objects, enabling account groups with members identified by their workspace-local user IDs. Integration tests verify the functionality.
  • Added error handling and tests for Workflow linter during pipeline fetch (#3819). The recent change to the open-source library introduces error handling and tests for the Workflow linter during pipeline fetch. The _register_pipeline_task method in the "jobs.py" file has been updated to handle cases where the pipeline does not exist, by yielding a DependencyProblem instance with an appropriate error message. A new private method, "_register_pipeline_library", has been introduced to handle the registration of libraries present in the pipeline. Additionally, new unit tests and integration tests have been added to ensure that the Workflow linter properly handles cases where pipelines do not exist, and manual testing has been conducted to verify the feature. Overall, these changes improve the robustness and reliability of the Workflow linter by adding error handling and testing for edge cases during pipeline fetch.
  • Added hyperlinks to tables and order the rows by type, name (#3951). In this release, the Table Types widget has been updated to enhance the user experience. The table names in the widget are now clickable and serve as hyperlinks that redirect users to a specified URL with the table name as the link text and title. The rows in the widget are also reorganized by type and then by name, making it easier for users to locate the required table. Additionally, a new set of encodings has been added for the widget that specifies how fields should be displayed, including a link display type for the name field to indicate that it should be displayed as a hyperlink. These changes were implemented in response to issue #3259. A manually tested flag has been included in the commit, indicating that the changes have been tested, but unit and integration tests have not been added. A screenshot of the changes is also included in the commit.
  • Added links to compute summary widget (#3952). In this release, we have added links to the compute summary widget to enhance navigation and usability. The encodings spec in the spec object now includes overrides for a SQL file, which adds links to the cluster_id and cluster_name fields, opening them in a new tab with the respective cluster's details. Additionally, the finding and creator fields are now displayed as strings. These changes improve the user experience by providing direct access to cluster details from the compute summary widget. The associated issue #3260 has been resolved. Manual testing has confirmed that the changes work as expected.
  • Adds option to install UCX in offline mode (#3959). A new capability has been introduced to install the UCX library in offline mode, enabling software engineers to install UCX in environments with restricted Internet access. This offline installation process can be accomplished by installing UCX on a host with Internet access, zipping the installation, transferring the zip to the target host, and unzipping it. To ensure a successful installation, the Databricks CLI version must be v0.244.0 or higher. Additionally, this commit includes updated documentation detailing the offline installation process. This feature addresses issue #3418, making it easier for software engineers to install UCX in offline environments.
  • Fixed Assessment Excel Exporter (#3962). The open-source library has been updated with several new features to enhance its functionality. Firstly, we have implemented a new sorting algorithm that offers improved performance and flexibility for sorting large datasets. This algorithm includes customizable options for handling ties and can be easily integrated into existing codebases. Additionally, we have added support for asynchronous processing, allowing developers to execute time-consuming tasks in the background while maintaining application responsiveness. This feature includes a new API for managing asynchronous tasks and improved error handling for better reliability. Lastly, we have introduced a new configuration system that simplifies the process of setting up and customizing the library. This system includes a default configuration that covers most use cases and allows for easy overriding of specific settings. These new features are designed to provide developers with more powerful and flexible tools for working with the open-source library.
  • Fixed Assessment Exporter Notebook (#3829). The Assessment Exporter Notebook has been updated for maintainability and robustness. The path to the Lakeview Assessment Main dashboard is now determined dynamically using the new naming format, avoiding hardcoded values, and is handled as a Path object rather than a string. A new _process_id_columns method processes ID columns in the dataset, checking for any column with id in the name and wrapping its values in quotes (a sketch follows this list). These manually tested changes improve the accuracy of the exported Excel file and keep the dashboard path correct and up to date.
  • TECH DEBT Use right workspace api call for listing credentials (#3957). In this release, we have implemented a change in the list method of the credentials.py file located in the databricks/labs/ucx/aws directory, addressing issue #3571. The list method now utilizes the list_credentials method from the _ws.credentials object instead of the api_client for listing AWS credentials. This modification replaces the previous TODO comment with actual code, thereby improving code quality and reducing technical debt. The list_credentials method is a part of the Databricks workspace API, offering a more accurate and efficient approach to list AWS credentials, resulting in enhanced reliability and performance for the code responsible for managing AWS credentials.
  • [TECHDEBT] Remove unused code for _resolve_dbfs_root in MountCrawler (#3958). In this release, we have made improvements to the MountCrawler class by removing the unused code for the _resolve_dbfs_root method and its dependencies. This method was previously used to resolve the root location of a DBFS, but it has been deprecated in favor of a new API call. The removal of this unnecessary functionality simplifies the codebase and aligns it with our goal of creating a more streamlined and efficient system. Additionally, this release includes a fix for issue #3452. Rest assured that these changes will not affect the current functionality or behavior of the system and are intended to enhance the overall performance and maintainability of the codebase.
  • [Tech Debt] removing notfound if not required in test_install.py (#3826). In this release, we've made improvements to our test suite by removing the redundant notfound function in test_install.py, specifically from 'test_create_database', 'test_open_config', and 'test_save_config_ext_hms'. The notfound function previously raised a NotFound error, which has now been replaced with a more specific error message or behavior. This enhancement simplifies the codebase, reduces technical debt, and addresses issue #2700. Note that no new unit tests were added, but existing tests were updated to account for the removal of 'notfound'.
  • [Tech Debt] standardising the error message for required parameter in cli command (#3827). This release introduces changes to standardize error messages for required parameters in the databricks labs ucx CLI command, addressing tech debt and improving the user experience. Instead of raising a KeyError, the command now returns clear and consistent error messages when required parameters are missing. Specifically, the repair_run function handles the case when the --step parameter is not provided, and the move and alias functions handle missing --from_catalog, `...
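
A minimal sketch of the recursive account-group creation from #3818; lookup_workspace_group is a hypothetical helper, and UCX's actual member handling may differ:

```python
from databricks.sdk import AccountClient
from databricks.sdk.service.iam import ComplexValue, Group


def create_account_group(acc: AccountClient, ws_group: Group, created: dict[str, str]) -> str:
    """Mirror a workspace group at account level, creating nested groups first."""
    if ws_group.display_name in created:  # already created or pre-existing
        return created[ws_group.display_name]
    members = []
    for member in ws_group.members or []:
        ref = member.ref or ""
        if ref.startswith("Groups/"):
            nested = lookup_workspace_group(member.display)  # hypothetical lookup helper
            nested_id = create_account_group(acc, nested, created)  # recurse into nesting
            members.append(ComplexValue(value=nested_id, ref=f"Groups/{nested_id}"))
        elif ref.startswith(("Users/", "ServicePrincipals/")):
            members.append(member)
    acc_group = acc.groups.create(display_name=ws_group.display_name, members=members)
    created[ws_group.display_name] = acc_group.id
    return acc_group.id
```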
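
And a sketch of the _process_id_columns idea from #3829, assuming a pandas DataFrame is being written to Excel; the exact quoting rule is an assumption:

```python
import pandas as pd


def _process_id_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Quote ID-like columns so spreadsheet tools do not mangle large numeric IDs."""
    for column in frame.columns:
        if "id" in column.lower():
            frame[column] = frame[column].map(lambda v: f'"{v}"' if pd.notna(v) else v)
    return frame


print(_process_id_columns(pd.DataFrame({"job_id": [1234567890123], "name": ["x"]})))
```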

v0.57.0

05 Mar 03:47
d0bcfc5

  • Convert UCX job ids to int before passing to JobsCrawler (#3816). This release addresses issue #3722 by making the jobs_crawler method handle job IDs more robustly. Previously, job IDs were passed directly to the exclude_job_ids parameter, which could cause issues if they were not integers; all job IDs are now converted to integers with a list comprehension before being passed on, ensuring that only valid integer job IDs are used. The commit includes a manual test confirming the behaviour.
  • Exclude UCX jobs from crawling (#3733). The JobsCrawler and the existing assessment workflow now exclude UCX's own jobs from crawling, avoiding confusion for users when they appear in assessment reports. This change addresses issues #3656 and #3722, follows up on #3732, and incorporates updates from pull requests #3767 and #3759 to improve integration tests and linting. A retry mechanism has also been added to wait for grants to exist before crawling, addressing issue #3758. A new exclude_job_ids parameter on the JobsCrawler constructor is initialised with the list of UCX job IDs, and the _list_jobs method excludes jobs based on the provided exclude_job_ids and include_job_ids arguments (a sketch combining this exclusion with the integer normalisation from the previous item follows this list); _crawl uses _list_jobs to list the jobs to crawl, and _assess_jobs accounts for the excluded IDs. Unit and integration tests verify the changes, and the test_grant_detail integration test for the Hive Metastore grants functionality now retries until grants exist and checks that the SELECT permission on ANY FILE is present.
  • Let WorkflowLinter.refresh_report lint jobs from JobsCrawler (#3732). In this release, the WorkflowLinter.refresh_report method has been updated to lint jobs from the JobsCrawler class, ensuring that only jobs within the scope of the crawler are processed. This change resolves issue #3662 and progresses issue #3722. The workflow linting code, the assessment workflow, and the JobsCrawler class have been modified. The JobsCrawler class now includes a snapshot method, which is used in the WorkflowLinter.refresh_report method to retrieve necessary data about jobs. Unit and integration tests have been updated correspondingly, with the integration test for workflows now verifying that all rows returned from a query to the workflow_problems table have a valid path field. The WorkflowLinter constructor now includes an instance of JobsCrawler, allowing for more targeted linting of jobs. The introduction of the JobsCrawler class enables more efficient and precise linting of jobs, improving the overall accuracy of workflow assessment.
  • Let dashboard name adhere to naming convention (#3789). Dashboard names in the ucx library are now restricted to alphanumeric characters, hyphens, and underscores; non-conforming characters in existing names are replaced with hyphens or underscores (see the sketch after this list), addressing issues #3761 through #3788. A temporary fix in the _create_dashboard method, marked with a TODO comment, ensures newly created dashboard names follow the convention. The release also resolves a test failure in a specific GitHub Actions run, addressing a total of 29 issues. The commit deletes the 02_0_owner.filter.yml file, and all changes have been manually tested.
  • Partial revert Let dashboard name adhere to naming convention (#3794). In this release, we have partially reverted a previous change to the migration progress dashboard, reintroducing the owner filter. This change was made in response to feedback from users who found the previous modification to the dashboard less intuitive. The new owner filter has been defined in a new file, '02_0_owner.filter.yml', which includes the title, column name, type, and width of the filter. To ensure proper functionality, this change requires the release of lsql after merging. The change has been thoroughly tested to guarantee its correct operation and to provide the best possible user experience.
  • Partial revert Let dashboard name adhere to naming convention (#3795). In this release, we have partially reversed a previous change that enforced a naming convention for dashboard names, allowing the use of special characters such as spaces and brackets again. The _create_dashboard method in the install.py file and the _name method in the mixins.py file have been updated to reflect this change, affecting the migration progress dashboard. The display_name attribute of the metadata object has been updated to use the original format, which may include special characters. The reference variable has also been updated accordingly. The functions created_job_tasks and created_job have been updated to use the new naming convention when retrieving installation jobs with specific names. These changes have been manually tested and the tests have been verified to work correctly after the reversion. This change is related to issues #3799, #3789, and reverts commit 048bc8f.
  • Put back dashboard names (#3808). In the lsql release v0.16.0, the naming convention for dashboards has been updated to support non-alphanumeric characters in the dashboard names. This change modifies the _create_dashboard function in install.py and the _name method in mixins.py to create dashboard names with a format like [UCX] assessment (Main), which includes parent and child folder names. This update addresses issues reported in tickets #3797 and #3790, and partially reverses previous changes made in commits 4017a25 and 834ef14. The functionality of other methods remains unchanged. With this release, the created_job_tasks and created_job functions now accept dashboard names with non-alphanumeric characters as input.
  • Updated databricks-labs-lsql requirement from <0.15,>=0.14.0 to >=0.14.0,<0.17 (#3801). The required version range of the databricks-labs-lsql package has been widened from >=0.14.0,<0.15 to >=0.14.0,<0.17, allowing the latest version of the package, which includes various bug fixes and dependency updates. The package is used in the acceptance tests run as part of the CI/CD pipeline, which can now execute against the most recent version for enhanced functionality and reliability.
  • Updated databricks-sdk requirement from <0.42,>=0.40 to >=0.44,<0.45 (#3686). In this release, we have updated the version requirement for the databricks-sdk package to be greater than or equal to 0.44.0 and less than 0.45.0. This update allows for the use of the latest version of the databricks-sdk, which includes new methods, fields, and bug fixes. For instance, the get_message_query_result_by_attachment method has been added for the w.genie.workspace_level_service, and several fields such as review_state, reviews, and runner_collaborators have been removed for the databricks.sdk.service.clean_rooms.CleanRoomAssetNotebook object. Additionally, the securable_kind field has been removed for various objects such as CatalogInfo and ConnectionInfo. We recommend thoroughly testing this update to ensure compatibility with your project. The release notes for versions 0.44.0 and 0.43.0 can be found in the commit history. Please note that there are several backward-incompatible changes listed in the changelog for bot...
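
A minimal sketch of the JobsCrawler filtering described in the two items above; list_jobs and its parameter names come from the items, but the body is illustrative:

```python
def list_jobs(all_jobs, include_job_ids=None, exclude_job_ids=None):
    """Yield jobs to crawl, normalising ids to int and skipping excluded (UCX) jobs."""
    exclude = {int(job_id) for job_id in (exclude_job_ids or [])}
    include = {int(job_id) for job_id in (include_job_ids or [])}
    for job in all_jobs:
        if job.job_id in exclude:
            continue
        if include and job.job_id not in include:
            continue
        yield job
```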
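
And a sketch of the naming-convention enforcement from #3789, assuming disallowed characters are replaced with hyphens; _clean_dashboard_name is a hypothetical name:

```python
import re


def _clean_dashboard_name(name: str) -> str:
    """Replace characters outside [a-zA-Z0-9_-] with hyphens."""
    return re.sub(r"[^a-zA-Z0-9_-]", "-", name).strip("-")


assert _clean_dashboard_name("[UCX] assessment (Main)") == "UCX--assessment--Main"
```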

v0.56.0

25 Feb 03:53
05c2d6a

  • Added documentation to use Delta Live Tables migration (#3587). In this documentation update, we introduce a new section for migrating Delta Live Table pipelines to the Unity Catalog as part of the migration process. This workflow allows for the original and cloned pipelines to run independently after the cloned pipeline reaches the RUNNING state. The update includes an example of stopping and renaming an existing HMS DLT pipeline, and creating a new cloned pipeline. Additionally, known issues and limitations are outlined, such as supported streaming sources, maintenance pausing, and querying by timestamp. To streamline the migration process, the migrate-dlt-pipelines command is introduced with optional parameters for including or excluding specific pipeline IDs. This feature is intended for developers and administrators managing data pipelines and handling table aliasing issues. Relevant user documentation has been added and the changes have been manually tested.
  • Added support for MSSQL and POSTGRESQL to HMS Federation (#3701). The Hive Metastore Federation (HMS Federation) feature now supports Microsoft SQL Server (MSSQL) and PostgreSQL databases. The update introduces classes for handling external Hive Metastore instances and their versions, and refactors a regex pattern to support more JDBC URL formats. A new supported_databases_port class variable maps supported databases to their default ports, letting the code handle SQL Server's distinct default, and a supported_hms_versions class variable lists the supported Hive Metastore versions. The _external_hms method extracts HMS version information more accurately, and the _split_jdbc_url method has been refactored for better URL-format compatibility and parameter extraction (an illustrative sketch follows this list). test_federation.py gains new unit tests for external catalog creation with MSSQL and PostgreSQL, expanding HMS Federation's database compatibility.
  • Added the CLI command for migrating DLT pipelines (#3579). A new CLI command, "migrate-dlt-pipelines," has been added for migrating DLT pipelines from HMS to UC using the DLT Migration API. This command allows users to include or exclude specific pipeline IDs during migration using the --include-pipeline-ids and --exclude-pipeline-ids flags, respectively. The change impacts the PipelinesMigrator class, which has been updated to accept and use these new parameters. Currently, there is no information available about testing, but the changes are expected to be manually tested and accompanied by corresponding unit and integration tests in the future. The changes are isolated to the PipelinesMigrator class and related functionality, with no impact on existing methods or functionality.
  • Addressed Bug with Dashboard migration (#3663). In this release, the _crawl method in dashboards.py has been enhanced to exclude SDK dashboards that lack IDs during the dashboard migration process. This modification enhances migration efficiency by avoiding unnecessary processing of incomplete dashboards. Additionally, the _list_dashboards method now includes a check for dashboards with no IDs while iterating through the dashboards_iterator. If a dashboard with no ID is found, the method fetches the dashboard details using the _get_dashboard method and adds them to the dashboards list, ensuring proper processing. Furthermore, a bug fix for issue #3663 has been implemented in the RedashDashboardCrawler class in assessment/test_dashboards.py. The get method has been added as a side effect to the WorkspaceClient mock's dashboards attribute, enabling the retrieval of individual dashboard objects by their IDs. This modification ensures that the RedashDashboardCrawler can correctly retrieve and process dashboard objects from the WorkspaceClient mock, preventing errors due to missing dashboard objects.
  • Broaden safe read text caught exception scope (#3705). The safe_read_text function now catches a broader range of exceptions that may occur while reading a text file, including OSError and UnicodeError, where it previously caught only FileNotFoundError, UnicodeDecodeError, and PermissionError (a sketch follows this list). Updated unit tests cover edge cases such as a missing file, a path that is a directory, and an OSError during reading, and the linting parts of the code have been tidied for readability and maintainability, making the library more reliable for these scenarios.
  • Case sensitive/insensitive table validation (#3580). In this release, the library has been updated to enable more flexible and customizable metadata comparison for tables. A case sensitive flag has been introduced for metadata comparison, which allows for consideration or ignoring of column name case during validation. The TableMetadataRetriever abstract base class now includes a new parameter column_name_transformer in the get_metadata method, which is a callable that can be used to transform column names as needed for comparison. Additionally, a new case_sensitive parameter has been added to the StandardSchemaComparator constructor to determine whether column names should be compared case sensitively or not. A new parametrized test function test_schema_comparison_case has also been included to ensure that this functionality works as expected. These changes provide users with more control over the metadata comparison process and improve the library's handling of cases where column names in the source and target tables may have different cases.
  • Catch AttributeError in InferredValue._safe_infer_internal (#3684). The _safe_infer_internal method in the InferredValue class now catches AttributeError, working around an issue in the Astroid library reported in their GitHub repository (pylint-dev/astroid#2683) and resolving issue #3659. When the exception occurs, an error message is logged at debug level and the method yields the Uninferable sentinel value to indicate that inference failed for the node, making value inference in the source-code linter more robust.
  • Document to run validate-groups-membership before groups migration, not after (#3631). In this release, we have updated the order of executing the validate-groups-membership command in the group migration process. Previously, the command was recommended to be run after the groups migration, but it has been updated to be executed before the migration. This change ensures that the groups have the correct membership and the number of groups and users in the workspace and account are the same before migration, providing an extra level of safety. Additionally, we have updated the remove-workspace-local-backup-groups command to remove workspace-level backup groups and their permissions only after confirming the successful migration of all groups. We have also updated the spelling of the validate-group-membership command to validate-groups-membership in a documentation file. This release is aimed at software engineers who are adopting the project and looking to migrate their groups to the account level.
  • Extend code migration progress documentation (#3588). Two new sections, Code Migration and Final details, have been added to the migration-process documentation. The Code Migration section walks through the steps to migrate code after table migration and data reconciliation are complete, including using the linter to investigate compatibility issues across linted workspace resources; the linter advice entries provide codes and messages for detected issues along with resolution methods. Migrated code can then be prioritised and tracked on the migration-progress dashboard and migrated using the migrate- commands. The Final details section outlines the closing steps, including running the cluster-remap command to remap clusters to be Unity Catalog compatible. This update resolves issue #2231 and includes updated user documentation, with new commands for linting and migrating local code, managing dashboard migrations, syncing workspace information, creating and validating table mappings, migrating locations, and assigning metastores.
  • Fixed Skip/Unskip sch...
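
An illustrative sketch of the JDBC URL splitting from the HMS Federation item above; the regex and the exact contents of supported_databases_port are assumptions, not UCX's actual pattern:

```python
import re

# default ports per supported database, keyed by JDBC subprotocol (assumed mapping)
SUPPORTED_DATABASES_PORT = {"mysql": 3306, "postgresql": 5432, "sqlserver": 1433}


def split_jdbc_url(url: str) -> tuple[str, str, int]:
    """Extract (database kind, host, port); the port falls back to the kind's default."""
    match = re.match(r"jdbc:(\w+)://([\w.\-]+)(?::(\d+))?", url)
    if not match:
        raise ValueError(f"unsupported JDBC URL: {url}")
    kind, host, port = match.groups()
    return kind, host, int(port) if port else SUPPORTED_DATABASES_PORT[kind]


assert split_jdbc_url("jdbc:sqlserver://mssql.example.com;database=hms") == (
    "sqlserver", "mssql.example.com", 1433)
```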
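
And a minimal sketch of the broadened safe_read_text from #3705; the helper below is a stand-in, though the exception tuple matches the item above:

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def safe_read_text(path: Path, size: int = -1) -> str | None:
    """Read a text file, returning None on any I/O or decoding problem."""
    try:
        with path.open("r", encoding="utf-8") as f:
            return f.read(size)
    except (OSError, UnicodeError) as e:
        # OSError covers FileNotFoundError, PermissionError and IsADirectoryError;
        # UnicodeError covers UnicodeDecodeError
        logger.warning(f"Could not read {path}: {e}")
        return None
```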

v0.55.0

24 Jan 15:36
c3ad142

  • Introducing UCX docs! (#3458). This release introduces the new UCX documentation, available at https://databrickslabs.github.io/ucx/
  • Hosted Runner for release (#3532). The release job now runs in a protected, hosted runner group labeled "linux-ubuntu-latest," improving the security and control of the release process; release.yml has been updated accordingly. The job's environment remains set to "release," and it retains the same authentication and artifact-signing permissions as before, ensuring a seamless transition.

Contributors: @sundarshankar89, @renardeinside

v0.54.0

23 Jan 22:16
ebe97e0

  • Implement disposition field in SQL backend (#3477). This commit adds a query_statement_disposition configuration option for the SQL backend in the UCX tool, allowing users to specify the disposition of SQL statements during assessment results export and preventing failures when dealing with large workspaces and a large number of findings. The new configuration option is added to the config.yml file and used by the SqlBackend definition. The databricks labs install ucx and databricks labs ucx export-assessment commands have been modified to support this new functionality. A new Disposition enum has been added to the databricks.sdk.service.sql module. This change resolves issue #3447 and is related to pull request #3455. The functionality has been manually tested.

  • AWS role issue with external locations pointing to the root of a storage account (#3510). The AWSResources class in the aws.py file now uses an enhanced regular expression for matching S3 bucket names, including an optional group for trailing slashes and any subsequent characters, so that external locations pointing to the root of a storage account are recognised (an illustrative sketch follows this list), addressing issue #3505. The access.py file within the AWS module introduces a new path variable and updates a for-loop condition to accurately identify missing paths in external locations that reference a storage-account root. New unit tests in tests/unit/aws/test_access.py include a test_uc_roles_create_all_roles method that checks the creation of all possible UC roles when none exist, covering external locations with and without folders; the backend fixture now includes a new external location s3://BUCKET4, and various tests incorporate this location and handle errors appropriately.

  • Added assert to make sure installation is finished before re-installation (#3546). In this release, we have added an assertion to ensure that the installation process is completed before attempting to reinstall, addressing a previous issue where the reinstallation was starting before the first installation was finished, causing a warning to not be raised and resulting in a test failure. We have introduced a new function wait_for_installation_to_finish(), which retries loading the installation if it is not found, with a timeout of 2 minutes. This function is utilized in the test_compare_remote_local_install_versions test to ensure that the installation is finished before proceeding. Furthermore, we have extracted the warning message to a variable error_message for better readability. This change enhances the reliability of the installation process.

  • Added dashboards to migration progress dashboard (#3314). This commit introduces significant updates to the migration progress dashboard, adding dashboards, linting resources, and modifying existing components. The changes include a new dashboard displaying the number of dashboards pending migration, with the data sourced from the ucx_catalog.multiworkspace.objects_snapshot table. The existing 'Migration [main]' dashboard has been updated, and unit and integration tests have been adapted accordingly. The commit also renames several SQL files, updates the percentage UDF, grant, job, cluster, table, and pipeline migration progress queries, and resolves linting compatibility issues related to Unity Catalog. The changes depend on issue #3424, progress issue #3045, and break up issue #3112. The new dashboard aims to enhance the migration process and ensure a smooth transition to the Unity Catalog.

  • Added history log encoder for dashboards (#3424). A new history log encoder for dashboards has been added, addressing issues #3368 and #3369, and modifying the existing experimental-migration-progress workflow. This update includes the addition of the DashboardOwnership class, used to generate ownership information for dashboards, and the DashboardProgressEncoder class, responsible for encoding progress data related to dashboards. The new functionality is tested through manual, unit, and integration testing. In the Table class, the from_table_info and from_historical_data methods have been added, allowing for the creation of Table instances from TableInfo objects and historical data dictionaries with more flexibility and safety. The test_tables.py file in the integration/progress directory has also been updated to include a new test function for checking table failures. These changes improve the tracking and management of dashboard IDs, enhance user name retrieval, and ensure the accurate determination of object ownership.

  • Create specific failure for Python syntax error while parsing with Astroid (#3498). Python linting now emits a specific failure message, python-parse-error, for syntax errors encountered while parsing code with Astroid, replacing the previous generic system-error message and maintaining consistency with the existing sql-parse-error message. The failure includes more detailed information about the error location. The change touches Python linting-related code, adds unit tests, and updates the README to guide users on handling the new error type. A new method, Tree.maybe_parse(), parses Python code and detects syntax errors for more precise error handling (a sketch follows this list).

  • DBR 16 and later support (#3481). This pull request introduces support for Databricks Runtime (DBR) 16 and later in the code that converts Hive Metastore (HMS) tables to external tables within the migrate-tables workflow. The changes include the addition of a new static method _get_entity_storage_locations to handle the new entityStorageLocations property in DBR16 and the modification of the _convert_hms_table_to_external method to account for this property. Additionally, the run_workflow function in the assessment workflow now has the skip_job_wait parameter set to True, which allows the workflow to continue running even if a job within it fails. The changes have been manually tested for DBR16, verified in a staging environment, and existing integration tests have been run for DBR 15. The diff also includes updates to the test_table_migration_convert_manged_to_external method to skip job waiting during testing, enabling the test to run successfully on DBR 16.

  • Delete stale code: NotebookLinter._load_source_from_run_cell (#3529). In this update, we have removed the stale code NotebookLinter._load_source_from_run_cell, which was responsible for loading the source code from a run cell in a notebook. This change is a part of the ongoing effort to address issue #3514 and enhances the overall codebase. Additionally, we have modified the existing databricks labs ucx lint-local-code command to update the code linting functionality. We have conducted manual testing to ensure that the changes function as intended and have added and modified several unit tests. The _load_source_from_run_cell method is no longer needed, as it was part of a deprecated functionality. The modifications to the databricks labs ucx lint-local-code command impact the way code linting is performed, ultimately improving the efficiency and maintainability of the codebase.

  • Exclude ucx dashboards from Lakeview dashboard crawler (#3450). The lakeview_crawler method has been enhanced to exclude UCX dashboards and prevent false positives. A new optional argument, exclude_dashboard_ids, on the init method takes a list of dashboard IDs to exclude, and the _crawl method skips dashboards whose IDs match entries in that list. The change includes unit tests and manual testing, has been verified on the staging environment, and improves the accuracy and reliability of the dashboard crawler.

  • Fixed issue in installing UCX on UC enabled workspace (#3501). This PR introduces changes to the ClusterPolicyInstaller class, updating the spark_version policy definition from a fixed value to an allowlist with a default value. This resolves an issue where, when UC is enabled on a workspace, the cluster definition takes on single_user and user_isolation values instead of Legacy_Single_User and 'Legacy_Table_ACL'. The job definition is also updated to use the default value when not explicitly provided. These changes improve compatibility with UC-enabled workspaces, ensuring the correct values for spark_version in the cluster definition. The PR includes updates to unit tests and installation tests, addressing issue #3420.

  • Fixed typo in workflow name (in error message) (#3491). This PR (Pull Request) addresses a minor typo in the error message displayed by the validate_groups_permissions method in the workflows.py file. The typo occurred in the workflow name mentioned in the error message, where group was incorrectly spelled as "groups." The corrected spelling is now validate-groups-permissions. This change does not introduce any new methods or modify any existing functionality, but instead focuses on enhancing the...
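
An illustrative regex in the spirit of the S3 matching from #3510; the bucket-name rules and the optional trailing group are assumptions, not the exact pattern in aws.py:

```python
import re

# the optional trailing group recognises the bucket root as well as paths under it
S3_LOCATION = re.compile(r"^s3a?://([a-z0-9][a-z0-9.\-]{1,61}[a-z0-9])(?:/(.*))?$")

for location in ("s3://bucket", "s3://bucket/", "s3://bucket/folder/table"):
    match = S3_LOCATION.match(location)
    print(match.group(1), match.group(2))  # bucket name, then None / '' / 'folder/table'
```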
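
And a minimal sketch of Tree.maybe_parse-style error capture from #3498, using the public astroid API; maybe_parse below is a stand-in for UCX's method:

```python
import astroid


def maybe_parse(code: str):
    """Parse Python source, returning (tree, failure) instead of raising on bad syntax."""
    try:
        return astroid.parse(code), None
    except astroid.AstroidSyntaxError as e:
        return None, f"python-parse-error: {e}"


tree, failure = maybe_parse("def broken(:")
assert tree is None and "python-parse-error" in failure
```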
