Releases: databrickslabs/blueprint
v0.6.0
- Added upstream wheel uploads for Databricks Workspaces without Public Internet access (#99). This commit introduces a new feature for uploading upstream wheel dependencies to Databricks Workspaces without Public Internet access. A new flag has been added to the upload functions, allowing users to include or exclude dependencies in the download list. The `WheelsV2` class has been updated with a new method, `upload_wheel_dependencies(prefixes)`, which checks whether each wheel's name starts with any of the provided prefixes before uploading it to the Workspace File System (WSFS). This feature also includes two new tests that verify uploading of the main wheel package and its dependent wheel packages, optimizing downloads for specific use cases. This makes the package easier to use in offline environments with restricted internet access, particularly Databricks Workspaces with extra layers of network security.
- Fixed bug for double-uploading of unreleased wheels in air-gapped setups (#103). In this release, we have addressed a bug in the `upload_wheel_dependencies` method of the `WheelsV2` class that caused double-uploading of unreleased wheels in air-gapped setups. The issue occurred because the condition `if wheel.name == self._local_wheel.name` was not met, resulting in undefined behavior. We have introduced a cached property `_current_version` to handle unreleased versions uploaded to air-gapped workspaces. We also added a new method, `upload_to_wsfs()`, that uploads files to the workspace file system (WSFS) in the integration test. This release also includes new tests to ensure that only the Databricks SDK is uploaded and that the number of installation files is correct. With these changes, the double-uploading issue is resolved, and the installation files, Databricks SDK, Blueprint, and `version.json` metadata are now uploaded correctly to WSFS.
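The prefix check described above can be sketched as a small filter. Everything below is illustrative: the helper name and sample wheel files are stand-ins, not the actual `WheelsV2` internals.

```python
from pathlib import Path


def filter_wheels_by_prefix(wheels: list[Path], prefixes: tuple[str, ...]) -> list[Path]:
    """Keep only wheels whose file name starts with one of the prefixes."""
    # str.startswith accepts a tuple of prefixes directly
    return [w for w in wheels if w.name.startswith(prefixes)]


wheels = [
    Path("dist/databricks_sdk-0.27.0-py3-none-any.whl"),
    Path("dist/requests-2.31.0-py3-none-any.whl"),
]
print(filter_wheels_by_prefix(wheels, ("databricks",)))
```

Passing an empty tuple of prefixes matches nothing, so callers can use the same code path to exclude all dependencies from the upload.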
Contributors: @aminmovahed-db, @nfx
v0.5.0
- Added content assertion for `assert_file_uploaded` and `assert_file_dbfs_uploaded` in `MockInstallation` (#101). The recent commit introduces a content assertion feature to the `MockInstallation` class, enhancing its testing capabilities. This is achieved by adding an optional `expected` parameter of type `bytes` to the `assert_file_uploaded` and `assert_file_dbfs_uploaded` methods, allowing users to verify the uploaded content's correctness. The `_assert_upload` method has also been updated to accept this new parameter, ensuring the actual uploaded content matches the expected content. Furthermore, the commit includes informative docstrings for the new and updated methods, providing clear explanations of their functionality and usage. To support these improvements, new test cases `test_assert_file_uploaded` and `test_load_empty_data_class` have been added to the `tests/unit/test_installation.py` file, enabling more rigorous testing of the `MockInstallation` class and ensuring that the expected content is uploaded correctly.
- Added handling for partial functions in `parallel.Threads` (#93). In this release, we have enhanced the `parallel.Threads` module with the ability to handle partial functions, addressing issue #93. This improvement includes the addition of a new static method, `_get_result_function_signature`, to obtain the signature of a function, or a string representation of its arguments and keywords if it is a partial function. The `_wrap_result` class method has also been updated to log an error message with the function's signature if an exception occurs. Furthermore, we have added a new test case, `test_odd_partial_failed`, to the unit tests, ensuring that the `gather` function correctly handles partial functions that raise errors. The Python version required for this project remains 3.10, and the `pyproject.toml` file has been updated to include `isort`, `mypy`, `types-PyYAML`, and `types-requests` in the list of dependencies. These adjustments improve the functionality and type checking in the `parallel.Threads` module.
- Align configurations with UCX project (#96). This commit brings project configurations in line with the UCX project through various fixes and updates, enhancing compatibility and streamlining collaboration. It addresses pylint configuration warnings, adjusts GitHub Actions workflows, and refines the `pyproject.toml` file. Additionally, the `NiceFormatter` class in `logger.py` has been improved for better code readability, and the versioning scheme has been updated to ensure SemVer and PEP 440 compliance, making it easier to manage and understand the project's versioning. Developers adopting the project will benefit from these alignments, as they promote adherence to the project's standards and up-to-date best practices.
- Check backwards compatibility with UCX, Remorph, and LSQL (#84). This release includes an update to the dependabot configuration to check for daily updates in both the pip and github-actions package ecosystems, with a new directory parameter added for the pip ecosystem for more precise update management. Additionally, a new GitHub Actions workflow, "downstreams", has been added to ensure backwards compatibility with UCX, Remorph, and LSQL by running automated downstream checks on pull requests, merge groups, and pushes to the main branch. The workflow has appropriate permissions for writing id-tokens, reading contents, and writing pull-requests, and runs the downstreams action from the databrickslabs/sandbox repository using `GITHUB_TOKEN` for authentication. These changes improve the security and maintainability of the project by ensuring compatibility with downstream projects and staying up-to-date with the latest package versions, reducing the risk of potential security vulnerabilities and bugs.
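The partial-function handling described above can be sketched with the standard `functools.partial` attributes. This is a minimal illustration of the logging behavior, not the actual `_get_result_function_signature` implementation.

```python
import functools


def describe_callable(fn) -> str:
    """Readable name for fn; for a partial, include its bound arguments.

    Illustrative sketch only, assuming a partial carries its original
    function in .func plus bound .args and .keywords.
    """
    if isinstance(fn, functools.partial):
        parts = [repr(a) for a in fn.args]
        parts += [f"{k}={v!r}" for k, v in fn.keywords.items()]
        return f"{fn.func.__name__}({', '.join(parts)})"
    return getattr(fn, "__name__", repr(fn))


def add(a, b):
    return a + b


print(describe_callable(functools.partial(add, 1, b=2)))
```

Logging such a signature on failure tells you which bound arguments the failing task was called with, which a bare `functools.partial` repr does not.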
Dependency updates:
- Bump actions/setup-python from 4 to 5 (#89).
- Bump softprops/action-gh-release from 1 to 2 (#87).
- Bump actions/checkout from 2.5.0 to 4.1.2 (#88).
- Bump codecov/codecov-action from 1 to 4 (#85).
- Bump actions/checkout from 4.1.2 to 4.1.3 (#95).
- Bump actions/checkout from 4.1.3 to 4.1.5 (#100).
Contributors: @dependabot[bot], @nfx, @grusin-db, @nkvuong
v0.4.4
- If `Threads.strict()` raises just one error, don't wrap it with `ManyError` (#79). The `strict` method in the `gather` function of the `parallel.py` module in the `databricks/labs/blueprint` package has been updated to change the way it handles errors. Previously, if any task in the `tasks` sequence failed, the `strict` method would raise a `ManyError` exception containing all the errors. With this change, if only one error occurs, that error is raised directly without being wrapped in a `ManyError` exception. This simplifies error handling and avoids unnecessary nesting of exceptions. Additionally, the `__tracebackhide__` dunder variable has been added to the method to improve the readability of tracebacks by hiding the method's frame from the user. This update aims to provide a more streamlined and user-friendly experience for handling errors in parallel processing tasks.
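The unwrapping rule above can be sketched as follows; `ManyError` here is a simplified stand-in for blueprint's aggregate error, not the actual class.

```python
class ManyError(Exception):
    """Stand-in aggregate error (illustrative only)."""

    def __init__(self, errs):
        self.errs = list(errs)
        super().__init__(f"Detected {len(self.errs)} failures")


def raise_if_failed(errors):
    __tracebackhide__ = True  # hides this frame in pytest tracebacks
    if not errors:
        return
    if len(errors) == 1:
        raise errors[0]  # single failure: surface it directly, unwrapped
    raise ManyError(errors)


# A single failure surfaces as its own type, so callers can still
# catch ValueError rather than unpacking an aggregate.
try:
    raise_if_failed([ValueError("boom")])
except ValueError as err:
    single = err
```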
Contributors: @nfx
v0.4.3
- Fixed marshalling & unmarshalling edge cases (#76). The serialization and deserialization methods have been updated to improve handling of edge cases during marshalling and unmarshalling of data. When encountering certain edge cases, the `_marshal_list` method now returns an empty list instead of `None`, and both the `_unmarshal` and `_unmarshal_dict` methods return `None` as-is if the input is `None`. Additionally, the `_unmarshal` method has been updated to call `_unmarshal_generic` instead of checking whether the type reference is a dictionary or list when it is a generic alias. The `_unmarshal_generic` method has also been updated to handle cases where the input is `None`. A new test case, `test_load_empty_data_class()`, has been added to the `tests/unit/test_installation.py` file to verify this behavior, ensuring that the correct behavior is maintained when encountering these edge cases during the marshalling and unmarshalling processes. These changes increase the reliability of the serialization and deserialization processes.
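The two edge-case rules above can be condensed into a small sketch. The function names mirror the ones in the release note, but the bodies are simplified illustrations, not the real serializer.

```python
def marshal_list(values):
    """Edge case: an empty or missing list marshals to [], never None."""
    if not values:
        return []
    return list(values)


def unmarshal(raw, type_ref):
    """Edge case: a None input passes through untouched instead of raising."""
    if raw is None:
        return None
    return type_ref(raw)
```

Normalizing `None` to `[]` on the way out, while passing `None` through on the way in, keeps round-trips of empty dataclass fields from crashing the loader.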
Contributors: @nkvuong
v0.4.2
- Fixed edge cases when loading `typing.Dict`, `typing.List` and `typing.ClassVar` (#74). In this release, we have implemented changes to improve the handling of edge cases related to the Python `typing.Dict`, `typing.List`, and `typing.ClassVar` types during serialization and deserialization of dataclasses and generic types. Specifically, we have modified the `_marshal` and `_unmarshal` functions to check the `__origin__` attribute to determine whether the type is a `ClassVar` and skip it if it is. The `_marshal_dataclass` and `_unmarshal_dataclass` functions now check for the `__dataclass_fields__` attribute to ensure that only dataclass fields are marshalled and unmarshalled. We have also added a new unit test for loading a complex data class using the `MockInstallation` class, which contains various attributes such as a string, a nested dictionary, a list of `Policy` objects, and a dictionary mapping string keys to `Policy` objects. This test case checks that the installation object correctly serializes and deserializes the `ComplexClass` instance to and from JSON format according to the specified attribute types, including handling of the `typing.Dict`, `typing.List`, and `typing.ClassVar` types. These changes improve the reliability and robustness of the library in handling complex data types defined in the `typing` module.
- `MockPrompts.extend()` now returns a copy (#72). In the latest release, the `extend()` method in the `MockPrompts` class of the `tui.py` module has been enhanced. Previously, `extend()` would modify the original `MockPrompts` object, which could lead to issues when reusing the same object in multiple places within the same test, as its state would be altered each time `extend()` was called. This has been addressed by updating the `extend()` method to return a copy of the `MockPrompts` object with the updated patterns and answers, instead of modifying the original object. This change ensures that the original `MockPrompts` object can be safely reused in multiple test scenarios without unintended side effects, preserving the integrity of the original state. Furthermore, additional tests have been incorporated to verify the correct behavior of both the new and original prompts.
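The `ClassVar` check described above can be expressed with the public `typing.get_origin` helper, which is the documented counterpart of inspecting `__origin__` directly. This is a minimal sketch, not the actual `_marshal`/`_unmarshal` code.

```python
import typing


def is_skippable(hint) -> bool:
    """True for ClassVar-annotated hints, which (de)serialization skips.

    typing.get_origin(ClassVar[int]) is documented to return ClassVar,
    so this mirrors the __origin__ check without touching private attrs.
    """
    return typing.get_origin(hint) is typing.ClassVar


print(is_skippable(typing.ClassVar[int]))
```

Skipping `ClassVar` fields matters because they belong to the class, not the instance, so they have no per-instance value to persist or restore.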
Contributors: @pritishpai, @nkvuong
v0.4.1
- Fixed `MockInstallation` to emulate workspace-global setup (#69). In this release, the `MockInstallation` class in the `installation` module has been updated to better replicate a workspace-global setup, enhancing testing and development accuracy. The `is_global` method now utilizes the `product` method instead of `_product`, and a new instance variable `_is_global` with a default value of `True` is introduced in the `__init__` method. Moreover, a new `product` method is included, which consistently returns the string "mock". These enhancements resolve issue #69, ensuring the `MockInstallation` instance behaves as a global installation and facilitating precise and reliable testing and development.
- Improved `MockPrompts` with `extend()` method (#68). In this release, we've added an `extend()` method to the `MockPrompts` class in the library's TUI module. This new method allows developers to add new patterns and corresponding answers to the existing list of questions and answers in a `MockPrompts` object. The added patterns are compiled as regular expressions, and the questions-and-answers list is sorted by the length of the regular expression patterns in descending order. This feature is particularly useful for writing tests where prompt answers need to be changed, as it enables better control and customization of prompt responses during testing. By extending the list of questions and answers, you can handle additional prompts without modifying the existing ones, resulting in more organized and maintainable test code. If a prompt hasn't been mocked, attempting to ask a question with it will raise a `ValueError` with an appropriate error message.
- Use Hatch v1.9.4 as build machine requirement (#70). The Hatch package version for the build machine requirement has been updated from 1.7.0 to 1.9.4. This update streamlines the Hatch setup and version management, removing the specific installation step and listing `hatch` directly in the required field. The pre-setup command now only includes `hatch env create`. Additionally, the acceptance tool version has been updated to ensure consistent project building and testing with the specified Hatch version. This change is implemented in the acceptance workflow file and the version of the acceptance tool used by the sandbox. This update ensures that the project can utilize the latest features and bug fixes available in Hatch 1.9.4, improving the reliability and efficiency of the build process.
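The pattern-matching scheme described above (compiled regexes, longest pattern wins, `ValueError` for unmocked prompts) can be sketched with a simplified stand-in; `PromptStub` below is illustrative, not the actual `MockPrompts` class.

```python
import re


class PromptStub:
    """Simplified stand-in for the prompt-matching behavior described above."""

    def __init__(self, patterns_to_answers: dict[str, str]):
        # longest patterns first, so the most specific match wins
        self._qa = sorted(
            ((re.compile(p), a) for p, a in patterns_to_answers.items()),
            key=lambda pair: len(pair[0].pattern),
            reverse=True,
        )

    def question(self, text: str) -> str:
        for pattern, answer in self._qa:
            if pattern.search(text):
                return answer
        raise ValueError(f"not mocked: {text}")


prompts = PromptStub({r"Proceed.*": "yes", r".*": "no"})
print(prompts.question("Proceed with install?"))
```

Sorting by pattern length lets a catch-all like `.*` coexist with specific prompts without shadowing them.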
Contributors: @nfx, @pritishpai
v0.4.0
- Added commands with interactive prompts (#66). This commit introduces a new feature in the Databricks Labs project to support interactive prompts in the command-line interface (CLI) for enhanced user interactivity. The `Prompts` argument, imported from `databricks.labs.blueprint.tui`, is now integrated into the `@app.command` decorator, enabling the creation of commands with user interaction like confirmation prompts. An example of this is the `me` command, which confirms whether the user wants to proceed before displaying the current username. The commit also refactors the code to make it more efficient and maintainable, removing redundancy in creating client instances. The `AccountClient` and `WorkspaceClient` instances can now be provided automatically with the product name and version. These changes improve the CLI by making it more interactive, user-friendly, and adaptable to various use cases while also optimizing the codebase for better efficiency and maintainability.
- Added more code documentation (#64). This release introduces new features and updates to various files in the open-source library. The `cli.py` file in the `src/databricks/labs/blueprint` directory has been updated with a new decorator, `command`, which registers a function as a command. The `entrypoint.py` file in the `databricks.labs.blueprint` module now includes a module-level docstring describing its purpose, as well as documentation for the various standard libraries it imports. The `Installation` class in the `installers.py` file has new methods for handling files, such as `load`, `load_or_default`, `upload`, `load_local`, and `files`. The `installers.py` file also includes a new `InstallationState` dataclass, which is used to track installations. The `limiter.py` file now includes code documentation for the `RateLimiter` class and the `rate_limited` decorator, which are used to limit the rate of requests. The `logger.py` file includes a new `NiceFormatter` class, which provides a nicer format for logging messages, with colors and bold text if the console supports it. The `parallel.py` file has been updated with new methods for running tasks in parallel and returning results and errors. The `tui.py` file has been documented and includes imports for logging, regular expressions, and the collections abstract base classes. Lastly, the `upgrades.py` file has been updated with additional code documentation and new methods for loading and applying upgrade scripts. Overall, these changes improve the functionality, maintainability, and usability of the open-source library.
- Fixed init-project command (#65). In this release, the `init-project` command has been improved with several bug fixes and new functionalities. A new import statement for the `sys` module has been added, and a `docs` directory is now included in the copied directories and files during initialization. The `init_project` function has been updated to open files using the default system encoding, ensuring proper reading and writing of file contents. The `relative_paths` function in the `entrypoint.py` file now returns absolute paths if the common path is the root directory, addressing issue #41. Additionally, several test functions have been added to `tests/unit/test_entrypoint.py`, enhancing the reliability and robustness of the `init-project` command by providing comprehensive tests for supporting functions. Overall, these changes significantly improve the functionality and reliability of the `init-project` command, ensuring a more consistent and accurate project initialization process.
- Using `ProductInfo` with integration tests (#63). In this update, the `ProductInfo` class has been enhanced with a new class method `for_testing(klass)` to facilitate effective integration testing. This method generates a new `ProductInfo` object with a random `product_name`, enabling the creation of distinct installation directories for each test execution. Prior to this change, conflicts and issues could arise when multiple test executions shared the same integration test folder. With the introduction of this new method, developers can now ensure that their integration tests run with unique product names and separate installation directories, enhancing testing isolation and accuracy. This update includes a new test case to confirm the generation of unique product names. Furthermore, a pre-existing test case has been modified to provide a more specific error message related to the `SingleSourceVersionError`.
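The test-isolation idea behind random product names can be sketched as below. The name-generation scheme here is an assumption for illustration only, not the actual `ProductInfo.for_testing` implementation.

```python
import random
import string


def random_product_name(prefix: str = "test") -> str:
    """Unique-ish product name so each test run installs into its own folder."""
    suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=8))
    return f"{prefix}-{suffix}"


print(random_product_name())
```

Because the installation directory is derived from the product name, two concurrent test runs with different random names can never clobber each other's files.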
Contributors: @nfx
v0.3.1
- Fixed the order of marshalling to handle dataclasses with `as_dict` before other types to avoid `SerdeError` (#60). In this release, we have addressed an issue that caused a `SerdeError` during the `installation.save` operation with a dataclass object. The error was due to the order of evaluation in the `_marshal_dataclass` method. The order has been updated to evaluate the `as_dict` method first if it exists in the dataclass, which resolves the `SerdeError`. To ensure the correctness of the fix, we have added a new `test_data_class` function that tests the save and load functionality with a dataclass object. The test defines a `Policy` dataclass with an `as_dict` method that returns a dictionary representation of the object, and checks that the file is written correctly and that the loaded object matches the original. This change has been thoroughly unit tested to ensure that it works as expected.
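The evaluation-order fix can be sketched as checking for a custom `as_dict` before falling back to generic dataclass introspection. This is a simplified stand-in for the idea, not the real `_marshal_dataclass` code.

```python
import dataclasses


def marshal(obj):
    """Prefer an explicit as_dict() over generic dataclass introspection."""
    if hasattr(obj, "as_dict"):
        return obj.as_dict()  # custom serializer evaluated first
    if dataclasses.is_dataclass(obj):
        return dataclasses.asdict(obj)  # generic fallback
    return obj


@dataclasses.dataclass
class Policy:
    name: str

    def as_dict(self) -> dict:
        return {"policy_name": self.name}


print(marshal(Policy("cluster")))
```

With the generic branch first, the custom key mapping (`policy_name` here) would be silently ignored; ordering the `as_dict` check first preserves it.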
Contributors: @HariGS-DB
v0.3.0
- Added automated upgrade framework (#50). This update introduces an automated upgrade framework for managing and applying upgrades to the product, with a new `upgrades.py` file that includes a `ProductInfo` class having methods for version handling, wheel building, and exception handling. The test code organization has been improved, and new test cases, functions, and a directory structure for fixtures and unit tests have been added for the upgrades functionality. The `test_wheels.py` file now checks the version of the Databricks SDK and handles cases where the version marker is missing or does not contain the `__version__` variable. Additionally, a new "Application State Migrations" section has been added to the README, explaining the process of seamless upgrades from version X to version Z through version Y, addressing the need for configuration or database state migrations as the application evolves. Users can apply these upgrades by following an idiomatic usage pattern involving several classes and functions. Furthermore, improvements have been made to the `_trim_leading_whitespace` function in the `commands.py` file of the `databricks.labs.blueprint` module, ensuring accurate and consistent removal of leading whitespace for each line in the command string, leading to better overall functionality and maintainability.
- Added brute-forcing `SerdeError` with `as_dict()` and `from_dict()` (#58). This commit introduces a brute-forcing approach for handling `SerdeError` using `as_dict()` and `from_dict()` methods. The new `SomePolicy` class demonstrates the usage of these methods for manual serialization and deserialization of custom classes. The `as_dict()` method returns a dictionary representation of the class instance, and the `from_dict()` method, decorated with `@classmethod`, creates a new instance from the provided dictionary. Additionally, the GitHub Actions workflow for acceptance tests has been updated to include the `ready_for_review` event type, ensuring that tests run not only for opened and synchronized pull requests but also when they are marked as "ready for review". These changes give developers more control over the deserialization process and facilitate debugging in cases where default deserialization fails, but should be used judiciously to avoid brittle code.
- Fixed nightly integration tests run as service principals (#52). In this release, we have enhanced the compatibility of the codebase with service principals, particularly in the context of nightly integration tests. The `Installation` class in the `databricks.labs.blueprint.installation` module has been refactored, deprecating the `current` method and introducing two new methods: `assume_global` and `assume_user_home`. These methods enable users to install and manage `blueprint` as either a global or user-specific installation. Additionally, the `existing` method has been updated to work with the new `Installation` methods. In the test suite, the `test_installation.py` file has been updated to correctly detect global and user-specific installations when running as a service principal. These changes improve the testability and functionality of the software, ensuring seamless operation with service principals during nightly integration tests.
- Made `test_existing_installations_are_detected` more resilient (#51). In this release, we have added a new test function `test_existing_installations_are_detected` that checks whether existing installations are correctly detected, retrying for up to 15 seconds if they are not. This improves the reliability of the test by making it more resilient to potential intermittent failures. We have also added an import of `retried` from `databricks.sdk.retries`, which is used to retry the test function in case of an `AssertionError`. Additionally, the test function `test_existing` has been renamed to `test_existing_installations_are_detected`, and the `xfail` marker has been removed. We have also renamed the test function `test_dataclass` to `test_loading_dataclass_from_installation` for better clarity. This change helps ensure that the library correctly detects existing installations and improves the overall quality of the codebase.
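The manual `as_dict()`/`from_dict()` escape hatch described above follows a standard shape; the field names in this `SomePolicy` sketch are illustrative, not the exact test fixture from the commit.

```python
from dataclasses import dataclass


@dataclass
class SomePolicy:
    """Custom classes can opt out of generic serde with explicit hooks."""

    name: str
    retries: int

    def as_dict(self) -> dict:
        # full control over the wire format, e.g. key names and types
        return {"name": self.name, "retries": self.retries}

    @classmethod
    def from_dict(cls, raw: dict) -> "SomePolicy":
        # coerce types explicitly instead of trusting the raw payload
        return cls(name=raw["name"], retries=int(raw["retries"]))


policy = SomePolicy("default", 3)
assert SomePolicy.from_dict(policy.as_dict()) == policy
```

The round-trip assertion at the end is the property worth testing for any such pair: serializing then deserializing must return an equal object.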
Contributors: @nfx