Skip to content

feat(csharp/src/Drivers/Databricks): Make Cloud Fetch options configurable at the connection level #2691

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

alexguo-db
Copy link
Contributor

@alexguo-db alexguo-db commented Apr 9, 2025

  • Currently, cloud fetch enablement is only configurable at the statement level, and this is not exposed to clients outside of arrow-adbc repo
  • Make these options configurable at the driver/connection level

Tested E2E by adding a log for the result set format and whether lz4 compression is enabled

  • { "adbc.databricks.cloudfetch.enabled", "false" }

    • DatabricksConnection.NewReader: resultFormat=ARROW_BASED_SET, isLz4Compressed=True
  • { "adbc.databricks.cloudfetch.enabled", "false" }, { "adbc.databricks.cloudfetch.lz4.enabled", "false" },

    • DatabricksConnection.NewReader: resultFormat=ARROW_BASED_SET, isLz4Compressed=False
  • empty

    • DatabricksConnection.NewReader: resultFormat=URL_BASED_SET, isLz4Compressed=True
  • { "adbc.databricks.cloudfetch.lz4.enabled", "false" },

    • DatabricksConnection.NewReader: resultFormat=URL_BASED_SET, isLz4Compressed=False

Also tested with
dotnet test --filter "FullyQualifiedName~CloudFetchE2ETest"

@alexguo-db alexguo-db marked this pull request as ready for review April 9, 2025 19:52
@github-actions github-actions bot added this to the ADBC Libraries 18 milestone Apr 9, 2025
@alexguo-db alexguo-db changed the title feat(csharp/src/Drivers/Databricks): Make Cloud Fetch options configurable at the connector level feat(csharp/src/Drivers/Databricks): Make Cloud Fetch options configurable at the connection level Apr 9, 2025
if (Properties.TryGetValue(DatabricksParameters.UseCloudFetch, out string? useCloudFetchStr) &&
bool.TryParse(useCloudFetchStr, out bool useCloudFetchValue))
{
_useCloudFetch = useCloudFetchValue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider validating the string property. Right now, if someone were to typo as e.g. properties["CloudFetch"] = "flase"; the property value would be silently ignored. (Similarly, properties["CloudFetc"] = "false"; would be ignored but I'm not sure how consistently drivers are producing errors for unsupported properties.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unfortunate that we're essentially validating these in two places: both here and in DatabricksStatement.

We've also somehow arrived in a weird place where these values are both connection properties (that can only be set before the connection is opened) and statement properties that can be set any time after the statement was created. But that's a preexisting condition.

Copy link
Contributor

@CurtHagenlocher CurtHagenlocher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The change looks fine overall. The preexisting way that properties are handled in the HiveServer2-based drivers seems decidedly non-ideal, but that can't be blamed on this PR. I do think we should validate the boolean parsing by throwing an exception when parsing fails; that would be both a better user experience and more consistent with the rest of the code.

if (Properties.TryGetValue(DatabricksParameters.UseCloudFetch, out string? useCloudFetchStr) &&
bool.TryParse(useCloudFetchStr, out bool useCloudFetchValue))
{
_useCloudFetch = useCloudFetchValue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unfortunate that we're essentially validating these in two places: both here and in DatabricksStatement.

We've also somehow arrived in a weird place where these values are both connection properties (that can only be set before the connection is opened) and statement properties that can be set any time after the statement was created. But that's a preexisting condition.

@alexguo-db alexguo-db force-pushed the alex-guo_data/make-cloudfetch-driver-configurable branch from e7488d5 to 44f6853 Compare April 17, 2025 22:38
Copy link
Contributor

@CurtHagenlocher CurtHagenlocher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Unfortunately, another change appears to have introduced a merge conflict. I'll check this in once that's resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants