feat(csharp/src/Drivers/Apache): Add prefetch functionality to CloudFetch in Spark ADBC driver #2678

jadewang-db · 2025-04-07T21:26:50Z

Add Prefetch Functionality to CloudFetch in Spark ADBC Driver

This PR enhances the CloudFetch feature in the Spark ADBC driver by implementing prefetch functionality, which improves performance by fetching multiple batches of results ahead of time.

Changes

CloudFetchResultFetcher Enhancements

Initial Prefetch: Added code to perform an initial prefetch of multiple batches when the fetcher starts, ensuring data is available immediately when needed.
State Management: Added tracking for current batch offset and size, with proper state reset when starting the fetcher.
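
To illustrate the two points above, here is a minimal sketch of a fetcher that resets its state on start and then prefetches several batches before the first read. It is not the driver's code: the class `PrefetchingFetcherSketch`, its members, the delegate shape, and the "short batch means end of results" rule are all hypothetical, chosen only to show the idea.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: none of these names come from the ADBC driver.
public sealed class PrefetchingFetcherSketch
{
    // Assumed server call: (startOffset, batchSize) -> result links for that batch.
    private readonly Func<long, int, IReadOnlyList<string>> _fetchBatch;
    private readonly int _batchSize;
    private readonly int _prefetchCount;
    private readonly Queue<string> _ready = new Queue<string>();
    private long _currentOffset;
    private bool _hasMore;

    public PrefetchingFetcherSketch(
        Func<long, int, IReadOnlyList<string>> fetchBatch, int batchSize, int prefetchCount)
    {
        _fetchBatch = fetchBatch ?? throw new ArgumentNullException(nameof(fetchBatch));
        _batchSize = batchSize;
        _prefetchCount = prefetchCount;
    }

    public long CurrentOffset => _currentOffset;
    public int ReadyCount => _ready.Count;

    // Reset all state, then fetch up to _prefetchCount batches
    // before any consumer asks for data.
    public void Start()
    {
        _ready.Clear();
        _currentOffset = 0;
        _hasMore = true;
        for (int i = 0; i < _prefetchCount && _hasMore; i++)
        {
            FetchNextBatch();
        }
    }

    // Hands out the next link, fetching another batch only when the queue is empty.
    public bool TryGetNext(out string link)
    {
        if (_ready.Count == 0 && _hasMore)
        {
            FetchNextBatch();
        }
        if (_ready.Count > 0)
        {
            link = _ready.Dequeue();
            return true;
        }
        link = string.Empty;
        return false;
    }

    private void FetchNextBatch()
    {
        IReadOnlyList<string> batch = _fetchBatch(_currentOffset, _batchSize);
        foreach (string item in batch)
        {
            _ready.Enqueue(item);
        }
        _currentOffset += batch.Count;
        // Assumption for this sketch: a short batch marks the end of results.
        _hasMore = batch.Count == _batchSize;
    }
}
```

Calling `Start()` a second time clears the queue and rewinds the offset, which mirrors the "state reset when starting the fetcher" behavior described above. A real implementation would likely run the fetch loop asynchronously; the sketch is synchronous to keep it short.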

Interface Updates

Added new methods to ICloudFetchResultFetcher interface:

Testing Infrastructure

Created ITestableHiveServer2Statement interface to facilitate testing
Updated tests to account for prefetch behavior
Ensured all tests pass with the new prefetch functionality

Benefits

Improved Performance: By prefetching multiple batches, data is available sooner, reducing wait times.
Better Reliability: Enhanced error handling and state management make the system more robust.
More Efficient Resource Usage: Link caching reduces unnecessary server requests.

This implementation maintains backward compatibility while providing significant performance improvements for CloudFetch operations.

CurtHagenlocher

Thanks! I'm still reviewing this logic but thought I'd give some initial feedback.

Also, please take a look at the linter output and make changes accordingly.

csharp/src/Drivers/Databricks/CloudFetch/cloudfetch-pipeline-design.md

csharp/src/Drivers/Databricks/CloudFetch/IHiveServer2Statement.cs

csharp/src/Drivers/Databricks/CloudFetch/ICloudFetchInterfaces.cs

csharp/src/Drivers/Databricks/DatabricksParameters.cs

csharp/src/Drivers/Databricks/CloudFetch/CloudFetchReader.cs

csharp/src/Drivers/Databricks/CloudFetch/CloudFetchDownloadManager.cs

csharp/src/Drivers/Databricks/CloudFetch/EndOfResultsGuard.cs

CurtHagenlocher

Thanks! Looks great!

jadewang-db requested a review from CurtHagenlocher as a code owner April 7, 2025 21:26

github-actions bot added this to the ADBC Libraries 18 milestone Apr 7, 2025

jadewang-db force-pushed the cloudfetch-pipeline branch 2 times, most recently from 01daf70 to a388213 Compare April 14, 2025 19:49

CurtHagenlocher requested changes Apr 16, 2025

jadewang-db requested a review from CurtHagenlocher April 21, 2025 21:56

jadewang-db force-pushed the cloudfetch-pipeline branch 2 times, most recently from 6412369 to 6b733bc Compare April 22, 2025 00:36

CurtHagenlocher mentioned this pull request Apr 22, 2025

feat(csharp/src/Drivers/Apache/Spark): Add prefetch for direct result #2666

Closed

Cloudfetch prefetch

c50c73d

Update DatabricksParameters.cs address comments fix linter rebase to master refactor to fix unit test refactor some code refactoring refactor Delete CloudFetchDownloadManagerTest.cs Initial changes

jadewang-db force-pushed the cloudfetch-pipeline branch from 6b733bc to c50c73d Compare April 22, 2025 20:20

alexguo-db mentioned this pull request Apr 23, 2025

feat(csharp/src/Drivers/Databricks): Add option to enable using direct results for statements #2737

Merged

CurtHagenlocher approved these changes Apr 24, 2025

CurtHagenlocher merged commit 7f3d33b into apache:main Apr 24, 2025
6 checks passed
