feat(csharp/src/Drivers/Apache): Add prefetch functionality to CloudFetch in Spark ADBC driver #2678
Thanks! I'm still reviewing this logic but thought I'd give some initial feedback.
Also, please take a look at the linter output and make changes accordingly.
csharp/src/Drivers/Databricks/CloudFetch/cloudfetch-pipeline-design.md (outdated, resolved)
csharp/src/Drivers/Databricks/CloudFetch/IHiveServer2Statement.cs (outdated, resolved)
csharp/src/Drivers/Databricks/CloudFetch/ICloudFetchInterfaces.cs (outdated, resolved)
csharp/src/Drivers/Databricks/CloudFetch/CloudFetchDownloadManager.cs (outdated, resolved; 3 threads)
Update DatabricksParameters.cs address comments fix linter rebase to master refactor to fix unit test refactor some code refactoring refactor Delete CloudFetchDownloadManagerTest.cs Initial changes
Thanks! Looks great!
…etch in Spark ADBC driver (apache#2678)

# Add Prefetch Functionality to CloudFetch in Spark ADBC Driver

This PR enhances the CloudFetch feature in the Spark ADBC driver by implementing prefetch functionality, which improves performance by fetching multiple batches of results ahead of time.

## Changes

### CloudFetchResultFetcher Enhancements

- **Initial Prefetch**: Added code to perform an initial prefetch of multiple batches when the fetcher starts, ensuring data is available immediately when needed.
- **State Management**: Added tracking for current batch offset and size, with proper state reset when starting the fetcher.

### Interface Updates

- Added new methods to `ICloudFetchResultFetcher` interface:

### Testing Infrastructure

- Created `ITestableHiveServer2Statement` interface to facilitate testing
- Updated tests to account for prefetch behavior
- Ensured all tests pass with the new prefetch functionality

## Benefits

- **Improved Performance**: By prefetching multiple batches, data is available sooner, reducing wait times.
- **Better Reliability**: Enhanced error handling and state management make the system more robust.
- **More Efficient Resource Usage**: Link caching reduces unnecessary server requests.

This implementation maintains backward compatibility while providing significant performance improvements for CloudFetch operations.
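To make the "Initial Prefetch" and "State Management" items above more concrete, here is a minimal sketch of the general idea. It is not the code from this PR: `Batch`, `PrefetchingFetcherSketch`, the `fetchBatch` delegate, and `prefetchCount` are all hypothetical names invented for illustration, and the sketch is synchronous for brevity, so it says nothing about the driver's actual threading model or the real `ICloudFetchResultFetcher` members.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one result batch: where it starts and how many rows it holds.
public sealed class Batch
{
    public Batch(long startOffset, int rowCount)
    {
        StartOffset = startOffset;
        RowCount = rowCount;
    }

    public long StartOffset { get; }
    public int RowCount { get; }
}

// Hypothetical fetcher that keeps up to `prefetchCount` batches buffered ahead of the consumer.
public sealed class PrefetchingFetcherSketch
{
    private readonly Func<long, Batch?> _fetchBatch; // assumed stand-in for a server round trip
    private readonly int _prefetchCount;
    private readonly Queue<Batch> _ready = new Queue<Batch>();
    private long _nextOffset;
    private bool _exhausted;

    public PrefetchingFetcherSketch(Func<long, Batch?> fetchBatch, int prefetchCount)
    {
        _fetchBatch = fetchBatch ?? throw new ArgumentNullException(nameof(fetchBatch));
        if (prefetchCount < 1) throw new ArgumentOutOfRangeException(nameof(prefetchCount));
        _prefetchCount = prefetchCount;
    }

    public int BufferedCount => _ready.Count;

    // Resets offset tracking and the buffer, then performs the initial prefetch.
    public void Start()
    {
        _ready.Clear();
        _nextOffset = 0;
        _exhausted = false;
        Fill();
    }

    // Returns the next buffered batch (or null at end of results) and tops the buffer up again.
    public Batch? Next()
    {
        if (_ready.Count == 0) Fill();
        if (_ready.Count == 0) return null;
        Batch batch = _ready.Dequeue();
        Fill();
        return batch;
    }

    // Fetches until the buffer is full or the source reports no more rows.
    private void Fill()
    {
        while (!_exhausted && _ready.Count < _prefetchCount)
        {
            Batch? batch = _fetchBatch(_nextOffset);
            if (batch is null || batch.RowCount == 0)
            {
                _exhausted = true;
                break;
            }
            _ready.Enqueue(batch);
            _nextOffset = batch.StartOffset + batch.RowCount; // advance the tracked offset
        }
    }
}
```

With a fake source that serves 50 rows in batches of 10 and `prefetchCount` set to 3, `Start()` leaves three batches buffered before the consumer asks for anything, and calling `Start()` again restarts from offset 0 rather than continuing from stale state.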