feat(csharp/src/Drivers/Apache/Spark): Add prefetch for direct result #2666
@eric-wang-1990 - I tried this for the Apache Spark reader and had to revert. I'll test your changes to see if there is an issue with interleaved use of the buffer when using the Thrift library.
As I suspected, I am able to reproduce the problem ...
From PR #2273
I created a ticket with Apache Thrift.
@birschick-bq, do I understand correctly that there's no reliable or safe way to get this ~40% perf improvement because of limitations in the Thrift library? What would we need to do to work around it? Open multiple connections to the server?
@CurtHagenlocher / @eric-wang-1990 Yes, the limitation is in the Thrift library. The Transport layer allocates and disposes the buffers independently of the higher-level call. When FetchNext awaits the results (the buffer) it should have, they may already have been destroyed by the next interleaved call.
My idea to solve this problem is to have the buffers allocated and disposed at the Client layer and passed to the Transport layer. They would be created before the ...
For example, the generated code for FetchResults would change to look something like this:
    public async Task<TFetchResultsResp> FetchResults(TFetchResultsReq @req, CancellationToken cancellationToken = default)
    {
        using TransportBuffer inputBuffer = this.InputProtocol.Transport.AllocateBuffer();
        await send_FetchResults(@req, buffer: inputBuffer, cancellationToken);
        using TransportBuffer outputBuffer = this.OutputProtocol.Transport.AllocateBuffer();
        return await recv_FetchResults(buffer: outputBuffer, cancellationToken);
    }
The challenge is changing all the code along the way to pass the buffers through the layers.
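The hazard described above can be illustrated without Thrift at all. The following is a minimal sketch, not Thrift code: `SharedBufferTransport`, `CallerOwnedTransport`, `CallerBuffer`, and `HazardDemo` are hypothetical names invented for illustration, and the real library's buffer handling is more involved. It only shows why a buffer owned by the transport can be clobbered by an interleaved call, while a buffer owned by the caller cannot.

```csharp
using System;
using System.Text;

// Hypothetical transport that reuses one internal buffer for every response.
// This is an illustration only, not how Thrift is actually implemented.
public sealed class SharedBufferTransport
{
    private byte[] _buffer = Array.Empty<byte>();

    // Simulates a response arriving: replaces the single shared buffer.
    public void Receive(string payload) => _buffer = Encoding.UTF8.GetBytes(payload);

    // Reads whatever is currently in the shared buffer.
    public string Read() => Encoding.UTF8.GetString(_buffer);
}

// Hypothetical buffer that the caller allocates and owns for one call.
public sealed class CallerBuffer
{
    public byte[] Data = Array.Empty<byte>();
}

// Hypothetical transport that only writes into the buffer it is handed.
public sealed class CallerOwnedTransport
{
    public void Receive(string payload, CallerBuffer buffer) =>
        buffer.Data = Encoding.UTF8.GetBytes(payload);

    public string Read(CallerBuffer buffer) => Encoding.UTF8.GetString(buffer.Data);
}

public static class HazardDemo
{
    // The first caller's response is overwritten before it is read.
    public static string FirstCallerSeesShared()
    {
        var transport = new SharedBufferTransport();
        transport.Receive("batch-1"); // response for call 1
        transport.Receive("batch-2"); // interleaved call 2 overwrites the buffer
        return transport.Read();      // call 1 now reads "batch-2"
    }

    // Each call has its own buffer, so interleaving does not matter.
    public static string FirstCallerSeesOwned()
    {
        var transport = new CallerOwnedTransport();
        var first = new CallerBuffer();
        var second = new CallerBuffer();
        transport.Receive("batch-1", first);
        transport.Receive("batch-2", second);
        return transport.Read(first); // call 1 still reads "batch-1"
    }
}
```

In this sketch the fix is purely about ownership: the transport no longer decides when a buffer is replaced, which mirrors the idea of passing buffers down from the Client layer.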
I know this one still needs some work, so just putting this out there, but I think this should wait on the decision about #2672 and then move to the Databricks driver. |
This PR adds a prefetch function for directResults.


While the application is processing the current batch, the driver can start getting the next batch.
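The overlap described above can be sketched as a reader that keeps one fetch in flight while the caller works on the current batch. This is a minimal illustration under stated assumptions, not the code in this PR: `FakeBatchSource`, `PrefetchingReader`, and `Demo` are hypothetical names, the batches are plain `int[]` arrays rather than Arrow record batches, and network latency is simulated with `Task.Delay`.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for a server-side result set; not part of the ADBC driver API.
public sealed class FakeBatchSource
{
    private readonly int _batchCount;
    private int _next;

    public FakeBatchSource(int batchCount) => _batchCount = batchCount;

    // Returns the next batch, or null once the result set is exhausted.
    public async Task<int[]?> FetchNextAsync()
    {
        await Task.Delay(10); // simulated network latency
        if (_next >= _batchCount) return null;
        int id = _next++;
        return new[] { id, id, id };
    }
}

// Keeps exactly one fetch in flight ahead of the consumer.
public sealed class PrefetchingReader
{
    private readonly FakeBatchSource _source;
    private Task<int[]?> _pending;

    public PrefetchingReader(FakeBatchSource source)
    {
        _source = source;
        _pending = _source.FetchNextAsync(); // start the first fetch immediately
    }

    public async Task<int[]?> ReadNextAsync()
    {
        int[]? current = await _pending;
        if (current != null)
        {
            // Start fetching the next batch before handing this one to the caller.
            _pending = _source.FetchNextAsync();
        }
        return current;
    }
}

public static class Demo
{
    // Drains the reader and returns the first value of each batch, in order.
    public static async Task<List<int>> DrainAsync(int batches)
    {
        var reader = new PrefetchingReader(new FakeBatchSource(batches));
        var seen = new List<int>();
        int[]? batch;
        while ((batch = await reader.ReadNextAsync()) != null)
        {
            seen.Add(batch[0]);
            await Task.Delay(10); // simulated processing; overlaps with the next fetch
        }
        return seen;
    }
}
```

The sketch deliberately allows only one outstanding fetch at a time, so the next request is never issued until the previous response has been fully received.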
Perf gain tested with Power BI Desktop:
ADBC without Prefetch:
ADBC with Prefetch: