fix(csharp): always send CloseOperation from DatabricksCompositeReader.Dispose #360
Conversation
…r.Dispose

Previously, DatabricksCompositeReader.Dispose only called CloseOperation when _activeReader was null. When a CloudFetchReader was active, it delegated Dispose to the reader, but CloudFetchReader is protocol-agnostic and never sends CloseOperation. This orphaned every CloudFetch server operation for ~1 hour until SQL Gateway fired CommandInactivityTimeout, producing thriftOperationCloseReason=CommandInactivityTimeout in usage logs.

Move CloseOperation ownership entirely to DatabricksCompositeReader.Dispose, which holds both _statement (Thrift client) and _response (operation handle). HiveServer2Reader.CloseOperationAsync is already a no-op when DirectResults already closed the operation server-side, so all three result paths are safe:

- Inline + DirectResults enabled: CloseOperation is a no-op (already closed)
- Inline + DirectResults disabled: CloseOperation sent explicitly
- CloudFetch: CloseOperation sent explicitly (was previously missing)

Remove CloseOperation from DatabricksReader.Dispose to avoid duplicate calls; DatabricksReader is only ever constructed from DatabricksCompositeReader.
    else
        // Always close the operation here at the composite level.
        // CloudFetchReader is protocol-agnostic and does not send CloseOperation,
        // so we must not rely on the contained reader to do it.
should we call dispose?
✅ Done — added CloseOperationE2ETest in csharp/test/E2E/CloseOperationE2ETest.cs (latest commit). It covers all three result delivery modes (Inline+DirectResults, Inline+NoDirectResults, CloudFetch) using an ActivityListener to assert the composite_reader.close_operation trace event is emitted during Dispose. Verified: all 3 tests fail when the fix is reverted, and pass with the fix in place.
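For readers unfamiliar with the `ActivityListener` technique used in the E2E test, the assertion mechanism can be sketched roughly as follows. The source name and the simulated `Dispose` span here are illustrative assumptions, not copies of the real test; only the event name `composite_reader.close_operation` comes from this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

var recorded = new List<string>();
var sourceName = "Example.DatabricksDriver"; // hypothetical ActivitySource name

// Subscribe to the source and collect every event name seen on stopped activities.
using var listener = new ActivityListener
{
    ShouldListenTo = source => source.Name == sourceName,
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = activity =>
    {
        foreach (var ev in activity.Events)
            recorded.Add(ev.Name);
    }
};
ActivitySource.AddActivityListener(listener);

// Simulate the instrumented Dispose path emitting the trace event.
var activitySource = new ActivitySource(sourceName);
using (var activity = activitySource.StartActivity("DatabricksCompositeReader.Dispose"))
{
    activity?.AddEvent(new ActivityEvent("composite_reader.close_operation"));
}

Console.WriteLine(recorded.Contains("composite_reader.close_operation")); // prints True
```

The real test asserts the same way: dispose only the reader, then check that the close-operation event appeared in the collected activities.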
Yes — _activeReader.Dispose() is still called (lines 251-252), after CloseOperationAsync. The change is that CloseOperation is now always called unconditionally at the composite level first, regardless of whether _activeReader is null. Then the active reader is disposed for its own resource cleanup (HTTP connections, download buffers, etc.). This separates the Thrift protocol cleanup from the reader's internal resource cleanup.
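The ordering described here can be sketched as a minimal stand-alone example. The type and member names (`CompositeReader`, `_activeReader`, `CloseOperation`) are illustrative stand-ins for the driver types discussed in this thread, not the actual source:

```csharp
using System;

// Illustrative stand-in for the driver's composite reader. The point is the
// Dispose ordering: protocol cleanup (CloseOperation) happens unconditionally
// at the composite level first, then the contained reader is disposed for its
// own resource cleanup.
sealed class CompositeReader : IDisposable
{
    private readonly IDisposable? _activeReader;
    private bool _disposed;

    // Observable counter for this sketch; the real driver sends a Thrift RPC here.
    public int CloseOperationCalls { get; private set; }

    public CompositeReader(IDisposable? activeReader) => _activeReader = activeReader;

    private void CloseOperation() => CloseOperationCalls++;

    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;

        // 1) Always close the server-side operation here, whether or not an
        //    active reader exists. A protocol-agnostic reader never sends this.
        CloseOperation();

        // 2) Reader-internal cleanup (HTTP connections, download buffers, ...).
        _activeReader?.Dispose();
    }
}
```

Double-dispose is guarded by the `_disposed` flag, so the close goes out exactly once per reader regardless of which result path was active.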
…ksCompositeReader.Dispose

Add CloseOperationE2ETest with 3 theory cases covering all result delivery modes (Inline+DirectResults, Inline+NoDirectResults, CloudFetch). Each test disposes only the reader, not the connection, to simulate connection pooling, then uses ActivityListener to assert the composite_reader.close_operation trace event is emitted inside DatabricksCompositeReader.Dispose. This event is only present after the fix. Without it:

- Inline paths: event missing because _activeReader != null caused the old code to delegate to _activeReader.Dispose() without calling CloseOperation.
- CloudFetch: same delegation, but CloseOperationAsync is never called at all, orphaning the server operation for ~1 hour (thriftOperationCloseReason=CommandInactivityTimeout).

For Thrift wire-level assertions, see proxy-based tests in databricks/databricks-driver-test: CLOUDFETCH-013 through CLOUDFETCH-016.
…oseOperationE2ETest

- Replace "modified" license header with plain Apache 2.0 header (RAT check)
- Replace ConcurrentBag with locked List<T> for .NET 4.7.2 compatibility (ConcurrentBag.Clear() was added in .NET 5; Windows CI builds net472)
Problem
`DatabricksCompositeReader.Dispose` only called `CloseOperation` when `_activeReader` was `null`. When a `CloudFetchReader` was active, it delegated `Dispose` to the reader — but `CloudFetchReader` is protocol-agnostic (it downloads from cloud storage over HTTP) and never sends `CloseOperation` to the Thrift endpoint.

Result: every CloudFetch query orphaned the server-side operation for ~1 hour, until SQL Gateway's `driver-router` detected inactivity and fired `CommandInactivityTimeout`. This produced `thriftOperationCloseReason=CommandInactivityTimeout` in usage logs for all CloudFetch queries in connection-pooled scenarios (where `CloseSession` is never sent).

Root cause
The bug was introduced in commit `0039ee3` — "refactor(csharp): make CloudFetch pipeline protocol-agnostic (#14)".

Before that commit:

- `BaseDatabricksReader` had a public `CloseOperationAsync()` method that sent the Thrift RPC
- `DatabricksCompositeReader.Dispose` called `_activeReader.CloseOperationAsync()` — correct for both `DatabricksReader` and `CloudFetchReader` (both inherited from `BaseDatabricksReader`)

The protocol-agnostic refactor made three changes that together caused the regression:

1. Removed `CloseOperationAsync()` from `BaseDatabricksReader`
2. Changed `DatabricksCompositeReader.Dispose` to call `_activeReader.Dispose()` instead of `_activeReader.CloseOperationAsync()`, with the comment "Have the contained reader close the operation to avoid duplicate calls"
3. Made `CloudFetchReader` protocol-agnostic (no Thrift dependency) — its `Dispose` only cleans up HTTP/download resources and never sends `CloseOperation`

`DatabricksReader` got its own `CloseOperationAsync()` called from `Dispose`, so inline results remained correct. But the CloudFetch path silently lost its cleanup.

Fix
Move `CloseOperation` ownership entirely to `DatabricksCompositeReader.Dispose`, which holds both `_statement` (Thrift client) and `_response` (operation handle). `HiveServer2Reader.CloseOperationAsync` already handles the DirectResults case correctly — it is a no-op when the server already closed the operation inline.

Remove `CloseOperation` from `DatabricksReader.Dispose` to avoid a duplicate call; `DatabricksReader` is only ever constructed from `DatabricksCompositeReader.CreateDatabricksReader`.

Behavior after fix
- Inline + DirectResults enabled: the operation is already closed server-side (`CloseOperationAsync` is a no-op)
- Inline + DirectResults disabled: `CloseOperation` is sent explicitly
- CloudFetch: `CloseOperation` is sent explicitly (previously missing)

Testing
Validated with 4 new proxy-based regression tests in `databricks/databricks-driver-test` (CLOUDFETCH-013 through 016) that simulate connection pooling (reader disposed without closing the connection) and assert the correct number of `CloseOperation` Thrift calls for each path.