fix(csharp): prevent infinite polling in SEA metadata queries#347
msrathore-db wants to merge 2 commits into main
…nt infinite polling

ExecuteMetadataSqlAsync creates internal statements with _queryTimeoutSeconds=0 (no timeout). When called via ExecuteQuery() on a metadata statement (e.g. GetTables, GetColumns), CancellationToken.None is passed through, causing PollUntilCompleteAsync's while(true) loop to run indefinitely if the server query doesn't complete within the initial 10s wait.

This impacts PowerBI users who see frozen UI / infinite spinners when browsing tables or columns via the SEA (REST) protocol, with no way to cancel.

Fix: wrap all metadata SQL execution with CreateMetadataTimeoutCts(), matching the pattern already used by GetObjects(). Links caller-provided tokens so either the caller or the metadata timeout can terminate the operation.

Co-authored-by: Isaac
    // ExecuteMetadataCommandAsync → GetTablesAsync via ExecuteQuery()), the
    // inner statement would have _queryTimeoutSeconds=0 and poll forever.
    // Link the caller's token with a metadata timeout as a safety net.
    using var metadataTimeoutCts = CreateMetadataTimeoutCts();
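The diff above shows only the call site, so the following is a minimal sketch of the general "link the caller's token with a timeout" pattern, not the driver's actual code. `MetadataTimeoutSketch`, `CreateTimeoutCts`, and `RunWithTimeoutAsync` are hypothetical names for illustration; how `CreateMetadataTimeoutCts()` is really implemented is not visible here. Only standard .NET APIs (`CancellationTokenSource.CancelAfter`, `CancellationTokenSource.CreateLinkedTokenSource`) are used.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class MetadataTimeoutSketch
{
    // Hypothetical stand-in for CreateMetadataTimeoutCts():
    // a source that cancels itself once the timeout elapses.
    public static CancellationTokenSource CreateTimeoutCts(TimeSpan timeout)
    {
        var cts = new CancellationTokenSource();
        cts.CancelAfter(timeout);
        return cts;
    }

    // Runs an operation under a token that fires when EITHER the caller
    // cancels OR the timeout elapses, whichever happens first.
    public static async Task<string> RunWithTimeoutAsync(
        Func<CancellationToken, Task<string>> operation,
        TimeSpan timeout,
        CancellationToken callerToken = default)
    {
        using var timeoutCts = CreateTimeoutCts(timeout);
        using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(
            callerToken, timeoutCts.Token);

        // Even if callerToken is CancellationToken.None, the linked token
        // still cancels on timeout, so the operation cannot wait forever.
        return await operation(linkedCts.Token).ConfigureAwait(false);
    }
}
```

With this shape, an operation that never completes surfaces as an `OperationCanceledException` after the timeout rather than hanging, which is the behaviour the PR describes for the metadata path.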
It seems we have the same bug for Thrift. Can we fix that too?
Good question! I investigated this and Thrift is not affected by this bug. Here's why:

- Thrift metadata calls use native RPC (e.g., `TGetTablesReq`): a single request-response call, not SQL execution with polling.
- Databricks always returns `DirectResults` inline in the Thrift metadata response. The `DatabricksConnection` overrides `GetResultSetMetadataAsync` and `GetRowSetAsync` (DatabricksConnection.cs L923-927) to extract results directly from `response.DirectResults!`, never reaching `PollForResponseAsync`.
- The bug is SEA-specific because SEA metadata commands (e.g., `SHOW TABLES`) use `ExecuteMetadataSqlAsync` → `StatementExecutionStatement` → `PollUntilCompleteAsync` (a `while(true)` loop). When called with `CancellationToken.None` and `_queryTimeoutSeconds=0`, this loop runs forever. Thrift's path never enters this code.

The base `HiveServer2Connection.PollForResponseAsync` (L733) does have a polling loop, but it's only used by non-Databricks drivers (via `HiveServer2ExtendedConnection`) and for statement execution, not for Databricks metadata calls.
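To illustrate why a `while(true)` polling loop depends entirely on its token, here is a minimal sketch of a loop in the same general shape. It is an assumption-laden stand-in, not the driver's `PollUntilCompleteAsync`: the `PollingSketch` class, the `isComplete` callback, and the `pollInterval` parameter are hypothetical and replace the real statement-status request.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PollingSketch
{
    // Hypothetical polling loop: it exits only when the status check
    // reports completion or the token is cancelled. Returns the number
    // of status checks performed.
    public static async Task<int> PollUntilCompleteAsync(
        Func<bool> isComplete,
        TimeSpan pollInterval,
        CancellationToken cancellationToken)
    {
        int polls = 0;
        while (true)
        {
            // With CancellationToken.None this check never throws, so an
            // operation that never completes keeps the loop alive forever.
            cancellationToken.ThrowIfCancellationRequested();

            polls++;
            if (isComplete())
            {
                return polls;
            }

            await Task.Delay(pollInterval, cancellationToken).ConfigureAwait(false);
        }
    }
}
```

Passing a token from `new CancellationTokenSource(TimeSpan.FromSeconds(10))` instead of `CancellationToken.None` makes the same loop end with an `OperationCanceledException` once the timeout elapses.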
Summary
- `ExecuteMetadataSqlAsync` creates internal statements with `_queryTimeoutSeconds=0` (no timeout). When called via `ExecuteQuery()` on a metadata statement (e.g. `GetTables`, `GetColumns`, `GetCatalogs`), `CancellationToken.None` is passed through, causing `PollUntilCompleteAsync`'s `while(true)` loop to run indefinitely if the server query doesn't complete within the initial 10s REST API wait.
- Fix: wrap all metadata SQL execution with `CreateMetadataTimeoutCts()` (using `_waitTimeoutSeconds`, default 10s), matching the pattern already used by `GetObjects()`. This links caller-provided tokens so either the caller or the metadata timeout can terminate the operation.
- Thrift is not affected: its metadata calls use a native `TGetTablesReq` RPC, a single request-response call with no SQL or polling.

Root cause details
The code path:
`ExecuteQuery()` → `ExecuteQueryAsync(CancellationToken.None)` → `ExecuteMetadataCommandAsync` → `GetTablesAsync(CancellationToken.None)` → `ExecuteMetadataSqlAsync(sql, CancellationToken.None)` → creates new `StatementExecutionStatement` with `_queryTimeoutSeconds=0` → `PollWithTimeoutAsync` sees `<= 0` → `PollUntilCompleteAsync(statementId, CancellationToken.None)` → infinite `while(true)`.

All SEA metadata commands are affected:
`GetCatalogs`, `GetSchemas`, `GetTables`, `GetColumns`, `GetColumnsExtended`, `GetPrimaryKeys`, `GetCrossReference`, and `GetCurrentCatalog()` (`SELECT CURRENT_CATALOG()`).

Test plan
`SeaMetadataE2ETests`)

This pull request was AI-assisted by Isaac.