fix(csharp): fix SEA protocol parity issues for metadata, errors, and catalog handling#338
msrathore-db merged 10 commits into main
Conversation
… catalog handling

Fixes multiple SEA driver issues found during E2E test parity testing with the Thrift protocol:

- Map HTTP 401/403 to AdbcStatusCode.Unauthorized in the SEA client
- Wrap non-ADBC exceptions in DatabricksException during OpenAsync
- Handle the SPARK catalog alias (SPARK → null) at connection setup, matching Thrift's HandleSparkCatalog behavior
- Fix EffectiveCatalog to pass null through (all catalogs) when SPARK is explicitly requested, instead of falling back to the connection default
- Fix SHOW SCHEMAS IN ALL CATALOGS column order: (databaseName, catalog), not (catalog, databaseName) as previously assumed
- Remove the catalog filter from GetCatalogs (Thrift has no filter either)
- Lowercase GetObjects patterns for case-insensitive matching
- Fix Scale/Precision null-setting: null Scale for CHAR/VARCHAR types (only DECIMAL/NUMERIC should retain Scale)
- Add SetOption support for TraceParent on both connection and statement
- Default ExecuteUpdate AffectedRows to -1 when the manifest has no count
- Set _metadataCatalogName from the connection default in the statement constructor

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
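The 401/403 mapping listed above can be sketched as follows. The enum here is a minimal stand-in for the AdbcStatusCode values in Apache.Arrow.Adbc, and MapStatus is a hypothetical helper, not the driver's actual EnsureSuccessStatusCodeAsync:

```csharp
using System.Net;

// Minimal stand-in for the ADBC status enum; the real one lives in Apache.Arrow.Adbc.
enum AdbcStatusCode { UnknownError, Unauthorized, NotFound, InvalidArgument }

static class SeaErrorMapping
{
    // Hypothetical helper: maps SEA HTTP responses to ADBC status codes
    // the way the fix above describes (401/403 → Unauthorized, etc.).
    public static AdbcStatusCode MapStatus(HttpStatusCode status) => status switch
    {
        HttpStatusCode.Unauthorized => AdbcStatusCode.Unauthorized,    // 401
        HttpStatusCode.Forbidden    => AdbcStatusCode.Unauthorized,    // 403
        HttpStatusCode.NotFound     => AdbcStatusCode.NotFound,        // 404
        HttpStatusCode.BadRequest   => AdbcStatusCode.InvalidArgument, // 400
        _ => AdbcStatusCode.UnknownError,
    };
}
```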
```csharp
    && typeCode != (short)HiveServer2Connection.ColumnTypeId.NVARCHAR
    && typeCode != (short)HiveServer2Connection.ColumnTypeId.LONGVARCHAR
    && typeCode != (short)HiveServer2Connection.ColumnTypeId.LONGNVARCHAR)
bool isDecimalOrNumeric = typeCode == (short)HiveServer2Connection.ColumnTypeId.DECIMAL
```
Why do we need to set these to null after PopulateTableInfoFromTypeName has already processed them?
Can we just set everything correctly in PopulateTableInfoFromTypeName?
Valid suggestion. PopulateTableInfoFromTypeName sets non-null values for ALL types (e.g., Scale=0 for BOOLEAN, Scale=7 for FLOAT). But Thrift's SparkConnection.SetPrecisionScaleAndTypeName only sets Precision/Scale for DECIMAL/NUMERIC/CHAR/VARCHAR types — everything else gets null.
The post-processing null-setting was the original design to match Thrift parity. The cleaner fix would be to update PopulateTableInfoFromTypeName itself, but it's in ColumnMetadataHelper which is also used by FlatColumnsResultBuilder (GetColumns path) where the values ARE needed for all types. So the nulling has to happen at the GetObjects call site, not inside the shared helper.
We could extract this into a separate helper method called after PopulateTableInfoFromTypeName for clarity, but moving the logic inside would change behavior for the GetColumns path.
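As a minimal sketch of the call-site nulling under discussion: the type IDs below are illustrative JDBC-style values and TableInfo is a stand-in, not the driver's actual types, and later commits in this thread revise this behavior:

```csharp
// Illustrative JDBC-style type codes; the driver's HiveServer2Connection.ColumnTypeId differs.
enum ColumnTypeId : short { CHAR = 1, NUMERIC = 2, DECIMAL = 3, VARCHAR = 12, BOOLEAN = 16 }

// Stand-in for the metadata row being post-processed.
class TableInfo { public int? Precision; public int? Scale; }

static class ScaleParity
{
    // After the shared helper fills Precision/Scale for every type, null them
    // out for types where Thrift parity expects no value: only DECIMAL/NUMERIC
    // keep Scale; only DECIMAL/NUMERIC/CHAR/VARCHAR keep Precision.
    public static void ApplyThriftParity(TableInfo info, ColumnTypeId type)
    {
        bool isDecimalOrNumeric = type == ColumnTypeId.DECIMAL || type == ColumnTypeId.NUMERIC;
        bool isCharLike = type == ColumnTypeId.CHAR || type == ColumnTypeId.VARCHAR;
        if (!isDecimalOrNumeric) info.Scale = null;
        if (!isDecimalOrNumeric && !isCharLike) info.Precision = null;
    }
}
```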
I still want to revisit this.
I think we should fix this function to only return Scale for decimal types. Currently it seems FlatColumnsResultBuilder returns Scale for non-decimal fields, which seems wrong?
The columnSize already seems to be handled correctly.
There isn't any usage of the GetObjects call right now, so I would like to start correct. The current Thrift GetObjects has no Databricks customization, so it's a very basic one; I would not mind diverging from that.
Fixed in 83a9750. Changed GetDecimalDigitsDefault to return null for all non-DECIMAL/NUMERIC types (was returning 0/7/15/6). Now PopulateTableInfoFromTypeName sets Scale to null from the start for non-DECIMAL types, so the post-processing only needs to null out Precision for non-DECIMAL/CHAR types. This also fixes the FlatColumnsResultBuilder (GetColumns) path which was incorrectly returning Scale values for non-DECIMAL columns.
Further update in c9847c0 — removed the post-processing entirely. GetColumnSizeDefault and GetDecimalDigitsDefault values are now kept for all types in GetObjects, matching GetColumns/GetColumnsExtended. This provides consistent Precision/Scale values across all metadata APIs rather than nulling them for GetObjects only.
```csharp
/// resolves to the session default catalog (SEA SQL requires an explicit
/// catalog for SHOW commands when not querying all catalogs).
/// </summary>
private string? EffectiveCatalog
```
I remember you already have a similar function to this?
Yes — ResolveEffectiveCatalog on the connection (line 793) still exists and is used by GetTableSchema. This new EffectiveCatalog property is on the statement and replaced the old one-liner _connection.ResolveEffectiveCatalog(_metadataCatalogName).
The reason we can't reuse ResolveEffectiveCatalog is that it does normalized ?? _catalog (always falls back to connection default). That's wrong when the user explicitly SetOption(CatalogName, "SPARK") — that should mean "all catalogs" (null), not fall back to default.
The statement's EffectiveCatalog has different semantics:
- enableMultipleCatalogSupport=true: passes null through (all catalogs) when SPARK is set
- enableMultipleCatalogSupport=false: resolves null to the session default via GetSessionDefaultCatalog()
The connection's ResolveEffectiveCatalog is still correct for GetTableSchema where you always need a concrete catalog (you can't do SHOW COLUMNS IN ALL CATALOGS).
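The two modes above can be sketched as a pure function. Names and the delegate parameter are illustrative; the real property reads the statement's fields directly:

```csharp
using System;

static class CatalogResolution
{
    // Hypothetical sketch of the statement-level EffectiveCatalog semantics.
    public static string? EffectiveCatalog(
        string? requestedCatalog,              // e.g. from SetOption(CatalogName, ...)
        bool enableMultipleCatalogSupport,
        Func<string> getSessionDefaultCatalog) // issues SELECT CURRENT_CATALOG()
    {
        // The legacy "SPARK" alias always normalizes to null ("all catalogs").
        string? normalized = requestedCatalog == "SPARK" ? null : requestedCatalog;

        if (enableMultipleCatalogSupport)
            return normalized; // null passes through: query all catalogs

        // Single-catalog mode: SHOW commands need a concrete catalog,
        // so resolve null to the session default.
        return normalized ?? getSessionDefaultCatalog();
    }
}
```

This differs from the connection's ResolveEffectiveCatalog precisely in the multi-catalog branch, where null is returned instead of falling back to the default.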
Is it just SHOW COLUMNS IN ALL CATALOGS not permitted?
What about SHOW TABLES IN ALL CATALOGS which is called from the statement level?
We support SHOW TABLES IN ALL CATALOGS. For SHOW COLUMNS, the PR is merged in the runtime but not yet released.
OK so when that is released we can change the connection level ResolveEffectiveCatalog to use this one? Let's add a comment if so.
Good point. Added a TODO in f7694ed. Once the backend supports SHOW COLUMNS IN ALL CATALOGS, we can consolidate. Right now the connection's ResolveEffectiveCatalog is only used by GetTableSchema (which always needs a concrete catalog), so it still needs the fallback. The PopulateColumnInfoAsync path now uses ExecuteShowColumnsAsync which handles null catalog via the iterate-all-catalogs fallback.
eric-wang-1990 left a comment
Review focused on Thrift/SEA parity.
Observation: GetSessionDefaultCatalog() issues a SQL round-trip on every metadata call. GetSessionDefaultCatalog() calls GetCurrentCatalog(), which executes SELECT CURRENT_CATALOG() every time. Now that it is explicitly wired from the statement layer (EffectiveCatalog), it fires on every GetObjects/GetTableSchema call when enableMultipleCatalogSupport=false. Not a regression introduced here, but worth a follow-up to lazily cache the value (reset on SetOption).
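The suggested follow-up could look roughly like this; the class and delegate are illustrative stand-ins, not the driver's actual connection type:

```csharp
using System;

// Sketch of lazily caching the SELECT CURRENT_CATALOG() round-trip,
// with an explicit reset hook for SetOption.
class SessionCatalogCache
{
    private readonly Func<string> _queryCurrentCatalog; // issues the SQL round-trip
    private string? _cached;

    public SessionCatalogCache(Func<string> queryCurrentCatalog) =>
        _queryCurrentCatalog = queryCurrentCatalog;

    // First call hits the server; subsequent calls reuse the cached value.
    public string GetSessionDefaultCatalog() => _cached ??= _queryCurrentCatalog();

    // Call from SetOption when a catalog-related option changes.
    public void Invalidate() => _cached = null;
}
```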
…rrentCatalog Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GetDecimalDigitsDefault now only returns a value for DECIMAL/NUMERIC. All other types return null, so Scale is correctly null from the start in PopulateTableInfoFromTypeName. This removes the need for post-processing Scale null-setting in PopulateColumnInfoAsync. Simplified the post-processing to only null out Precision for types that don't need it (non-DECIMAL/CHAR). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
83a9750 to da40538
SHOW COLUMNS IN ALL CATALOGS is not yet supported by the backend. When catalog is null, iterate over all catalogs via SHOW CATALOGS and call SHOW COLUMNS IN `catalog` for each, merging results. Catalogs that return permission errors are silently skipped. Both GetColumnsAsync (statement) and PopulateColumnInfoAsync (connection/GetObjects) use the shared ExecuteShowColumnsAsync helper. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
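The iterate-and-merge fallback described in this commit can be sketched as below. The delegate parameters stand in for real SQL execution and all names are illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class ShowColumnsFallback
{
    public static async Task<List<string[]>> ExecuteShowColumnsAsync(
        string? catalog,
        Func<Task<List<string>>> showCatalogs,            // SHOW CATALOGS
        Func<string, Task<List<string[]>>> showColumnsIn) // SHOW COLUMNS IN `catalog`
    {
        // A concrete catalog can be queried directly.
        if (catalog != null)
            return await showColumnsIn(catalog);

        // Null catalog = all catalogs: the backend does not support
        // SHOW COLUMNS IN ALL CATALOGS yet, so iterate and merge.
        var merged = new List<string[]>();
        foreach (string c in await showCatalogs())
        {
            try { merged.AddRange(await showColumnsIn(c)); }
            catch (Exception) { /* permission errors: silently skip this catalog */ }
        }
        return merged;
    }
}
```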
…umnInfoAsync Keep GetColumnSizeDefault and GetDecimalDigitsDefault values for all types in GetObjects, matching GetColumns/GetColumnsExtended behavior. This diverges from Thrift's basic GetObjects (which nulls Precision/Scale for non-DECIMAL/CHAR types) but provides more useful metadata and consistent values across all metadata APIs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Catalog Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… in GetTableSchema GetTableSchema now uses ExecuteShowColumnsAsync with HandleSparkCatalog, matching Thrift behavior which passes catalog as-is to the server (null = all catalogs, SPARK = null). Removed ResolveEffectiveCatalog since it's no longer needed — all catalog resolution now uses either: - Statement's EffectiveCatalog (for metadata commands) - HandleSparkCatalog + ExecuteShowColumnsAsync (for GetTableSchema) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reverts the ?? -1 change back to ?? 0 for both ExecuteQuery and ExecuteUpdate. SEA API returns 0 for DDL, not -1. Keeping original SEA behavior. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Fixes multiple SEA (Statement Execution API) driver issues discovered during E2E test parity testing between Thrift and REST protocols.
Error Handling
- Map HTTP 401/403 → AdbcStatusCode.Unauthorized, 404 → NotFound, 400 → InvalidArgument in StatementExecutionClient.EnsureSuccessStatusCodeAsync
- Wrap non-ADBC exceptions in DatabricksException during StatementExecutionConnection.OpenAsync

Catalog Handling
- HandleSparkCatalog (SPARK → null) at SEA connection setup, matching Thrift behavior
- When enableMultipleCatalogSupport=true, pass null through to mean "all catalogs" instead of falling back to the connection default. Verified via comparator across all 24 catalog/flag scenarios
- Removed the catalog filter from GetCatalogsAsync (the Thrift GetCatalogs RPC has no catalog filter)
- SHOW SCHEMAS IN ALL CATALOGS column order is (databaseName, catalog), not (catalog, databaseName); fixed in both StatementExecutionStatement and StatementExecutionConnection

Metadata Fixes
- Lowercase GetObjects patterns for case-insensitive matching (matches DatabricksConnection.GetObjectsPatternsRequireLowerCase = true)
- Set _metadataCatalogName from the connection default in the constructor, matching Thrift's DatabricksStatement constructor pattern

Statement Options
- Support AdbcOptions.Telemetry.TraceParent in both StatementExecutionStatement.SetOption and StatementExecutionConnection.SetOption
- Default the ExecuteUpdate row count to -1 (unknown) when the manifest has no count, matching Thrift DDL behavior

Test Plan
🤖 Generated with Claude Code