
fix(csharp): fix SEA protocol parity issues for metadata, errors, and catalog handling #338

Merged
msrathore-db merged 10 commits into main from fix/sea-driver-fixes on Mar 12, 2026
Conversation

@msrathore-db
Collaborator

Summary

Fixes multiple SEA (Statement Execution API) driver issues discovered during E2E test parity testing between Thrift and REST protocols.

Error Handling

  • HTTP status code mapping: Map 401/403 → AdbcStatusCode.Unauthorized, 404 → NotFound, 400 → InvalidArgument in StatementExecutionClient.EnsureSuccessStatusCodeAsync
  • Connection error wrapping: Wrap non-ADBC exceptions (DNS failures, etc.) in DatabricksException during StatementExecutionConnection.OpenAsync
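The HTTP status mapping described above can be sketched as follows. This is a minimal illustration, not the actual `EnsureSuccessStatusCodeAsync` body; the nested `AdbcStatusCode` enum here is a stand-in for the real ADBC type, and only the codes named in this PR are mapped.

```csharp
using System.Net;

// Sketch of the 401/403/404/400 -> AdbcStatusCode mapping described above.
// The enum below is a stand-in for the real Apache.Arrow.Adbc type.
internal static class SeaStatusMapping
{
    public enum AdbcStatusCode { Unknown, Unauthorized, NotFound, InvalidArgument }

    public static AdbcStatusCode Map(HttpStatusCode status) => status switch
    {
        HttpStatusCode.Unauthorized => AdbcStatusCode.Unauthorized,    // 401
        HttpStatusCode.Forbidden    => AdbcStatusCode.Unauthorized,    // 403
        HttpStatusCode.NotFound     => AdbcStatusCode.NotFound,        // 404
        HttpStatusCode.BadRequest   => AdbcStatusCode.InvalidArgument, // 400
        _                           => AdbcStatusCode.Unknown,
    };
}
```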

Catalog Handling

  • SPARK catalog alias: Apply HandleSparkCatalog (SPARK → null) at SEA connection setup, matching Thrift behavior
  • EffectiveCatalog rewrite: When SPARK is explicitly requested with enableMultipleCatalogSupport=true, pass null through to mean "all catalogs" instead of falling back to connection default. Verified via comparator across all 24 catalog/flag scenarios
  • GetCatalogs filter removal: Remove catalog pattern filter from GetCatalogsAsync — Thrift GetCatalogs RPC has no catalog filter
  • SHOW SCHEMAS column order fix: Server returns (databaseName, catalog) not (catalog, databaseName) — fixed in both StatementExecutionStatement and StatementExecutionConnection
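The SPARK alias handling above amounts to a small normalization step. The helper name `HandleSparkCatalog` comes from the PR; the body below is an assumed sketch of the behavior (SPARK, case-insensitively, maps to null), not the driver's actual code.

```csharp
using System;

// Sketch of the SPARK catalog alias described above: the legacy "SPARK"
// catalog name means "no specific catalog" (null), matching Thrift.
internal static class CatalogHelpers
{
    public static string? HandleSparkCatalog(string? catalog) =>
        string.Equals(catalog, "SPARK", StringComparison.OrdinalIgnoreCase)
            ? null
            : catalog;
}
```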

Metadata Fixes

  • Lowercase GetObjects patterns: Lowercase catalog/schema/table/column patterns in GetObjects for case-insensitive matching (matches DatabricksConnection.GetObjectsPatternsRequireLowerCase = true)
  • Scale/Precision null-setting: Only retain Scale (XdbcDecimalDigits) for DECIMAL/NUMERIC types. CHAR/VARCHAR retain Precision but not Scale, matching Thrift behavior
  • Statement default catalog: Set _metadataCatalogName from connection default in constructor, matching Thrift's DatabricksStatement constructor pattern
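The Scale/Precision retention rule as stated in this summary can be expressed as two predicates (note the review thread below later revisits this behavior). The type names here are illustrative; the real code compares `HiveServer2Connection.ColumnTypeId` values.

```csharp
// Sketch of the retention rule from the summary: only DECIMAL/NUMERIC keep
// Scale (XdbcDecimalDigits); CHAR/VARCHAR additionally keep Precision.
internal static class PrecisionScaleRules
{
    public static bool RetainsScale(string typeName) =>
        typeName is "DECIMAL" or "NUMERIC";

    public static bool RetainsPrecision(string typeName) =>
        typeName is "DECIMAL" or "NUMERIC" or "CHAR" or "VARCHAR";
}
```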

Statement Options

  • TraceParent support: Handle AdbcOptions.Telemetry.TraceParent in both StatementExecutionStatement.SetOption and StatementExecutionConnection.SetOption
  • AffectedRows default: Default ExecuteUpdate row count to -1 (unknown) when manifest has no count, matching Thrift DDL behavior
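The TraceParent option plumbing might look like the sketch below. The option key string and the class here are assumptions for illustration; the real code matches on the `AdbcOptions.Telemetry.TraceParent` constant inside the existing `SetOption` overrides.

```csharp
using System;

// Hypothetical sketch of TraceParent handling in SetOption. The key string
// is an assumed value, not the real AdbcOptions.Telemetry.TraceParent constant.
public class TraceParentOptionSketch
{
    private const string TraceParentKey = "adbc.telemetry.trace_parent"; // assumed

    public string? TraceParent { get; private set; }

    // Returns true if the option was handled here; otherwise the caller
    // falls through to the base implementation.
    public bool SetOption(string key, string value)
    {
        if (string.Equals(key, TraceParentKey, StringComparison.OrdinalIgnoreCase))
        {
            // W3C traceparent header, propagated on subsequent server requests.
            TraceParent = value;
            return true;
        }
        return false;
    }
}
```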

Test Plan

  • All 24 catalog behavior scenarios verified via custom comparator (Thrift vs SEA)
  • Driver builds successfully
  • Full E2E test suite (requires companion test changes in separate PR)

🤖 Generated with Claude Code

msrathore-db and others added 2 commits March 12, 2026 08:32
… catalog handling

Fixes multiple SEA driver issues found during E2E test parity testing
with Thrift protocol:

- Map HTTP 401/403 to AdbcStatusCode.Unauthorized in SEA client
- Wrap non-ADBC exceptions in DatabricksException during OpenAsync
- Handle SPARK catalog alias (SPARK → null) at connection setup,
  matching Thrift's HandleSparkCatalog behavior
- Fix EffectiveCatalog to pass null through (all catalogs) when SPARK
  is explicitly requested, instead of falling back to connection default
- Fix SHOW SCHEMAS IN ALL CATALOGS column order (databaseName, catalog)
  not (catalog, databaseName) as previously assumed
- Remove catalog filter from GetCatalogs (Thrift has no filter either)
- Lowercase GetObjects patterns for case-insensitive matching
- Fix Scale/Precision null-setting: null Scale for CHAR/VARCHAR types
  (only DECIMAL/NUMERIC should retain Scale)
- Add SetOption support for TraceParent on both connection and statement
- Default ExecuteUpdate AffectedRows to -1 when manifest has no count
- Set _metadataCatalogName from connection default in statement constructor

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code context (review comment on the Scale/Precision null-setting):

    && typeCode != (short)HiveServer2Connection.ColumnTypeId.NVARCHAR
    && typeCode != (short)HiveServer2Connection.ColumnTypeId.LONGVARCHAR
    && typeCode != (short)HiveServer2Connection.ColumnTypeId.LONGNVARCHAR)
    bool isDecimalOrNumeric = typeCode == (short)HiveServer2Connection.ColumnTypeId.DECIMAL

Collaborator

Why we need to set these to null after PopulateTableInfoFromTypeName is already processed?
Can we just set everything right in PopulateTableInfoFromTypeName?

Collaborator Author

Valid suggestion. PopulateTableInfoFromTypeName sets non-null values for ALL types (e.g., Scale=0 for BOOLEAN, Scale=7 for FLOAT). But Thrift's SparkConnection.SetPrecisionScaleAndTypeName only sets Precision/Scale for DECIMAL/NUMERIC/CHAR/VARCHAR types — everything else gets null.

The post-processing null-setting was the original design to match Thrift parity. The cleaner fix would be to update PopulateTableInfoFromTypeName itself, but it's in ColumnMetadataHelper which is also used by FlatColumnsResultBuilder (GetColumns path) where the values ARE needed for all types. So the nulling has to happen at the GetObjects call site, not inside the shared helper.

We could extract this into a separate helper method called after PopulateTableInfoFromTypeName for clarity, but moving the logic inside would change behavior for the GetColumns path.

Collaborator

I still want to revisit this.
I think we should fix this function to only return scale for decimal there => currently it seems FlatColumnsResultBuilder returns scale for non-decimal fields, which seems wrong?
The columnSize already seems to be handled correctly.

There isn't any usage of the GetObjects call now, so I would like to start correct. The current Thrift GetObjects has no Databricks customization, so it's a very basic one; I wouldn't mind diverging from that.

Collaborator Author

Fixed in 83a9750. Changed GetDecimalDigitsDefault to return null for all non-DECIMAL/NUMERIC types (was returning 0/7/15/6). Now PopulateTableInfoFromTypeName sets Scale to null from the start for non-DECIMAL types, so the post-processing only needs to null out Precision for non-DECIMAL/CHAR types. This also fixes the FlatColumnsResultBuilder (GetColumns) path which was incorrectly returning Scale values for non-DECIMAL columns.

Collaborator Author

Further update in c9847c0 — removed the post-processing entirely. GetColumnSizeDefault and GetDecimalDigitsDefault values are now kept for all types in GetObjects, matching GetColumns/GetColumnsExtended. This provides consistent Precision/Scale values across all metadata APIs rather than nulling them for GetObjects only.

Code context (XML doc comment on the new statement-level property):

    /// resolves to the session default catalog (SEA SQL requires an explicit
    /// catalog for SHOW commands when not querying all catalogs).
    /// </summary>
    private string? EffectiveCatalog

Collaborator

I remember you already have a similar function like this?

Collaborator Author

Yes — ResolveEffectiveCatalog on the connection (line 793) still exists and is used by GetTableSchema. This new EffectiveCatalog property is on the statement and replaced the old one-liner _connection.ResolveEffectiveCatalog(_metadataCatalogName).

The reason we can't reuse ResolveEffectiveCatalog is that it does normalized ?? _catalog (always falls back to connection default). That's wrong when the user explicitly SetOption(CatalogName, "SPARK") — that should mean "all catalogs" (null), not fall back to default.

The statement's EffectiveCatalog has different semantics:

  • enableMultipleCatalogSupport=true: passes null through (all catalogs) when SPARK is set
  • enableMultipleCatalogSupport=false: resolves null to session default via GetSessionDefaultCatalog()

The connection's ResolveEffectiveCatalog is still correct for GetTableSchema where you always need a concrete catalog (you can't do SHOW COLUMNS IN ALL CATALOGS).
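The two-mode semantics described above can be sketched as a small resolver. This is an illustration of the behavior discussed in the thread, not the driver's actual property; the session-default lookup is stubbed with a delegate standing in for `GetSessionDefaultCatalog()`.

```csharp
using System;

// Sketch of the statement-level EffectiveCatalog semantics described above.
internal sealed class EffectiveCatalogSketch
{
    private readonly bool _enableMultipleCatalogSupport;
    private readonly Func<string?> _getSessionDefaultCatalog; // stands in for GetSessionDefaultCatalog()

    public EffectiveCatalogSketch(bool enableMultipleCatalogSupport,
                                  Func<string?> getSessionDefaultCatalog)
    {
        _enableMultipleCatalogSupport = enableMultipleCatalogSupport;
        _getSessionDefaultCatalog = getSessionDefaultCatalog;
    }

    public string? Resolve(string? requestedCatalog)
    {
        // SPARK alias normalizes to null (see HandleSparkCatalog).
        string? normalized =
            string.Equals(requestedCatalog, "SPARK", StringComparison.OrdinalIgnoreCase)
                ? null
                : requestedCatalog;

        if (_enableMultipleCatalogSupport)
            return normalized; // null means "all catalogs"; no fallback to default

        // Single-catalog mode: SHOW commands need a concrete catalog.
        return normalized ?? _getSessionDefaultCatalog();
    }
}
```

The key difference from `ResolveEffectiveCatalog` is visible in the first branch: an explicit SPARK request stays null instead of falling back to the connection default.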

Collaborator

Is it just SHOW COLUMNS IN ALL CATALOGS not permitted?
What about SHOW TABLES IN ALL CATALOGS which is called from the statement level?

Collaborator Author

We support SHOW TABLES IN ALL CATALOGS. For SHOW COLUMNS, the PR is merged in runtime but not yet released

Collaborator

OK so when that is released we can change the connection level ResolveEffectiveCatalog to use this one? Let's add a comment if so.

Collaborator Author

Good point. Added a TODO in f7694ed. Once the backend supports SHOW COLUMNS IN ALL CATALOGS, we can consolidate. Right now the connection's ResolveEffectiveCatalog is only used by GetTableSchema (which always needs a concrete catalog), so it still needs the fallback. The PopulateColumnInfoAsync path now uses ExecuteShowColumnsAsync which handles null catalog via the iterate-all-catalogs fallback.

@eric-wang-1990 left a comment

Review focused on Thrift/SEA parity.

@eric-wang-1990

**Null pattern is intentional**

The call passes null (not _metadataCatalogName) to ShowCatalogsCommand. This matches Thrift behavior: SHOW CATALOGS lists everything and filtering is applied by the caller. It's easy to read as a copy-paste bug, though; a brief inline comment would help future readers.

@eric-wang-1990

Observation: GetSessionDefaultCatalog() issues a SQL round-trip on every metadata call

GetSessionDefaultCatalog() calls GetCurrentCatalog() which executes SELECT CURRENT_CATALOG() every time. Now that it is explicitly wired from the statement layer (EffectiveCatalog), it fires on every GetObjects/GetTableSchema call when enableMultipleCatalogSupport=false. Not a regression introduced here, but worth a follow-up to lazily cache the value (reset on SetOption).
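The suggested follow-up above could take roughly this shape: cache the result of the first lookup and invalidate it from `SetOption`. This is a sketch under the stated assumption that the SQL round-trip (`SELECT CURRENT_CATALOG()` via `GetCurrentCatalog()`) can be hidden behind a delegate; none of this is the driver's actual code.

```csharp
using System;

// Sketch of lazily caching the session default catalog, reset on SetOption.
internal sealed class SessionDefaultCatalogCache
{
    private readonly Func<string?> _getCurrentCatalog; // runs SELECT CURRENT_CATALOG()
    private string? _cached;
    private bool _hasValue;

    public SessionDefaultCatalogCache(Func<string?> getCurrentCatalog) =>
        _getCurrentCatalog = getCurrentCatalog;

    public string? Get()
    {
        if (!_hasValue)
        {
            _cached = _getCurrentCatalog(); // one round-trip, then cached
            _hasValue = true;
        }
        return _cached;
    }

    // Call from SetOption when a catalog-related option changes.
    public void Reset() => _hasValue = false;
}
```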


…rrentCatalog

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GetDecimalDigitsDefault now only returns a value for DECIMAL/NUMERIC.
All other types return null, so Scale is correctly null from the start
in PopulateTableInfoFromTypeName. This removes the need for
post-processing Scale null-setting in PopulateColumnInfoAsync.

Simplified the post-processing to only null out Precision for types
that don't need it (non-DECIMAL/CHAR).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SHOW COLUMNS IN ALL CATALOGS is not yet supported by the backend.
When catalog is null, iterate over all catalogs via SHOW CATALOGS
and call SHOW COLUMNS IN `catalog` for each, merging results.
Catalogs that return permission errors are silently skipped.

Both GetColumnsAsync (statement) and PopulateColumnInfoAsync
(connection/GetObjects) use the shared ExecuteShowColumnsAsync helper.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
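The iterate-all-catalogs fallback in this commit can be sketched as below. Query execution is abstracted behind delegates standing in for SHOW CATALOGS and SHOW COLUMNS IN `catalog`; the exception type used for "permission error" here is a placeholder, not the driver's actual error type.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the ExecuteShowColumnsAsync fallback: null catalog means
// "all catalogs", iterated via SHOW CATALOGS with per-catalog merging.
internal static class ShowColumnsFallback
{
    public static List<string[]> ExecuteShowColumns(
        string? catalog,
        Func<IEnumerable<string>> showCatalogs,            // SHOW CATALOGS
        Func<string, IEnumerable<string[]>> showColumnsIn) // SHOW COLUMNS IN `catalog`
    {
        var rows = new List<string[]>();
        IEnumerable<string> catalogs =
            catalog is null ? showCatalogs() : new[] { catalog };

        foreach (string c in catalogs)
        {
            try
            {
                rows.AddRange(showColumnsIn(c));
            }
            catch (UnauthorizedAccessException)
            {
                // Catalogs that return permission errors are silently skipped.
                // (Placeholder exception type; the real code checks the
                // server-side error instead.)
            }
        }
        return rows;
    }
}
```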
msrathore-db and others added 2 commits March 12, 2026 13:39
…umnInfoAsync

Keep GetColumnSizeDefault and GetDecimalDigitsDefault values for all
types in GetObjects, matching GetColumns/GetColumnsExtended behavior.
This diverges from Thrift's basic GetObjects (which nulls Precision/Scale
for non-DECIMAL/CHAR types) but provides more useful metadata and
consistent values across all metadata APIs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Catalog

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… in GetTableSchema

GetTableSchema now uses ExecuteShowColumnsAsync with HandleSparkCatalog,
matching Thrift behavior which passes catalog as-is to the server
(null = all catalogs, SPARK = null).

Removed ResolveEffectiveCatalog since it's no longer needed — all
catalog resolution now uses either:
- Statement's EffectiveCatalog (for metadata commands)
- HandleSparkCatalog + ExecuteShowColumnsAsync (for GetTableSchema)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reverts the ?? -1 change back to ?? 0 for both ExecuteQuery and
ExecuteUpdate. SEA API returns 0 for DDL, not -1. Keeping original
SEA behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@msrathore-db msrathore-db added this pull request to the merge queue Mar 12, 2026
Merged via the queue into main with commit 4a962cc Mar 12, 2026
16 of 17 checks passed
@msrathore-db msrathore-db deleted the fix/sea-driver-fixes branch March 12, 2026 13:40