Skip to content

Identifier case-folding: data dictionary and metadata queries bind raw values without dialect-aware normalization #477

@cofin

Description

@cofin

Summary

Every adapter's resolve_schema and every data-dictionary metadata query binds an identifier-shaped value (:schema_name, :table_name, ...) verbatim. None of them apply dialect-aware case folding, so callers must already know each engine's normalization rules to get matching rows:

  • Oracle stores unquoted identifiers uppercase. get_tables(driver, schema="myapp") looks up 'myapp' and returns nothing — the real owner is MYAPP.
  • PostgreSQL stores unquoted identifiers lowercase. get_tables(driver, schema="MyApp") looks up 'MyApp' and returns nothing — the real schema is myapp.

The same trap applies to any code path that binds an identifier into SQL: has_schema, schema-validation paths, and any future introspection/lookup APIs.

Affected files

resolve_schema (all do return schema verbatim):

  • sqlspec/driver/_common.py:565
  • sqlspec/adapters/asyncpg/data_dictionary.py:23
  • sqlspec/adapters/aiosqlite/data_dictionary.py:33
  • sqlspec/adapters/sqlite/data_dictionary.py:28
  • sqlspec/adapters/oracledb/data_dictionary.py:92, 256
  • sqlspec/adapters/mssql_python/data_dictionary.py:72
  • sqlspec/adapters/arrow_odbc/data_dictionary.py:44

Metadata queries that bind :schema_name / :table_name verbatim:

  • sqlspec/data_dictionary/sql/oracle/{tables,columns,indexes,foreign_keys}.sql
  • sqlspec/data_dictionary/sql/postgres/{tables,columns,indexes,foreign_keys}.sql
  • sqlspec/data_dictionary/sql/sqlite/* (no schemas, mostly inert)
  • sqlspec/data_dictionary/sql/cockroachdb/*
  • sqlspec/data_dictionary/sql/bigquery/*
  • sqlspec/data_dictionary/sql/spanner/*
  • sqlspec/data_dictionary/sql/mssql/* (if present)

Reproduction (Oracle)

# User MYAPP exists (created unquoted, so Oracle stored "MYAPP").
result = await driver.data_dictionary.get_tables(driver, schema="myapp")
# result == [] because the query binds 'myapp' literally.

Same failure mode on PostgreSQL with the opposite case-folding direction.

Proposed approach

Per-dialect identifier normalization helpers (e.g. in sqlspec/utils/text.py alongside quote_identifier), used consistently at every site that binds an identifier:

def normalize_oracle_identifier(name: str) -> str: ...   # uppercase unquoted, preserve quoted
def normalize_postgres_identifier(name: str) -> str: ... # lowercase unquoted, preserve quoted
def normalize_sqlserver_identifier(name: str) -> str: ... # preserves case (collation-sensitive)
# ...

resolve_schema in each adapter calls its dialect's normalizer; metadata queries continue to bind the (now normalized) value.

The migration has_schema / set_migration_session_schema hooks currently expect callers to pass the literal stored form (see PR #471 docs). That contract becomes "we normalize for you" once this is in place.

Scope

  • New per-dialect normalizer helpers.
  • Updated resolve_schema implementations and the base in sqlspec/driver/_common.py.
  • Audit of every bind site for identifier-shaped values across data dictionary, migration hooks, and any introspection helpers.
  • Tests covering uppercase/lowercase/mixed-case inputs against each dialect.

Tracked alongside the larger identifier-handling PRD (link to be added once filed).

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions