@Ewan-Keith Ewan-Keith commented Oct 12, 2025

Adds support for documenting a full Databricks catalog by omitting the schema from the DSN (in dbx nomenclature, a full table reference is catalog.schema.table).

This is a non-breaking change: if a schema is specified (as previously required), behaviour is identical. I verified this by comparing the new output against the contents of samples/databricks/, which were generated without this change.

If no schema is specified, the dbx driver documents all tables under the catalog. To avoid ambiguity and name collisions, each table is referenced as schema_name.table_name when run in this mode. I think this matches the behaviour of the postgres adaptor.
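To make the naming rule concrete, here is a minimal sketch of the behaviour described above. It is written in Python purely as an illustration and is not the driver's actual code; the function name `documented_table_names`, the input shape, and the `sales` schema are all hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def documented_table_names(
    tables: Iterable[Tuple[str, str]],
    schema: Optional[str] = None,
) -> List[str]:
    """Return the names under which tables would be documented.

    `tables` holds (schema_name, table_name) pairs found in one catalog.
    This helper is hypothetical and only mirrors the rule in the PR text.
    """
    if schema:
        # Single-schema mode: keep only that schema and use bare table names.
        return [table for s, table in tables if s == schema]
    # Catalog-wide mode: keep every table and qualify it with its schema
    # so that identically named tables in different schemas do not collide.
    return [f"{s}.{table}" for s, table in tables]


# Example catalog contents; the "sales" schema is made up for illustration.
catalog_tables = [
    ("tpch", "orders"),
    ("tpch", "customer"),
    ("information_schema", "tables"),
    ("sales", "orders"),
]

print(documented_table_names(catalog_tables, schema="tpch"))
print(documented_table_names(catalog_tables))
```

Without the schema prefix, `tpch.orders` and the made-up `sales.orders` would both be documented as `orders`, which is the collision the qualified names avoid.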

Attached is sample output from a run in this mode (it captures the tpch tables used in the single-schema output, along with the tables in the Databricks information schema). This could be committed to the repo if desired, but no other database currently includes multiple versions of sample output, so I haven't done that yet.

tbls-dbx-multi-schema-output.zip

A few small changes to the implementation are also included here and there to align with wider project idioms, but all are minor.
