
[Databricks destination] Adding comment and tags for table and columns and applying primary and foreign key constraints in Unity Catalog #2674


Open · wants to merge 15 commits into devel

Conversation

bayees

@bayees bayees commented May 23, 2025

Description

Creates comments and tags for tables and columns in Unity Catalog. Also adds primary and foreign key constraints using the `primary_key` and `references` hints.

I have added a `create_indexes` option that gates the primary and foreign key constraints, to keep the destination backwards compatible.
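Under that option, enabling the constraints would look roughly like the fragment below. This is a sketch: the option name `create_indexes` comes from the PR, but the exact config section layout is an assumption.

```toml
# Hedged sketch: turning on PRIMARY/FOREIGN KEY constraint creation.
# Section name assumed to follow dlt's usual destination config layout.
[destination.databricks]
create_indexes = true
```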

There is a new Databricks adapter that applies the new `x-databricks-cluster`, `x-databricks-table-comment`, `x-databricks-table-tags`, `x-databricks-column-comment` and `x-databricks-column-tags` hints.

The table and column comments use the `description` hints if those are available, but the hints from the Databricks adapter override them if given.
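The hint names above suggest a mapping from adapter options onto `x-databricks-*` keys. The PR does not show the adapter's signature, so the following is a minimal stand-in sketch: the function and parameter names are assumptions, only the hint keys come from the PR.

```python
from typing import Any, Dict, List, Optional

# Hypothetical sketch of how adapter options could translate into the
# x-databricks-* hints named in this PR. Not the PR's actual API.
def build_databricks_hints(
    cluster: Optional[List[str]] = None,
    table_comment: Optional[str] = None,
    table_tags: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    hints: Dict[str, Any] = {}
    if cluster is not None:
        hints["x-databricks-cluster"] = cluster
    if table_comment is not None:
        hints["x-databricks-table-comment"] = table_comment
    if table_tags is not None:
        hints["x-databricks-table-tags"] = table_tags
    return hints

hints = build_databricks_hints(
    cluster=["customer_id"],
    table_comment="Customer master data",
    table_tags={"pii": "true"},
)
print(hints["x-databricks-table-comment"])  # Customer master data
```

The real adapter would attach these hints to a `DltResource` (as the BigQuery adapter does), from where the destination picks them up during table creation.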

Related Issues

Additional Context

bayees added 10 commits May 18, 2025 06:16
…ing support for clustering, table comments, and tags.
…and foreign key constraints, and table options. Update SQL generation for table alterations to include comments and tags.
… hints, along with examples for using the databricks_adapter. Enhance clarity on applying hints for resource metadata and constraints.
…s adapter. Improve logging for table options during ALTER TABLE operations and streamline SQL generation for constraints.
…OREIGN KEY constraints. Introduce create_indexes option in DatabricksClientConfiguration and update related classes to handle index creation logic. Add new package-lock.yml for dbt_transform examples.
…r enforcing PRIMARY KEY and FOREIGN KEY constraints on tables.

netlify bot commented May 23, 2025

Deploy Preview for dlt-hub-docs ready!

Name Link
🔨 Latest commit 8e45af8
🔍 Latest deploy log https://app.netlify.com/projects/dlt-hub-docs/deploys/6843d0a03e2b3a0008632f6b
😎 Deploy Preview https://deploy-preview-2674--dlt-hub-docs.netlify.app

@bayees bayees changed the title Exp/add databricks metadata [Databricks destination] Adding comment and tags for table and columns and applying primary and foreign key constraints in Unity Catalog May 23, 2025
@rudolfix
Collaborator

rudolfix commented Jun 2, 2025

@bayees this looks good! Are you able to fix the linting errors and add a basic test, like we do e.g. for the BigQuery adapter? Also, we can take over this PR, as we are allowed to push to your branch. Ping us with your plan.

bayees added 2 commits June 2, 2025 16:08
…ents and table comments. Clean up unnecessary whitespace in the code for better readability.
@bayees
Author

bayees commented Jun 2, 2025

@rudolfix Linting is done.

I am not sure about the test. Can you point me to the specific BigQuery test?

@rudolfix
Collaborator

rudolfix commented Jun 5, 2025

Hey! BigQuery tests are in tests/load/bigquery/test_bigquery_table_builder.py. Here is an example test checking that partitioning is right:

# imports needed to run this test standalone (paths as in the dlt repo)
from typing import Dict, Iterator

import dlt
from dlt.common.utils import uniq_id
from dlt.extract import DltResource
from tests.load.utils import DestinationTestConfiguration


def test_bigquery_partition_by_integer(
    destination_config: DestinationTestConfiguration,
) -> None:
    pipeline = destination_config.setup_pipeline(f"bigquery_{uniq_id()}", dev_mode=True)

    @dlt.resource(
        columns={"some_int": {"data_type": "bigint", "partition": True, "nullable": False}},
    )
    def demo_resource() -> Iterator[Dict[str, int]]:
        for i in range(10):
            yield {
                "some_int": i,
            }

    @dlt.source(max_table_nesting=0)
    def demo_source() -> DltResource:
        return demo_resource

    pipeline.run(demo_source())

    with pipeline.sql_client() as c:
        with c.execute_query(
            "SELECT EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.PARTITIONS WHERE partition_id IS NOT"
            " NULL);"
        ) as cur:
            has_partitions = cur.fetchone()[0]
            assert isinstance(has_partitions, bool)
            assert has_partitions

note that:

    with pipeline.sql_client() as c:
        databricks = c.native_connection

gives you access to the native Databricks client.

You could place your tests in tests/load/databricks/test_databricks_adapter.py. Also, we do not need tests as comprehensive as BigQuery's, but a basic check that hints are translated into the right properties in Unity Catalog would be cool.
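Such a check could query Unity Catalog's `information_schema` over the native connection. The sketch below only builds the verification SQL; the `information_schema.tables` view and its `comment` column follow Databricks documentation, while the catalog/schema/table names are placeholders.

```python
# Hedged sketch: build a query to verify a table comment landed in
# Unity Catalog. Run it via pipeline.sql_client() against the real
# cluster; here we only construct and inspect the SQL string.
def table_comment_query(catalog: str, schema: str, table: str) -> str:
    return (
        f"SELECT comment FROM {catalog}.information_schema.tables "
        f"WHERE table_schema = '{schema}' AND table_name = '{table}'"
    )

sql = table_comment_query("dlt_ci", "my_dataset", "my_table")
```

In a real test you would run `c.execute_query(sql)` inside `with pipeline.sql_client() as c:` and assert the fetched comment matches the hint you set. Tags could be checked the same way against `information_schema.table_tags`.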

Regarding credentials: you can place them into tests/.dlt/secrets.toml; the tests will find it. Here are ours (without the secrets). I assume you have access to the cluster?

[destination.databricks.credentials]
server_hostname = "adb-8001321225760611.11.azuredatabricks.net"
http_path = "/sql/protocolv1/o/8001321225760611/0124-183359-ghw1vo3b"
access_token = "..."
catalog = "dlt_ci"
client_id = "..."
client_secret = "..."

One more thing: we broke our devel :/ so you'd need to merge it into your branch.

Successfully merging this pull request may close these issues.

[Databricks] Add comments and tags hints for table and column