dbx does not use credential passthrough #864

@mathurk1

Description

Expected Behavior

I am working with Azure Databricks. I have a cluster with credential passthrough, which allows me to read data stored in ADLS Gen2 using my own identity. I can simply log into the Databricks workspace, attach a notebook to the cluster, and query the Delta tables in ADLS Gen2 without any setup.

I would expect that running dbx execute --cluster-id cluster123 --job jobABC against the same cluster would also be able to read those datasets from ADLS Gen2 using my identity.
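For illustration, this is roughly the query that succeeds in an attached notebook; the container, account, and table path below are placeholders, not the real names:

```python
# Sketch of the notebook query that works on a passthrough cluster.
# Container, storage account, and table path are placeholder names.
table_path = "abfss://container@account.dfs.core.windows.net/path/to/table"

# On a credential-passthrough cluster the notebook user's Azure AD identity
# is used transparently, so no storage key or service principal is configured:
# df = spark.read.format("delta").load(table_path)
# df.display()
print(table_path)
```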

Thanks!

Current Behavior

Currently, when I dbx execute a job to the cluster, it fails with the following error:

Py4JJavaError: An error occurred while calling o469.load.
: com.databricks.backend.daemon.data.client.adl.AzureCredentialNotFoundException: Could not find ADLS Gen2 Token
        at com.databricks.backend.daemon.data.client.adl.AdlGen2UpgradeCredentialContextTokenProvider.$anonfun$getToken$1(AdlGen2UpgradeCredentialContextTokenProvider.scala:37)
        at scala.Option.getOrElse(Option.scala:189)
        at com.databricks.backend.daemon.data.client.adl.AdlGen2UpgradeCredentialContextTokenProvider.getToken(AdlGen2UpgradeCredentialContextTokenProvider.scala:31)
        at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsClient.getAccessToken(AbfsClient.java:1371)
        at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.executeHttpOperation(AbfsRestOperation.java:306)
        at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.completeExecute(AbfsRestOperation.java:238)
        at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.lambda$execute$0(AbfsRestOperation.java:211)
        at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDurationOfInvocation(IOStatisticsBinding.java:464)
        at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:209)
        at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsClient.getAclStatus(AbfsClient.java:1213)
        at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsClient.getAclStatus(AbfsClient.java:1194)
        at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getIsNamespaceEnabled(AzureBlobFileSystemStore.java:437)
        at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getFileStatus(AzureBlobFileSystemStore.java:1107)
        at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:901)
        at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:891)

From my understanding, the ABFS driver is expecting a service principal or storage account keys to be configured, since no passthrough token is available to the job.
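For context, this is a sketch of the service-principal configuration the ABFS driver would otherwise fall back to when no passthrough token is present. The account, tenant, and client values are placeholders; the config keys are the standard Hadoop ABFS OAuth settings:

```python
# Sketch: Spark conf for service-principal (OAuth client-credentials) access
# to ADLS Gen2. All <...> values and the account name are placeholders.
account = "account.dfs.core.windows.net"
sp_conf = {
    f"fs.azure.account.auth.type.{account}": "OAuth",
    f"fs.azure.account.oauth.provider.type.{account}":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    f"fs.azure.account.oauth2.client.id.{account}": "<application-id>",
    f"fs.azure.account.oauth2.client.secret.{account}": "<client-secret>",
    f"fs.azure.account.oauth2.client.endpoint.{account}":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# On a real cluster these would be applied before reading, e.g.:
# for key, value in sp_conf.items():
#     spark.conf.set(key, value)
print(sorted(sp_conf))
```

With credential passthrough the whole point is to avoid this setup, which is why the fallback error above is surprising.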

Steps to Reproduce (for bugs)

  1. Clone the charming-aurora repo - https://github.com/gstaubli/dbx-charming-aurora
  2. Run dbx configure --token to set up the link with the Databricks workspace
  3. Add a new job to the conf/deployment.yml file:
      - name: "my-test-job"
        spark_python_task:
          python_file: "file://charming_aurora/tasks/sample_etl_task.py"
          parameters: [ "--conf-file", "file:fuse://conf/tasks/sample_etl_config.yml" ]
  4. Update the sample ETL task to read an ADLS Delta table - https://github.com/gstaubli/dbx-charming-aurora/blob/main/charming_aurora/tasks/sample_etl_task.py
    # assumes `from pyspark.sql import functions as f` at the top of the module
    def _write_data(self):
        df = (
            self.spark.read.format("delta")
            .load("abfss://[email protected]/path/to/table")
            .filter(f.col("date") == "2024-01-01")
        )
        print(df.count())
  5. Submit the job - dbx execute --cluster-id=cluster-id-with-credential-passthrough --job my-test-job

Context

I specifically want to dbx execute against my interactive cluster rather than create a job cluster.

Your Environment

  • dbx version used: 0.8.18
  • Databricks Runtime version: 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12)
