Description
Expected Behavior
I am working with Azure Databricks. I have a cluster with credential passthrough, which allows me to read data stored in ADLS Gen2 using my own identity. I can simply log into the Databricks workspace, attach a notebook to the cluster, and query the Delta tables in ADLS Gen2 without any setup.
I would expect that when I submit dbx execute --cluster-id cluster123 --job jobABC to the same cluster, it should be able to read those datasets from ADLS Gen2 using my identity as well.
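For reference, this is roughly what already works today in a notebook attached to the passthrough cluster (a minimal sketch; the container, storage account, and path are placeholders):
# Interactive notebook attached to the credential-passthrough cluster:
# the ADLS Gen2 token is derived from my own AAD identity, no extra configuration needed.
df = (
    spark.read.format("delta")
    .load("abfss://<container>@<storage-account>.dfs.core.windows.net/path/to/table")
)
display(df.limit(10))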
Thanks!
Current Behavior
Currently, the job fails when I dbx execute it against the cluster, with the following error:
Py4JJavaError: An error occurred while calling o469.load.
: com.databricks.backend.daemon.data.client.adl.AzureCredentialNotFoundException: Could not find ADLS Gen2 Token
at com.databricks.backend.daemon.data.client.adl.AdlGen2UpgradeCredentialContextTokenProvider.$anonfun$getToken$1(AdlGen2UpgradeCredentialContextTokenProvider.scala:37)
at scala.Option.getOrElse(Option.scala:189)
at com.databricks.backend.daemon.data.client.adl.AdlGen2UpgradeCredentialContextTokenProvider.getToken(AdlGen2UpgradeCredentialContextTokenProvider.scala:31)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsClient.getAccessToken(AbfsClient.java:1371)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.executeHttpOperation(AbfsRestOperation.java:306)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.completeExecute(AbfsRestOperation.java:238)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.lambda$execute$0(AbfsRestOperation.java:211)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDurationOfInvocation(IOStatisticsBinding.java:464)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:209)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsClient.getAclStatus(AbfsClient.java:1213)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsClient.getAclStatus(AbfsClient.java:1194)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getIsNamespaceEnabled(AzureBlobFileSystemStore.java:437)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getFileStatus(AzureBlobFileSystemStore.java:1107)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:901)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:891)
From my understanding, it is expecting a service principal or storage account keys to be configured.
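For what it's worth, the usual non-passthrough workaround the error seems to point at is configuring a service principal (OAuth client credentials) for the storage account, either on the cluster or in the Spark session. A hedged sketch with placeholder values, and exactly the kind of setup I am hoping to avoid:
# Hypothetical workaround: ABFS OAuth (service principal) auth instead of the passthrough token.
# All values are placeholders; the secret should come from a secret scope in practice.
account = "<storage-account>"
spark.conf.set(f"fs.azure.account.auth.type.{account}.dfs.core.windows.net", "OAuth")
spark.conf.set(
    f"fs.azure.account.oauth.provider.type.{account}.dfs.core.windows.net",
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
)
spark.conf.set(f"fs.azure.account.oauth2.client.id.{account}.dfs.core.windows.net", "<sp-client-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{account}.dfs.core.windows.net", "<sp-client-secret>")
spark.conf.set(
    f"fs.azure.account.oauth2.client.endpoint.{account}.dfs.core.windows.net",
    "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
)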
Steps to Reproduce (for bugs)
- clone the charming-aurora repo: https://github.com/gstaubli/dbx-charming-aurora
- run dbx configure --token to set up the link with the Databricks workspace
- add a new job to the conf/deployment.yml file:
- name: "my-test-job"
  spark_python_task:
    python_file: "file://charming_aurora/tasks/sample_etl_task.py"
    parameters: [ "--conf-file", "file:fuse://conf/tasks/sample_etl_config.yml" ]
- update the sample ETL task to read an ADLS Delta table: https://github.com/gstaubli/dbx-charming-aurora/blob/main/charming_aurora/tasks/sample_etl_task.py
# assumes: from pyspark.sql import functions as f
def _write_data(self):
    df = (
        self.spark.read.format("delta")
        .load("abfss://<container>@<storage-account>.dfs.core.windows.net/path/to/table")  # placeholder path
        .filter(f.col("date") == "2024-01-01")
    )
    print(df.count())
- submit the job:
dbx execute --cluster-id=cluster-id-with-credential-passthrough --job my-test-job
Context
I specifically want to dbx execute against my existing interactive cluster and not create a job cluster.
Your Environment
- dbx version used: 0.8.18
- Databricks Runtime version: 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12)