Description
When using a Dataproc Serverless template to connect through JDBC, for example the Cloud Storage to JDBC template (https://cloud.google.com/dataproc-serverless/docs/templates/storage-to-jdbc), JDBC_CONNECTION_URL requires the user to specify password=JDBC_PASSWORD, which is in plaintext.
This is a security concern.
In non-template Serverless Spark on GCP Dataproc, the plaintext password issue is solved with the Hadoop Credential Provider API (https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html). The password is stored under a key in a JCEKS file, and the file is read by adding "spark.hadoop.hadoop.security.credential.provider.path=[JCEKS_FILE_PATH]" to the --properties setting.
Note that the spark.hadoop.javax.jdo.option.ConnectionURL=[JDBC_CONNECTION_URL] property must also be set.
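As a rough illustration of the non-template setup, the Python sketch below assembles the comma-separated value passed to --properties. The helper name, the keystore path and the JDBC URL are hypothetical placeholders, and it assumes the password has already been stored in the JCEKS file under the key the connection expects.

```python
def build_batch_properties(jceks_path: str, jdbc_url: str) -> str:
    """Hypothetical helper: join Spark properties into the comma-separated
    key=value form used by the --properties flag."""
    props = {
        # Tells Hadoop's Credential Provider API where the keystore holding
        # the password lives; the password itself never appears here.
        "spark.hadoop.hadoop.security.credential.provider.path": jceks_path,
        # The connection URL is passed without any password parameter.
        "spark.hadoop.javax.jdo.option.ConnectionURL": jdbc_url,
    }
    return ",".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_batch_properties(
        "jceks://gs@my-bucket/secrets/db.jceks",
        "jdbc:mysql://10.0.0.5:3306/metastore",
    ))
```

Because commas separate the properties, a value that itself contains a comma would need gcloud's alternate-delimiter escaping (see `gcloud topic escaping`).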
However, templated Serverless Spark on GCP Dataproc, as illustrated in https://cloud.google.com/dataproc-serverless/docs/templates/storage-to-jdbc, uses --templateProperty instead of --properties. Furthermore, gcs.jdbc.output.url=[JDBC_CONNECTION_URL] is a required parameter, and it cannot be used together with spark.hadoop.javax.jdo.option.ConnectionURL=[JDBC_CONNECTION_URL].
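To make the gap concrete, here is a Python sketch of the templated case: the required gcs.jdbc.output.url value is built from a URL that still carries the password, and a small check flags it. The helper names, the URL and the regex are illustrative assumptions; the regex only recognizes the `?...&password=` style (e.g. MySQL, PostgreSQL) and the `;password=` style (e.g. SQL Server), and other drivers may format credentials differently.

```python
import re

# Matches a "password=" parameter in "?user=...&password=..." style URLs
# and in ";password=..." style URLs. Case-insensitive.
_PASSWORD_PARAM = re.compile(r"[?&;]password=", re.IGNORECASE)


def template_property(jdbc_url: str) -> str:
    """Hypothetical helper: build the value passed to --templateProperty."""
    return f"gcs.jdbc.output.url={jdbc_url}"


def has_plaintext_password(jdbc_url: str) -> bool:
    """Return True if the JDBC URL carries its password as a URL parameter."""
    return _PASSWORD_PARAM.search(jdbc_url) is not None


if __name__ == "__main__":
    # Placeholder URL; with the template, the password is embedded here.
    url = "jdbc:mysql://10.0.0.5:3306/mydb?user=app&password=s3cret"
    prop = template_property(url)
    print(prop)
    print(has_plaintext_password(prop))  # True: the password is in the argument
```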
In other words, users cannot mask the password in the JDBC connection when using templated Serverless Spark on GCP Dataproc. I am requesting a feature to make this possible.