Describe the bug
GCP BigQuery is a client based API for the purpose of determining the billing project. When DVT creates a BQ connection, DVT explicitly sets the billing project, resulting in incorrect behavior.
Specifically, the following behavior is observed BigQuery Connections
- When a user account is used for ADC (
gcloud auth application-default login), the command outputs Quota project "xxx" was added to ADC which can be used by Google client libraries for billing and quota. DVT's BQ connection does not use this project and forces the billing project to be the project where the data is stored. The user can specify a different billing project with the --client-project-id parameter. This is not standard.
- When a service account is used for ADC (
gcloud auth application default login --impersonate-service-account <svc-account>), DVT's BQ connection without a --client-project-id works. A connection which uses --client-project-id fails.
The use of --client-project-id for billing project is confusing. The term--billing_project is clearer.
What version of DVT are you using?
Ex: 8.5.2
What type of connections are you using for source and target?
BigQuery for Source and Target
Steps to reproduce the behavior
The most recent BQ queries in a project can be listed by using the command bq ls -j -a -n 4 --project_id=xxx. This command can be used to verify the project where the BQ query is being run.
- Set up two projects, one data-project containing the dataset and an exec-project where BQ Queries can be run. Create two service accounts, one in each project. Give the service accounts read access to the BQ dataset (roles/bigquery.dataViewer) Give the service accounts BQ job roles (roles/bigquery.jobUser and roles/bigquery.readSessionUser) role on both projects. The user has these permissions and the roles roles/iam.serviceAccountTokenCreator on the service account (to impersonate service account).
- Create 2 BQ Connections bq_data, just the name of data_project and bq_data_exec specifying the data_project and the exec_project as client_project_id.
- User Credentials
- a - run
gcloud config set project exec-project # sets billing project to exec-project
- b - run
gcloud auth application-default login # executes further commands as the user. Command returns with the message Quota project "exec-project" was added to ADC which can be used by Google client libraries
- c - execute
data-validation validate column -sc bq_data -tc bq_data -tbls=pso_data_validator.dvt_core_types
- d - Check which project is used for this validation using the bq command mentioned earlier.
- e Confirm that DVT queries are run in the data project. DVT ignores the billing project set in the ADC. This is not normal GCP client behavior. DVT queries can be run in a different project by using
--client-project-id argument while setting up the connection
- Service Accounts
- a run
gcloud auth application-default --impersonate-service-account <exec-project-service-account> login # executes further commands as the service account in the exec project
- b run
data-validation validate column -sc bq_data -tc bq_data -tbls=pso_data_validator.dvt_core_types. DVT fails with a Access Denied: Project data-project: User does not have bigquery.jobs.create permission in project data-project.
- c . run
data-validation validate column -sc bq_data_exec-project -tc bq_data_exec-project -tbls=pso_data_validator.dvt_core_types DVT fails with Caller does not have required permission to use project dataproc-templates. Grant the caller the roles/serviceusage.serviceUsageConsumer role
- d run
gcloud auth application-default --impersonate-service-account <data-project-service-account> login # executes further commands as the service account in the data project
- e run
data-validation validate column -sc bq_data -tc bq_data -tbls=pso_data_validator.dvt_core_types. DVT succeeds.
- f run
data-validation validate column -sc bq_data_exec -tc bq_data_exec -tbls=pso_data_validator.dvt_core_types, where bq2 has the client_project_id set. DVT fails with Caller does not have required permission to use project exec-project. Grant the caller the roles/serviceusage.serviceUsageConsumer role
Expected behavior
For user accounts
- DVT's BQ connection should use the project output by the ADC credential for billing. DVT should discourage the use of
billing-project for user accounts because setting it via gcloud is a standard approach.
For service accounts
- When the connection does not specify a
--billing-project - DVT should use the project in which the service account is located. When a connection specifies a --billing-project that project should be used for billing.
Additional context
For user accounts invoking BigQuery client using client = bigquery.Client() uses the billing project specified in ADC credentials . For service accounts, the project which the service account belongs to is used. Invoking the BigQuery client using client = bigquery.Client(project_id="exec-project"), ensures that the project is billed to the exec-project.
Describe the bug
GCP BigQuery is a client based API for the purpose of determining the billing project. When DVT creates a BQ connection, DVT explicitly sets the billing project, resulting in incorrect behavior.
Specifically, the following behavior is observed BigQuery Connections
gcloud auth application-default login), the command outputsQuota project "xxx" was added to ADC which can be used by Google client libraries for billing and quota. DVT's BQ connection does not use this project and forces the billing project to be the project where the data is stored. The user can specify a different billing project with the--client-project-idparameter. This is not standard.gcloud auth application default login --impersonate-service-account <svc-account>), DVT's BQ connection without a--client-project-idworks. A connection which uses--client-project-idfails.The use of
--client-project-idfor billing project is confusing. The term--billing_projectis clearer.What version of DVT are you using?
Ex: 8.5.2
What type of connections are you using for source and target?
BigQuery for Source and Target
Steps to reproduce the behavior
The most recent BQ queries in a project can be listed by using the command
bq ls -j -a -n 4 --project_id=xxx. This command can be used to verify the project where the BQ query is being run.gcloud config set project exec-project# sets billing project to exec-projectgcloud auth application-default login# executes further commands as the user. Command returns with the messageQuota project "exec-project" was added to ADC which can be used by Google client librariesdata-validation validate column -sc bq_data -tc bq_data -tbls=pso_data_validator.dvt_core_types--client-project-idargument while setting up the connectiongcloud auth application-default --impersonate-service-account <exec-project-service-account> login# executes further commands as the service account in the exec projectdata-validation validate column -sc bq_data -tc bq_data -tbls=pso_data_validator.dvt_core_types. DVT fails with aAccess Denied: Project data-project: User does not have bigquery.jobs.create permission in project data-project.data-validation validate column -sc bq_data_exec-project -tc bq_data_exec-project -tbls=pso_data_validator.dvt_core_typesDVT fails withCaller does not have required permission to use project dataproc-templates. Grant the caller the roles/serviceusage.serviceUsageConsumer rolegcloud auth application-default --impersonate-service-account <data-project-service-account> login# executes further commands as the service account in the data projectdata-validation validate column -sc bq_data -tc bq_data -tbls=pso_data_validator.dvt_core_types. DVT succeeds.data-validation validate column -sc bq_data_exec -tc bq_data_exec -tbls=pso_data_validator.dvt_core_types, where bq2 has the client_project_id set. DVT fails withCaller does not have required permission to use project exec-project. Grant the caller the roles/serviceusage.serviceUsageConsumer roleExpected behavior
For user accounts
billing-projectfor user accounts because setting it viagcloudis a standard approach.For service accounts
--billing-project- DVT should use the project in which the service account is located. When a connection specifies a--billing-projectthat project should be used for billing.Additional context
For user accounts invoking BigQuery client using
client = bigquery.Client()uses the billing project specified in ADC credentials . For service accounts, the project which the service account belongs to is used. Invoking the BigQuery client usingclient = bigquery.Client(project_id="exec-project"), ensures that the project is billed to the exec-project.