Skip to content

DVT 's BQ Connection does not use the correct billing project. #1710

@sundar-mudupalli-work

Description

@sundar-mudupalli-work

Describe the bug
GCP BigQuery is a client based API for the purpose of determining the billing project. When DVT creates a BQ connection, DVT explicitly sets the billing project, resulting in incorrect behavior.
Specifically, the following behavior is observed BigQuery Connections

  1. When a user account is used for ADC (gcloud auth application-default login), the command outputs Quota project "xxx" was added to ADC which can be used by Google client libraries for billing and quota. DVT's BQ connection does not use this project and forces the billing project to be the project where the data is stored. The user can specify a different billing project with the --client-project-id parameter. This is not standard.
  2. When a service account is used for ADC (gcloud auth application default login --impersonate-service-account <svc-account>), DVT's BQ connection without a --client-project-id works. A connection which uses --client-project-id fails.
    The use of --client-project-id for billing project is confusing. The term--billing_project is clearer.
    What version of DVT are you using?
    Ex: 8.5.2

What type of connections are you using for source and target?
BigQuery for Source and Target

Steps to reproduce the behavior
The most recent BQ queries in a project can be listed by using the command bq ls -j -a -n 4 --project_id=xxx. This command can be used to verify the project where the BQ query is being run.

  1. Set up two projects, one data-project containing the dataset and an exec-project where BQ Queries can be run. Create two service accounts, one in each project. Give the service accounts read access to the BQ dataset (roles/bigquery.dataViewer) Give the service accounts BQ job roles (roles/bigquery.jobUser and roles/bigquery.readSessionUser) role on both projects. The user has these permissions and the roles roles/iam.serviceAccountTokenCreator on the service account (to impersonate service account).
  2. Create 2 BQ Connections bq_data, just the name of data_project and bq_data_exec specifying the data_project and the exec_project as client_project_id.
  3. User Credentials
    • a - run gcloud config set project exec-project # sets billing project to exec-project
    • b - run gcloud auth application-default login # executes further commands as the user. Command returns with the message Quota project "exec-project" was added to ADC which can be used by Google client libraries
    • c - execute data-validation validate column -sc bq_data -tc bq_data -tbls=pso_data_validator.dvt_core_types
    • d - Check which project is used for this validation using the bq command mentioned earlier.
    • e Confirm that DVT queries are run in the data project. DVT ignores the billing project set in the ADC. This is not normal GCP client behavior. DVT queries can be run in a different project by using --client-project-id argument while setting up the connection
  4. Service Accounts
    • a run gcloud auth application-default --impersonate-service-account <exec-project-service-account> login # executes further commands as the service account in the exec project
    • b run data-validation validate column -sc bq_data -tc bq_data -tbls=pso_data_validator.dvt_core_types. DVT fails with a Access Denied: Project data-project: User does not have bigquery.jobs.create permission in project data-project.
    • c . run data-validation validate column -sc bq_data_exec-project -tc bq_data_exec-project -tbls=pso_data_validator.dvt_core_types DVT fails with Caller does not have required permission to use project dataproc-templates. Grant the caller the roles/serviceusage.serviceUsageConsumer role
    • d run gcloud auth application-default --impersonate-service-account <data-project-service-account> login # executes further commands as the service account in the data project
    • e run data-validation validate column -sc bq_data -tc bq_data -tbls=pso_data_validator.dvt_core_types. DVT succeeds.
    • f run data-validation validate column -sc bq_data_exec -tc bq_data_exec -tbls=pso_data_validator.dvt_core_types, where bq2 has the client_project_id set. DVT fails with Caller does not have required permission to use project exec-project. Grant the caller the roles/serviceusage.serviceUsageConsumer role

Expected behavior
For user accounts

  • DVT's BQ connection should use the project output by the ADC credential for billing. DVT should discourage the use of billing-project for user accounts because setting it via gcloud is a standard approach.
    For service accounts
  • When the connection does not specify a --billing-project - DVT should use the project in which the service account is located. When a connection specifies a --billing-project that project should be used for billing.

Additional context
For user accounts invoking BigQuery client using client = bigquery.Client() uses the billing project specified in ADC credentials . For service accounts, the project which the service account belongs to is used. Invoking the BigQuery client using client = bigquery.Client(project_id="exec-project"), ensures that the project is billed to the exec-project.

Metadata

Metadata

Assignees

No one assigned

    Labels

    type: bugError or flaw in code with unintended results or allowing sub-optimal usage patterns.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions