BigQueryDataTransferConfig datasetRef required but not by the API / UI #3772

Closed
@dionborsboom

Description

Bug Description

The datasetRef field in bigquerydatatransfer.cnrm.cloud.google.com/v1beta1 BigQueryDataTransferConfig is required in Config Connector. However, it's possible to create a scheduled_query data transfer configuration without a destination_dataset_id via the API, the GCP console, and Terraform: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/bigquery_data_transfer_config#destination_dataset_id-1

We currently have Terraform configurations that we can't move to KCC because datasetRef is a required field, but our users don't set destination_dataset_id in some of their scripts, mostly because their scheduled queries use multiple statements that create data and tables in multiple datasets.

Additional Diagnostic Information

Leaving the destination dataset empty is possible via both the API and the GUI.
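To make the API side of this concrete, here is a minimal sketch of the JSON body one might send to the Data Transfer `transferConfigs.create` REST method for a scheduled query with no destination dataset. It only builds and prints the payload; it does not call the API. The helper name `build_scheduled_query_config` is hypothetical, and the field names are assumed to follow the v1 `TransferConfig` REST resource (`displayName`, `dataSourceId`, `schedule`, `params`, `destinationDatasetId`).

```python
import json


def build_scheduled_query_config(display_name, query, schedule,
                                 destination_dataset_id=None):
    """Hypothetical helper: build a TransferConfig JSON body.

    Field names are assumed to match the v1 TransferConfig REST resource.
    The destination dataset is only included when one is given.
    """
    body = {
        "displayName": display_name,
        "dataSourceId": "scheduled_query",
        "schedule": schedule,
        "params": {"query": query},
    }
    if destination_dataset_id:
        body["destinationDatasetId"] = destination_dataset_id
    return body


# A multi-dataset script like the one in this issue, with no destination set.
body = build_scheduled_query_config(
    display_name="tst_script",
    query=(
        "INSERT myproject.mydataset1.test (column1) VALUES (1);\n"
        "INSERT myproject.mydataset2.test (column1) VALUES (1);"
    ),
    schedule="every 2 hours",
)
print(json.dumps(body, indent=2))
```

The point of the sketch is that the payload is complete without `destinationDatasetId`, which is the shape that KCC currently cannot express because `spec.datasetRef` is mandatory.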

Image

Kubernetes Cluster Version

v1.30.8-gke.1051000

Config Connector Version

1.128.0

Config Connector Mode

namespaced mode (default)

Log Output

The BigQueryDataTransferConfig "tst-script" is invalid:

  • spec.datasetRef: Required value
  • : Invalid value: "null": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation

Steps to reproduce the issue

Apply the YAML below to see that datasetRef is required in KCC.

In the BigQuery UI, write a query such as:

DECLARE x INT64 DEFAULT 0;
LOOP
  SET x = x + 1;
  IF x >= 10 THEN
    LEAVE;
  END IF;
  INSERT myproject.mydataset1.test (column1)
  VALUES(x);
  INSERT myproject.mydataset2.test (column1)
  VALUES(x);
END LOOP;

Then click Schedule -> Create new schedule. The "Destination for query results" setting is optional.

YAML snippets

---
apiVersion: bigquerydatatransfer.cnrm.cloud.google.com/v1beta1
kind: BigQueryDataTransferConfig
metadata:
  annotations:
    cnrm.cloud.google.com/management-conflict-prevention-policy: none
  name: tst-script
  namespace: myproject
spec:
  dataSourceID: scheduled_query
  # datasetRef: # this is required, even though I reference 2 datasets in the script. It's optional in the BQ GUI and Terraform
  #   external: ""
  displayName: tst_script
  location: europe
  params:
    query: |-
      DECLARE x INT64 DEFAULT 0;
      LOOP
        SET x = x + 1;
        IF x >= 10 THEN
          LEAVE;
        END IF;
        INSERT myproject.mydataset1.test (column1)
        VALUES(x);
        INSERT myproject.mydataset2.test (column1)
        VALUES(x);
      END LOOP;
  projectRef:
    name: myproject
    namespace: myproject
  schedule: every 2 hours
  serviceAccountRef:
    name: serviceaccount
    namespace: myproject
