BigQueryDataTransferConfig datasetRef required but not by the API / UI #3772

Closed
@dionborsboom

Description

Bug Description

The datasetRef field in bigquerydatatransfer.cnrm.cloud.google.com/v1beta1 BigQueryDataTransferConfig is required in Config Connector. However, it's possible to create a scheduled_query data transfer configuration without a destination_dataset_id via the API, the Google Cloud console, and Terraform: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/bigquery_data_transfer_config#destination_dataset_id-1

We currently have Terraform configurations that we can't move to KCC because datasetRef is required, while some of our users' scripts don't set destination_dataset_id. This is mostly because their scheduled queries run multiple statements that create data and tables in multiple datasets.

Additional Diagnostic Information

Leaving the destination dataset empty is possible via the API / GUI
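
To illustrate what "empty via the API" means, here is a minimal sketch that builds a request body for a scheduled query without a destination dataset. The helper `build_transfer_config_body` is hypothetical (it is not part of any Google client library), and the field names are assumed to follow the REST `TransferConfig` resource (`displayName`, `dataSourceId`, `schedule`, `params`, `destinationDatasetId`). Nothing is sent; it only prints the JSON.

```python
import json


def build_transfer_config_body(display_name, query, schedule,
                               destination_dataset_id=None):
    """Build a JSON-serializable body for a scheduled-query transfer config.

    Hypothetical helper for illustration only. Field names are assumed to
    match the REST TransferConfig resource.
    """
    body = {
        "displayName": display_name,
        "dataSourceId": "scheduled_query",
        "schedule": schedule,
        "params": {"query": query},
    }
    # Only include the destination when one is supplied. Multi-statement
    # scripts that write to several datasets simply leave it out.
    if destination_dataset_id:
        body["destinationDatasetId"] = destination_dataset_id
    return body


if __name__ == "__main__":
    # Placeholder query; the real script would be the multi-statement one.
    body = build_transfer_config_body(
        display_name="tst_script",
        query="SELECT 1",
        schedule="every 2 hours",
    )
    print(json.dumps(body, indent=2))
```

The equivalent in KCC would be a BigQueryDataTransferConfig whose spec omits datasetRef, which the CRD validation currently rejects.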


Kubernetes Cluster Version

v1.30.8-gke.1051000

Config Connector Version

1.128.0

Config Connector Mode

namespaced mode (default)

Log Output

The BigQueryDataTransferConfig "tst-script" is invalid:

  • spec.datasetRef: Required value
  • : Invalid value: "null": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation

Steps to reproduce the issue

Apply the YAML below to see that datasetRef is required in KCC.

In the BigQuery UI, write a query such as:

DECLARE x INT64 DEFAULT 0;
LOOP
  SET x = x + 1;
  IF x >= 10 THEN
    LEAVE;
  END IF;
  INSERT myproject.mydataset1.test (column1)
  VALUES(x);
  INSERT myproject.mydataset2.test (column1)
  VALUES(x);
END LOOP;

Then click Schedule -> Create new schedule. The "Destination for query results" setting is optional.

YAML snippets

---
apiVersion: bigquerydatatransfer.cnrm.cloud.google.com/v1beta1
kind: BigQueryDataTransferConfig
metadata:
  annotations:
    cnrm.cloud.google.com/management-conflict-prevention-policy: none
  name: tst-script
  namespace: myproject
spec:
  dataSourceID: scheduled_query
  # datasetRef: # This is required even though the script references 2 datasets. It's optional in the BQ GUI and Terraform.
  #   external: ""
  displayName: tst_script
  location: europe
  params:
    query: |-
      DECLARE x INT64 DEFAULT 0;
      LOOP
        SET x = x + 1;
        IF x >= 10 THEN
          LEAVE;
        END IF;
        INSERT myproject.mydataset1.test (column1)
        VALUES(x);
        INSERT myproject.mydataset2.test (column1)
        VALUES(x);
      END LOOP;
  projectRef:
    name: myproject
    namespace: myproject
  schedule: every 2 hours
  serviceAccountRef:
    name: serviceaccount
    namespace: myproject

Labels

bug (Something isn't working)
