feat(gcs): adds WIF Workload Identification Federation support #16002
…roject/datahub into gcs-wif-support-update
@sgomezvillamor
Restructured the GCS docs to follow the new _pre.md / _post.md / _recipe.yml format per docs/sources/AGENTS.md. This also resolved the merge conflicts with master. Added WIF use-case examples to gcs_pre.md covering the scenarios you suggested (GKE workloads, cross-cloud auth, credential rotation for compliance).
Moved the WIF implementation out of gcs_source.py into a new shared module: source/common/gcp_wif_config.py. This exposes two reusable exports: GCPWIFConfig — a Pydantic config mixin with the three input options (file path, JSON dict, JSON string) and format validators. BigQuery, Dataplex, VertexAI can inherit from this directly when they need WIF support.
These were missing from the WIF branch of create_equivalent_s3_config. Fixed — both DataLakeSourceConfig instantiation paths (HMAC and WIF) now pass all three fields.
Eliminated temp file usage entirely. Switched from load_credentials_from_file() to google.auth.load_credentials_from_dict(), which loads WIF credentials directly from a Python dict in memory, so no file is ever written. This applies to all three input options (the file path is read once into a dict; the dict and JSON string options never touch disk at all).
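For readers following along, here is a minimal sketch of how the three input options could be normalized into a single in-memory dict before the credentials are loaded. The function names `resolve_wif_config` and `load_wif_credentials` are hypothetical and are not taken from the PR; only `google.auth.load_credentials_from_dict()` is the call named above. The sketch assumes exactly one option has already been validated as set.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def resolve_wif_config(
    file_path: Optional[str] = None,
    json_dict: Optional[Dict[str, Any]] = None,
    json_string: Optional[str] = None,
) -> Dict[str, Any]:
    """Return the WIF configuration as a dict, whichever option was supplied.

    Hypothetical helper: assumes the caller already checked that exactly
    one option is set.
    """
    if file_path is not None:
        # The file is read once into memory; nothing is written to disk.
        return json.loads(Path(file_path).read_text(encoding="utf-8"))
    if json_dict is not None:
        # Copy so later mutation by the caller does not affect the result.
        return dict(json_dict)
    if json_string is not None:
        return json.loads(json_string)
    raise ValueError("No WIF configuration provided")


def load_wif_credentials(wif_config_dict: Dict[str, Any]):
    """Load credentials from the dict without any temp file."""
    # Imported lazily so resolve_wif_config works without google-auth installed.
    import google.auth

    # Returns a (credentials, project_id) tuple.
    return google.auth.load_credentials_from_dict(wif_config_dict)
```

Keeping the normalization separate from the credential loading means the disk-free property can be unit tested without real GCP credentials.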
I don't know why the vertica plugin is failing
if not any(
    [
        wif_config.gcp_wif_configuration,
        wif_config.gcp_wif_configuration_json,
        wif_config.gcp_wif_configuration_json_string,
    ]
):
    raise ValueError("No valid WIF configuration provided")
this could be moved up as a pydantic validation
you could also add the validation that only one is set
    "Using Workload Identity Federation configuration from JSON string"
)

credentials, project_id = load_credentials_from_dict(wif_config_dict)
if len(provided_options) == 0:
    raise ValueError(
        "One of gcp_wif_configuration (file path), gcp_wif_configuration_json (JSON content), "
        "or gcp_wif_configuration_json_string (JSON string) is required when auth_type is 'workload_identity_federation'"
    )
elif len(provided_options) > 1:
    raise ValueError(
        "Cannot specify multiple WIF configuration options. Use only one of: "
        "gcp_wif_configuration, gcp_wif_configuration_json, or gcp_wif_configuration_json_string."
    )
oh... I see, the validation I was suggesting is here already 👍
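As a rough illustration of the "exactly one option" check expressed as Pydantic validation, here is a sketch using a Pydantic v2 `model_validator`. The class name `WIFOptionsSketch` is hypothetical, the field types are assumptions, and the real check in the PR is additionally conditioned on `auth_type`, which this sketch leaves out.

```python
from typing import Any, Dict, Optional

from pydantic import BaseModel, model_validator


class WIFOptionsSketch(BaseModel):
    # Field names follow the snippet above; the types are assumptions.
    gcp_wif_configuration: Optional[str] = None
    gcp_wif_configuration_json: Optional[Dict[str, Any]] = None
    gcp_wif_configuration_json_string: Optional[str] = None

    @model_validator(mode="after")
    def check_exactly_one_option(self) -> "WIFOptionsSketch":
        # Collect the names of every option that was actually supplied.
        provided_options = [
            name
            for name in (
                "gcp_wif_configuration",
                "gcp_wif_configuration_json",
                "gcp_wif_configuration_json_string",
            )
            if getattr(self, name) is not None
        ]
        if len(provided_options) == 0:
            raise ValueError("One WIF configuration option is required")
        if len(provided_options) > 1:
            raise ValueError(
                "Cannot specify multiple WIF configuration options: "
                + ", ".join(provided_options)
            )
        return self
```

With the check on the model itself, a bad recipe fails at config parsing time rather than when credentials are first loaded.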
# Pydantic v2 models freeze field assignment after __init__; object.__setattr__
# bypasses this to attach runtime-only state that is not part of the config schema.
object.__setattr__(
    aws_config, "_gcs_oauth_credentials", self._wif_credentials
)
object.__setattr__(
    aws_config, "_gcs_oauth_project_id", self._wif_project_id
)
this looks hacky, and I haven't found any reference to these fields on the internet or anywhere on GitHub 🤔
how much is this needed?
what's the error you get if they are not set?
we need more contextual information in the comment in case we ever need to revisit this
sgomezvillamor left a comment
Overall LGTM
I'm a little bit concerned about the object.__setattr__ calls for _gcs_oauth_credentials and _gcs_oauth_project_id
Approving to unblock but please let's provide more info or context on that.
It was fixed with 5b1a316, so update the branch to fix it
…ptions in GCPWIFConfig; refactor GCSOAuthAwsConnectionConfig to use private attributes for credentials
Updated this to use the Pydantic v2 way of declaring instance-level runtime state that isn't part of the config schema.
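For context, a minimal sketch of what private attributes look like in Pydantic v2, assuming `PrivateAttr` is the mechanism meant here. The class name `ConnectionConfigSketch` and the `endpoint_url` field are hypothetical stand-ins, not the actual `GCSOAuthAwsConnectionConfig` definition.

```python
from typing import Any, Optional

from pydantic import BaseModel, PrivateAttr


class ConnectionConfigSketch(BaseModel):
    # Regular field: part of the config schema and of model_dump().
    endpoint_url: Optional[str] = None

    # Private attributes: runtime-only state, excluded from the schema,
    # from validation, and from serialization.
    _gcs_oauth_credentials: Any = PrivateAttr(default=None)
    _gcs_oauth_project_id: Optional[str] = PrivateAttr(default=None)


config = ConnectionConfigSketch(endpoint_url="https://storage.googleapis.com")

# Plain attribute assignment works for private attributes,
# so object.__setattr__ is not needed.
config._gcs_oauth_credentials = object()  # placeholder for real credentials
config._gcs_oauth_project_id = "example-project"
```

Because the attributes are declared on the class, a reader can find them by name, which addresses the concern about undocumented fields being attached at runtime.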
- feat(dataplex): add support for more Dataplex entry groups and hierarchy mapping (#16723)
- feat(dagster): Emit StatusClass aspect during asset ingestion to handle soft deleted assets (#16809)
- fix(ui): enable lint rule on colors for diff files (#16813)
- feat(gcs): adds WIF Workload Identification Federation support (#16002)
- fix(ingest/dep): CVE-2024-27459 (#16822)
This PR adds support for Google's Workload Identity Federation as an alternative to the existing HMAC authentication for connecting to GCP. It also adds unit tests and updates the documentation to reflect the additional config options.