feat(gcs): adds WIF Workload Identification Federation support #16002
Merged
39 commits
e2a0a51 added support for workload identity federation (brock-acryl)
1c792af using google auth library directly (brock-acryl)
5421931 changed the method for reading in the WIF creds (bryanprosser-acryl)
a6e4dd5 Merge branch 'master' into gcs-wif-support-update (bryanprosser-acryl)
589ac2a fix: update pydantic validators to use field_validator and model_vali… (bryanprosser-acryl)
e47d68a Merge branch 'master' into gcs-wif-support-update (bryanprosser-acryl)
8163634 added some error handling (bryanprosser-acryl)
0ccb883 Merge branch 'gcs-wif-support-update' of https://github.com/datahub-p… (bryanprosser-acryl)
078ea30 added scope (bryanprosser-acryl)
5933bce Merge branch 'master' into gcs-wif-support-update (bryanprosser-acryl)
580c138 added gcs wrapper for WIF (bryanprosser-acryl)
82b397f fixed enum types based on review (bryanprosser-acryl)
8e598e6 Merge branch 'master' into gcs-wif-support-update (bryanprosser-acryl)
866500a removed unnecessary comments (bryanprosser-acryl)
880adbb Merge branch 'gcs-wif-support-update' of https://github.com/datahub-p… (bryanprosser-acryl)
22943d2 documentation updated to include reference to the WIF in the prerequi… (bryanprosser-acryl)
a03a4be Logic added to cleanup the WIF file (if created) (bryanprosser-acryl)
ae769bb added WIF unit tests (bryanprosser-acryl)
4d8782e Fixed type issue (bryanprosser-acryl)
bf8e7d5 Fixed linting issues (bryanprosser-acryl)
71e9549 updated markdown based on prettier check (bryanprosser-acryl)
da5fd7a Merge branch 'master' into gcs-wif-support-update (bryanprosser-acryl)
5b2c55e Merge branch 'master' into gcs-wif-support-update (bryanprosser-acryl)
80feb8d test commit (bryanprosser-acryl)
4320589 fix(ingestion/gcs): resolve merge conflicts and restructure docs to m… (bryanprosser-acryl)
07f1de4 refactor(gcs): update Workload Identity Federation credential handling (bryanprosser-acryl)
62e4861 Merge branch 'master' into gcs-wif-support-update (bryanprosser-acryl)
1f7a427 feat(gcs): Centralise GCP Workload Identity Federation configuration … (bryanprosser-acryl)
0bd874c Merge branch 'gcs-wif-support-update' of https://github.com/datahub-p… (bryanprosser-acryl)
df2df56 Merge branch 'master' into gcs-wif-support-update (bryanprosser-acryl)
aa88811 fix(ingestion/gcs): address PR review findings for WIF implementation (bryanprosser-acryl)
3bb4e2b Merge branch 'gcs-wif-support-update' of https://github.com/datahub-p… (bryanprosser-acryl)
60564fe Merge branch 'master' into gcs-wif-support-update (bryanprosser-acryl)
60803c4 Merge branch 'master' into gcs-wif-support-update (bryanprosser-acryl)
d7e6154 Merge branch 'master' into gcs-wif-support-update (bryanprosser-acryl)
0a1e129 Merge branch 'master' into gcs-wif-support-update (bryanprosser-acryl)
b0c4239 feat(gcp): Add validation for mutual exclusion of WIF configuration o… (bryanprosser-acryl)
4a90c0d Merge branch 'master' into gcs-wif-support-update (bryanprosser-acryl)
f95c495 Merge branch 'master' into gcs-wif-support-update (bryanprosser-acryl)
183 changes: 183 additions & 0 deletions
metadata-ingestion/src/datahub/ingestion/source/common/gcp_wif_config.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,183 @@ | ||
| import json | ||
| import logging | ||
| from typing import Any, Dict, Optional, Tuple, Union | ||
| | ||
| from google.auth import load_credentials_from_dict | ||
| from google.auth.credentials import Credentials | ||
| from google.auth.transport.requests import Request | ||
| from pydantic import Field, model_validator | ||
| | ||
| from datahub.configuration.common import ConfigModel | ||
| | ||
| logger = logging.getLogger(__name__) | ||
| | ||
| | ||
| class GCPWIFConfig(ConfigModel): | ||
| """ | ||
| Mixin config for GCP Workload Identity Federation (WIF) authentication. | ||
| | ||
| Provides three mutually-exclusive ways to supply the WIF JSON configuration. | ||
| Sources that support WIF inherit from this class and call `load_wif_credentials` | ||
| to obtain a `google.auth.credentials.Credentials` object. | ||
| | ||
| BigQuery, Dataplex, VertexAI, and other GCP sources can adopt this mixin when | ||
| they need WIF support — no changes required to this module. | ||
| """ | ||
| | ||
| gcp_wif_configuration: Optional[str] = Field( | ||
| default=None, | ||
| description=( | ||
| "Path to the GCP Workload Identity Federation configuration JSON file. " | ||
| "Mutually exclusive with gcp_wif_configuration_json and " | ||
| "gcp_wif_configuration_json_string." | ||
| ), | ||
| ) | ||
| | ||
| gcp_wif_configuration_json: Optional[Union[str, Dict[str, Any]]] = Field( | ||
| default=None, | ||
| description=( | ||
| "GCP Workload Identity Federation configuration as a JSON string or dict. " | ||
| "Mutually exclusive with gcp_wif_configuration and " | ||
| "gcp_wif_configuration_json_string." | ||
| ), | ||
| ) | ||
| | ||
| gcp_wif_configuration_json_string: Optional[str] = Field( | ||
| default=None, | ||
| description=( | ||
| "GCP Workload Identity Federation configuration as a JSON string " | ||
| "(contents of the configuration file). Useful for injecting configuration " | ||
| "from secrets managers. Mutually exclusive with gcp_wif_configuration and " | ||
| "gcp_wif_configuration_json." | ||
| ), | ||
| ) | ||
| | ||
| @model_validator(mode="before") | ||
| @classmethod | ||
| def _validate_wif_json_format(cls, values: Dict[str, Any]) -> Dict[str, Any]: | ||
| """Validate that the JSON-typed WIF options contain valid JSON.""" | ||
| if not isinstance(values, dict): | ||
| return values | ||
| | ||
| gcp_wif_configuration_json = values.get("gcp_wif_configuration_json") | ||
| gcp_wif_configuration_json_string = values.get( | ||
| "gcp_wif_configuration_json_string" | ||
| ) | ||
| | ||
| if gcp_wif_configuration_json: | ||
| if isinstance(gcp_wif_configuration_json, str): | ||
| try: | ||
| json.loads(gcp_wif_configuration_json) | ||
| except json.JSONDecodeError as e: | ||
| raise ValueError( | ||
| f"gcp_wif_configuration_json must be valid JSON: {e}" | ||
| ) from e | ||
| elif not isinstance(gcp_wif_configuration_json, dict): | ||
| raise ValueError( | ||
| "gcp_wif_configuration_json must be either a JSON string or a dictionary" | ||
| ) | ||
| | ||
| if gcp_wif_configuration_json_string: | ||
| try: | ||
| json.loads(gcp_wif_configuration_json_string) | ||
| except json.JSONDecodeError as e: | ||
| raise ValueError( | ||
| f"gcp_wif_configuration_json_string must be valid JSON: {e}" | ||
| ) from e | ||
| | ||
| return values | ||
| | ||
| @model_validator(mode="after") | ||
| def _validate_wif_mutual_exclusion(self) -> "GCPWIFConfig": | ||
| """Validate that at most one WIF configuration option is set.""" | ||
| provided = [ | ||
| opt | ||
| for opt in [ | ||
| self.gcp_wif_configuration, | ||
| self.gcp_wif_configuration_json, | ||
| self.gcp_wif_configuration_json_string, | ||
| ] | ||
| if opt is not None | ||
| ] | ||
| if len(provided) > 1: | ||
| raise ValueError( | ||
| "Cannot specify multiple WIF configuration options. Use only one of: " | ||
| "gcp_wif_configuration, gcp_wif_configuration_json, or gcp_wif_configuration_json_string." | ||
| ) | ||
| return self | ||
| | ||
| | ||
| def load_wif_credentials( | ||
| wif_config: GCPWIFConfig, | ||
| ) -> Tuple[Credentials, Optional[str]]: | ||
| """ | ||
| Load GCP Workload Identity Federation credentials from a GCPWIFConfig. | ||
| | ||
| Resolves whichever config option is set to a dict, then calls | ||
| `google.auth.load_credentials_from_dict`. Applies the cloud-platform scope | ||
| (required for service account impersonation via WIF) and attempts an initial | ||
| token refresh to validate the credentials. | ||
| | ||
| Returns: | ||
| A tuple of (credentials, project_id). project_id may be None if the WIF | ||
| configuration does not specify one. | ||
| | ||
| Raises: | ||
| ValueError: If no WIF configuration is provided or if credential loading fails. | ||
| """ | ||
| if not any( | ||
| [ | ||
| wif_config.gcp_wif_configuration, | ||
| wif_config.gcp_wif_configuration_json, | ||
| wif_config.gcp_wif_configuration_json_string, | ||
| ] | ||
| ): | ||
| raise ValueError("No valid WIF configuration provided") | ||
| | ||
| try: | ||
| if wif_config.gcp_wif_configuration: | ||
| with open(wif_config.gcp_wif_configuration) as f: | ||
| wif_config_dict: Dict[str, Any] = json.load(f) | ||
| logger.info( | ||
| "Using Workload Identity Federation configuration from file: %s", | ||
| wif_config.gcp_wif_configuration, | ||
| ) | ||
| elif wif_config.gcp_wif_configuration_json: | ||
| if isinstance(wif_config.gcp_wif_configuration_json, dict): | ||
| wif_config_dict = wif_config.gcp_wif_configuration_json | ||
| else: | ||
| wif_config_dict = json.loads(wif_config.gcp_wif_configuration_json) | ||
| logger.info( | ||
| "Using Workload Identity Federation configuration from JSON content" | ||
| ) | ||
| else: | ||
| wif_config_dict = json.loads(wif_config.gcp_wif_configuration_json_string) # type: ignore[arg-type] | ||
| logger.info( | ||
| "Using Workload Identity Federation configuration from JSON string" | ||
| ) | ||
| | ||
| credentials, project_id = load_credentials_from_dict(wif_config_dict) | ||
| | ||
| # Impersonation (WIF → SA) requires scopes; otherwise IAM returns 400 "Scope required." | ||
| credentials = credentials.with_scopes( | ||
| ["https://www.googleapis.com/auth/cloud-platform"] | ||
| ) | ||
| | ||
| # Try to refresh credentials to validate they work. | ||
| # If refresh fails, log a warning but continue — the caller will refresh | ||
| # automatically on the first actual API call. | ||
| try: | ||
| credentials.refresh(Request()) | ||
| logger.debug("Successfully refreshed WIF credentials") | ||
| except Exception as refresh_error: | ||
| logger.warning( | ||
| "Failed to refresh WIF credentials during setup (this may be expected): %s", | ||
| refresh_error, | ||
| ) | ||
| | ||
| logger.info("Successfully loaded Workload Identity Federation credentials") | ||
| return credentials, project_id | ||
| | ||
| except Exception as e: | ||
| raise ValueError( | ||
| f"Failed to load Workload Identity Federation credentials: {e}" | ||
| ) from e | ||
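
The branch order inside `load_wif_credentials` (file path first, then the JSON string or dict, then the JSON string) can be tried in isolation with a minimal, standard-library-only sketch. The helper name `resolve_wif_dict` and the sample configuration values are hypothetical and not part of the PR; the sketch deliberately stops before `load_credentials_from_dict`, so it needs neither `google-auth` nor real credentials.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional, Union


def resolve_wif_dict(
    path: Optional[str] = None,
    json_value: Optional[Union[str, Dict[str, Any]]] = None,
    json_string: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: turn whichever option is set into a dict.

    Mirrors the branch order shown in the diff, using truthiness checks.
    """
    if path:
        with open(path) as f:
            return json.load(f)
    if json_value:
        if isinstance(json_value, dict):
            return json_value
        return json.loads(json_value)
    if json_string:
        return json.loads(json_string)
    raise ValueError("No valid WIF configuration provided")


# Placeholder content only; a real WIF configuration file has more fields.
sample = {"type": "external_account", "audience": "example-audience"}

# Write the sample to a temporary file to exercise the file-path branch.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(sample, tmp)

from_file = resolve_wif_dict(path=tmp.name)
from_dict = resolve_wif_dict(json_value=sample)
from_string = resolve_wif_dict(json_string=json.dumps(sample))
os.unlink(tmp.name)
```

All three calls should yield the same dictionary, which is the value the PR then passes to `load_credentials_from_dict`.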
Review comment (Contributor): this could be moved up as a pydantic validation. You could also add validation that only one option is set.
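
A minimal sketch of the idea in this comment, assuming the goal is to reject conflicting options when the config object is constructed rather than when credentials are loaded. It uses a plain dataclass with `__post_init__` as a dependency-free stand-in for a pydantic `model_validator`; the class name `WIFOptions` and the helper `check` are hypothetical, and only the three field names come from the diff.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Union


@dataclass
class WIFOptions:
    """Hypothetical stand-in for GCPWIFConfig, without pydantic."""

    gcp_wif_configuration: Optional[str] = None
    gcp_wif_configuration_json: Optional[Union[str, Dict[str, Any]]] = None
    gcp_wif_configuration_json_string: Optional[str] = None

    def __post_init__(self) -> None:
        # Collect the names of all options that were explicitly provided.
        provided = [
            name
            for name in (
                "gcp_wif_configuration",
                "gcp_wif_configuration_json",
                "gcp_wif_configuration_json_string",
            )
            if getattr(self, name) is not None
        ]
        # At most one option may be set; zero is allowed in this sketch.
        if len(provided) > 1:
            raise ValueError(
                "Use only one WIF configuration option, got: " + ", ".join(provided)
            )


def check(**kwargs: Any) -> str:
    """Report whether a combination of options passes validation."""
    try:
        WIFOptions(**kwargs)
        return "ok"
    except ValueError as e:
        return f"rejected: {e}"
```

With this shape, a recipe that sets two options fails as soon as the config is built, before any file is opened or any token request is made.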