Add GCP OAuth authentication support for GKE clusters #4218
itpick wants to merge 6 commits into kubernetes-sigs:main
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: drduker
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files: Approvers can indicate their approval by writing
Welcome @drduker!
Force-pushed from 32982e2 to fc40cb0.
Pull request overview
This PR adds Google Cloud Platform OAuth 2.0 authentication support to Headlamp, providing a modern replacement for the deprecated GKE Identity Service. The implementation enables users to authenticate with their Google Cloud accounts when accessing Headlamp deployed on GKE clusters.
Key Changes:
- Implements RFC 7636-compliant PKCE OAuth 2.0 flow with Google as the identity provider
- Adds automatic GKE cluster detection based on server URL patterns
- Integrates OAuth flow into the authentication chooser UI with conditional rendering
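As a rough illustration of the PKCE step named in the key changes, the sketch below generates a code verifier and derives its S256 challenge the way RFC 7636 describes. The function names `newCodeVerifier` and `codeChallengeS256` are hypothetical; the PR's own `gcp.GenerateCodeVerifier` and `gcp.GenerateCodeChallenge` are not shown on this page and may differ.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// newCodeVerifier returns a random PKCE code verifier: 32 random bytes,
// base64url-encoded without padding (43 characters).
// Hypothetical helper, not the PR's implementation.
func newCodeVerifier() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", fmt.Errorf("reading random bytes: %w", err)
	}

	return base64.RawURLEncoding.EncodeToString(buf), nil
}

// codeChallengeS256 derives the S256 challenge from a verifier:
// BASE64URL(SHA256(ASCII(verifier))), without padding.
func codeChallengeS256(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

func main() {
	verifier, err := newCodeVerifier()
	if err != nil {
		panic(err)
	}

	// The challenge goes into the authorization URL; the verifier is kept
	// back and sent only with the token exchange.
	fmt.Println("verifier: ", verifier)
	fmt.Println("challenge:", codeChallengeS256(verifier))
}
```

The verifier stays with the client (in this PR, in a short-lived cookie) while only the challenge travels in the redirect, so an intercepted authorization code cannot be exchanged without the verifier.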
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| frontend/src/lib/k8s/gke.ts | Implements GKE cluster detection and OAuth initiation utilities |
| frontend/src/lib/k8s/gke.test.ts | Comprehensive unit tests for GKE utilities |
| frontend/src/lib/k8s/cluster.ts | Adds optional server property to Cluster interface |
| frontend/src/components/cluster/GCPLoginButton.tsx | React component for Google sign-in button with conditional rendering |
| frontend/src/components/cluster/GCPLoginButton.test.tsx | Component tests for GCPLoginButton |
| frontend/src/components/authchooser/index.tsx | Integrates GCP OAuth option and disables auto-redirect to token page |
| backend/pkg/gcp/auth.go | Core OAuth 2.0 authenticator with PKCE, token refresh, and caching |
| backend/pkg/gcp/auth_test.go | Unit tests for GCP authenticator functions |
| backend/pkg/auth/gcp.go | HTTP handlers for OAuth login, callback, and token refresh flows |
| backend/pkg/config/config.go | Adds GCP OAuth configuration fields with validation |
| backend/cmd/server.go | Populates GCP OAuth configuration from config |
| backend/cmd/headlamp.go | Registers OAuth routes and clears in-cluster auth when GCP OAuth enabled |
| backend/go.mod | Adds cloud.google.com/go/compute/metadata dependency |
| backend/go.sum | Updates dependency checksums including golang.org/x/sys version bump |
| docs/GCP_OAUTH_GKE_SETUP.md | Comprehensive deployment and configuration guide |
skoeva left a comment
Hi, thanks for looking into this.
I see the PR was generated with Claude, please make sure to thoroughly go through the Copilot comments, test your changes, and ensure tests pass before marking this PR ready for review.
Yes, I was just trying to get it working, which it is. I'm not sure how much more time I want to put into this yet.
No worries! That's good to know. Feel free to ping if you would like someone else to take this over.
Ok, no worries. Thanks for sharing anyway. Let's leave this open for a while longer. If someone else wants to pick this up, they can :) Otherwise we can close this and it's archived for anyone searching who may be interested.
Force-pushed from fc40cb0 to 9f382d6.
Just signed the CLA thing.
Force-pushed from 799c0b4 to df28d25.
Pull request overview
Copilot reviewed 28 out of 31 changed files in this pull request and generated 3 comments.
Files not reviewed (1)
- frontend/package-lock.json: Language not supported
if cluster == "" {
	http.Error(w, "cluster parameter required", http.StatusBadRequest)
	return
}
The cluster parameter should be validated against validClusterNamePattern before storing it in a cookie. Currently, validation only happens in the callback handler, which means an attacker could inject malicious cluster names that get stored in cookies during the login flow. Add validation after line 50:

if !validClusterNamePattern.MatchString(cluster) {
	http.Error(w, "invalid cluster name format", http.StatusBadRequest)
	return
}
Can you please check this one?
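The review note above asks for the cluster name to be checked before it is written to a cookie. A minimal, self-contained sketch of that ordering follows. The regular expression, the `/login` route, and the cookie name are assumptions made for illustration; the PR's actual `validClusterNamePattern` is not shown on this page.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"regexp"
)

// clusterNamePattern is an assumed stand-in for the PR's
// validClusterNamePattern, whose definition is not shown here.
var clusterNamePattern = regexp.MustCompile(`^[a-zA-Z0-9][a-zA-Z0-9._-]{0,252}$`)

// loginHandler rejects missing or malformed cluster names before any
// cookie is set, so untrusted input never reaches the cookie jar.
func loginHandler(w http.ResponseWriter, r *http.Request) {
	cluster := r.URL.Query().Get("cluster")
	if cluster == "" {
		http.Error(w, "cluster parameter required", http.StatusBadRequest)
		return
	}

	if !clusterNamePattern.MatchString(cluster) {
		http.Error(w, "invalid cluster name format", http.StatusBadRequest)
		return
	}

	// Hypothetical cookie name; only reached for validated input.
	http.SetCookie(w, &http.Cookie{Name: "cluster", Value: cluster, Path: "/", HttpOnly: true})
	w.WriteHeader(http.StatusOK)
}

// statusFor runs loginHandler against an in-memory request and returns
// the HTTP status code it produced.
func statusFor(target string) int {
	rec := httptest.NewRecorder()
	loginHandler(rec, httptest.NewRequest(http.MethodGet, target, nil))

	return rec.Code
}

func main() {
	fmt.Println(statusFor("/login?cluster=prod-1"))   // well-formed name
	fmt.Println(statusFor("/login?cluster=..%2Fetc")) // decodes to "../etc"
	fmt.Println(statusFor("/login"))                  // missing parameter
}
```

Validating in the login handler as well as in the callback means a rejected name fails fast and never round-trips through the browser.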
// HandleGCPAuthLogin initiates the GCP OAuth login flow for GKE clusters.
func HandleGCPAuthLogin(gcpAuth *gcp.GCPAuthenticator, baseURL string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		cluster := r.URL.Query().Get("cluster")
		if cluster == "" {
			http.Error(w, "cluster parameter required", http.StatusBadRequest)
			return
		}

		// Generate state token for CSRF protection
		state, err := gcp.GenerateRandomState()
		if err != nil {
			logger.Log(logger.LevelError, nil, err, "failed to generate state")
			http.Error(w, "failed to generate state", http.StatusInternalServerError)

			return
		}

		// Generate PKCE code verifier and challenge for enhanced security
		codeVerifier, err := gcp.GenerateCodeVerifier()
		if err != nil {
			logger.Log(logger.LevelError, nil, err, "failed to generate code verifier")
			http.Error(w, "failed to generate code verifier", http.StatusInternalServerError)

			return
		}

		codeChallenge := gcp.GenerateCodeChallenge(codeVerifier)

		secure := IsSecureContext(r)

		// Store state, cluster, and PKCE verifier in cookies for validation in callback
		setOAuthCookie(w, gcpOAuthStateCookie, state, secure)
		setOAuthCookie(w, gcpOAuthClusterCookie, cluster, secure)
		setOAuthCookie(w, gcpOAuthVerifierCookie, codeVerifier, secure)

		// Redirect to Google OAuth
		authURL := gcpAuth.GetAuthCodeURL(state, codeChallenge)

		logger.Log(logger.LevelInfo, map[string]string{
			"cluster": cluster,
		}, nil, "initiating GCP OAuth flow")

		http.Redirect(w, r, authURL, http.StatusFound)
	}
}

// gcpCallbackData holds validated data from the OAuth callback.
type gcpCallbackData struct {
	cluster      string
	codeVerifier string
	code         string
}

// validateGCPCallback validates the OAuth callback request and returns extracted data.
func validateGCPCallback(r *http.Request) (*gcpCallbackData, error) {
	// Validate state token (CSRF protection)
	stateCookie, err := r.Cookie(gcpOAuthStateCookie)
	if err != nil {
		return nil, fmt.Errorf("state cookie not found: %w", err)
	}

	stateParam := r.URL.Query().Get("state")
	if stateCookie.Value != stateParam {
		return nil, fmt.Errorf("state mismatch: cookie=%s, param=%s", stateCookie.Value, stateParam)
	}

	// Get cluster from cookie
	clusterCookie, err := r.Cookie(gcpOAuthClusterCookie)
	if err != nil {
		return nil, fmt.Errorf("cluster cookie not found: %w", err)
	}

	cluster := clusterCookie.Value
	if !validClusterNamePattern.MatchString(cluster) {
		return nil, fmt.Errorf("invalid cluster name format: %s", cluster)
	}

	// Check for OAuth errors
	if errParam := r.URL.Query().Get("error"); errParam != "" {
		errDesc := r.URL.Query().Get("error_description")
		return nil, fmt.Errorf("OAuth error: %s - %s", errParam, errDesc)
	}

	code := r.URL.Query().Get("code")
	if code == "" {
		return nil, fmt.Errorf("no code in request")
	}

	// Get PKCE code verifier (optional)
	codeVerifier := ""
	if verifierCookie, err := r.Cookie(gcpOAuthVerifierCookie); err == nil {
		codeVerifier = verifierCookie.Value
	}

	return &gcpCallbackData{
		cluster:      cluster,
		codeVerifier: codeVerifier,
		code:         code,
	}, nil
}

// HandleGCPAuthCallback handles the OAuth callback from Google.
func HandleGCPAuthCallback(gcpAuth *gcp.GCPAuthenticator, baseURL string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx := r.Context()

		data, err := validateGCPCallback(r)
		if err != nil {
			logger.Log(logger.LevelError, nil, err, "OAuth callback validation failed")
			http.Error(w, err.Error(), http.StatusBadRequest)

			return
		}

		token, err := gcpAuth.Exchange(ctx, data.code, data.codeVerifier)
		if err != nil {
			logger.Log(logger.LevelError, map[string]string{"cluster": data.cluster}, err, "failed to exchange code")
			http.Error(w, "failed to exchange token", http.StatusInternalServerError)

			return
		}

		gkeToken, err := gcpAuth.GetGKEAccessToken(ctx, token)
		if err != nil {
			logger.Log(logger.LevelError, map[string]string{"cluster": data.cluster}, err, "failed to get GKE token")
			http.Error(w, "failed to get GKE token", http.StatusInternalServerError)

			return
		}

		// Cache the refresh token (non-fatal if it fails)
		if token.RefreshToken != "" {
			if cacheErr := gcpAuth.CacheRefreshToken(ctx, data.cluster, gkeToken, token.RefreshToken); cacheErr != nil {
				logger.Log(logger.LevelError, map[string]string{"cluster": data.cluster}, cacheErr, "failed to cache refresh token")
			}
		}

		SetTokenCookie(w, r, data.cluster, gkeToken, baseURL)

		secure := IsSecureContext(r)
		clearOAuthCookie(w, gcpOAuthStateCookie, secure)
		clearOAuthCookie(w, gcpOAuthClusterCookie, secure)
		clearOAuthCookie(w, gcpOAuthVerifierCookie, secure)

		logger.Log(logger.LevelInfo, map[string]string{"cluster": data.cluster}, nil, "GCP OAuth flow completed")

		redirectURL := fmt.Sprintf("/#/c/%s", data.cluster)
		if baseURL != "" {
			redirectURL = "/" + baseURL + redirectURL
		}

		http.Redirect(w, r, redirectURL, http.StatusFound)
	}
}

// HandleGCPTokenRefresh handles token refresh requests for GKE clusters.
func HandleGCPTokenRefresh(gcpAuth *gcp.GCPAuthenticator, baseURL string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx := r.Context()

		cluster, token := ParseClusterAndToken(r)
		if cluster == "" || token == "" {
			http.Error(w, "cluster and token required", http.StatusBadRequest)
			return
		}

		// Get cached refresh token
		refreshToken, err := gcpAuth.GetCachedRefreshToken(ctx, cluster, token)
		if err != nil {
			logger.Log(logger.LevelError, map[string]string{
				"cluster": cluster,
			}, err, "failed to get cached refresh token")
			http.Error(w, "no refresh token available", http.StatusUnauthorized)

			return
		}

		// Refresh the token
		newToken, err := gcpAuth.RefreshToken(ctx, refreshToken)
		if err != nil {
			logger.Log(logger.LevelError, map[string]string{
				"cluster": cluster,
			}, err, "failed to refresh token")
			http.Error(w, "failed to refresh token", http.StatusInternalServerError)

			return
		}

		// Get new GKE access token
		newGKEToken, err := gcpAuth.GetGKEAccessToken(ctx, newToken)
		if err != nil {
			logger.Log(logger.LevelError, map[string]string{
				"cluster": cluster,
			}, err, "failed to get new GKE access token")
			http.Error(w, "failed to get new GKE token", http.StatusInternalServerError)

			return
		}

		// Cache the new refresh token if we got one (non-fatal if it fails)
		if newToken.RefreshToken != "" {
			if cacheErr := gcpAuth.CacheRefreshToken(ctx, cluster, newGKEToken, newToken.RefreshToken); cacheErr != nil {
				logger.Log(logger.LevelError, map[string]string{"cluster": cluster}, cacheErr, "failed to cache new refresh token")
			}
		}

		// Set new token in cookie
		SetTokenCookie(w, r, cluster, newGKEToken, baseURL)

		logger.Log(logger.LevelInfo, map[string]string{
			"cluster": cluster,
		}, nil, "token refreshed successfully")

		w.WriteHeader(http.StatusOK)
		_, _ = w.Write([]byte("token refreshed"))
	}
}

// setOAuthCookie sets a temporary cookie for OAuth flow state.
func setOAuthCookie(w http.ResponseWriter, name, value string, secure bool) {
	http.SetCookie(w, &http.Cookie{
		Name:     name,
		Value:    value,
		Path:     "/",
		MaxAge:   int(oauthFlowTimeout.Seconds()),
		HttpOnly: true,
		Secure:   secure,
		SameSite: http.SameSiteLaxMode,
	})
}

// clearOAuthCookie clears a cookie by setting it to expire immediately.
func clearOAuthCookie(w http.ResponseWriter, name string, secure bool) {
	http.SetCookie(w, &http.Cookie{
		Name:     name,
		Value:    "",
		Path:     "/",
		MaxAge:   -1,
		HttpOnly: true,
		Secure:   secure,
		SameSite: http.SameSiteLaxMode,
	})
}
The GCP OAuth HTTP handlers (HandleGCPAuthLogin, HandleGCPAuthCallback, HandleGCPTokenRefresh) lack unit test coverage. While the underlying GCPAuthenticator has tests in backend/pkg/gcp/auth_test.go, the HTTP handler functions should also have tests to verify request validation, error handling, cookie management, and redirect logic. Consider adding tests in a new file backend/pkg/auth/gcp_test.go.
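One way to approach the handler coverage requested above is to drive the validation logic with in-memory requests from `net/http/httptest`. The sketch below is runnable on its own and uses a simplified stand-in, `checkState`, for the state comparison in `validateGCPCallback`; the cookie name and the `/callback` route are hypothetical. In the repository this would more naturally be a table-driven `testing.T` test in `backend/pkg/auth/gcp_test.go` that calls the real handlers.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// stateCookieName is a hypothetical cookie name used only in this sketch.
const stateCookieName = "oauth-state"

// checkState is a simplified stand-in for the CSRF state comparison:
// the state cookie must exist and equal the "state" query parameter.
func checkState(r *http.Request) error {
	c, err := r.Cookie(stateCookieName)
	if err != nil {
		return fmt.Errorf("state cookie not found: %w", err)
	}

	if c.Value != r.URL.Query().Get("state") {
		return errors.New("state mismatch")
	}

	return nil
}

// callbackRequest builds an in-memory callback request. An empty
// cookieState omits the cookie entirely.
func callbackRequest(cookieState, paramState string) *http.Request {
	r := httptest.NewRequest(http.MethodGet, "/callback?state="+paramState, nil)
	if cookieState != "" {
		r.AddCookie(&http.Cookie{Name: stateCookieName, Value: cookieState})
	}

	return r
}

func main() {
	cases := []struct {
		name, cookie, param string
		wantErr             bool
	}{
		{"matching state", "abc123", "abc123", false},
		{"mismatched state", "abc123", "evil", true},
		{"missing cookie", "", "abc123", true},
	}

	for _, tc := range cases {
		err := checkState(callbackRequest(tc.cookie, tc.param))
		fmt.Printf("%-17s pass=%v\n", tc.name, (err != nil) == tc.wantErr)
	}
}
```

The same pattern extends to the cookie and redirect assertions the note mentions: record the response with `httptest.NewRecorder` and inspect its status code and headers.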
Can you please consider this one?
Force-pushed from df28d25 to 72ef722.
This implementation adds GCP OAuth 2.0 authentication to Headlamp, replacing the deprecated Identity Service for GKE. Users can authenticate with their Google Cloud account, and the authentication tokens are used to access Kubernetes resources with proper RBAC.

Backend changes:
- New GCP authenticator package with RFC 7636-compliant PKCE support
- OAuth HTTP handlers for login, callback, and token refresh
- Configuration via environment variables
- Token caching and automatic refresh mechanisms
- Input validation to prevent injection attacks

Frontend changes:
- GCPLoginButton component for Google sign-in
- GKE cluster detection based on server URL patterns
- Integration into existing authentication chooser UI
- Comprehensive test coverage

Documentation:
- Complete setup guide for GKE deployments
- RBAC configuration examples
- Troubleshooting guide

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Force-pushed from 72ef722 to 83732cd.
- Fix golangci-lint wsl errors in gcp_test.go by adding blank lines before assignments that were cuddled with non-assignments
- Restore accidentally removed redirect logic in AuthChooser that automatically redirects to the token page when a cluster requires token authentication without OIDC configured

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When GCP OAuth is configured, show the auth chooser dialog instead of automatically redirecting to the token page. This allows users to choose between Google Sign In and token authentication. The redirect to token page is now conditional on GCP OAuth being disabled, which preserves backward compatibility with e2e tests and non-GCP deployments. Also updated GCPLoginButton to only show when GCP OAuth is explicitly enabled via environment variable, not based on cluster type detection.
Force-pushed from e6be69d to 7fd4a1b.
Tests now mock isGCPOAuthEnabled to return true by default, and use waitFor for async state updates since the component checks OAuth status on mount.
illume left a comment
I did a very quick review and ran the ci again. Can you please check these open copilot review notes?
Pull request overview
Copilot reviewed 29 out of 32 changed files in this pull request and generated 7 comments.
Files not reviewed (1)
- frontend/package-lock.json: Language not supported
// GCP OAuth configs for GKE
GCPOAuthEnabled bool   `koanf:"gcp-oauth-enabled"`
GCPClientID     string `koanf:"gcp-client-id"`
GCPClientSecret string `koanf:"gcp-client-secret"`
GCPRedirectURL  string `koanf:"gcp-redirect-url"`
Consider adding comments for each GCP OAuth configuration field to explain their purpose, similar to the documentation style used for other config fields in this file (e.g., OIDC fields have detailed comments explaining when they're used).
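For illustration, per-field comments along the lines suggested above might look like the sketch below. The field names and `koanf` tags come from the diff; the wording of each comment is an assumption about what the fields mean, inferred from their names and from the OAuth flow in this PR, and the surrounding `Config` type is trimmed to just these fields.

```go
package main

import "fmt"

// Config is trimmed to the GCP OAuth fields for this sketch. The comment
// text on each field is an assumption, not wording taken from the PR.
type Config struct {
	// GCPOAuthEnabled turns on the GCP OAuth login flow for GKE clusters.
	GCPOAuthEnabled bool `koanf:"gcp-oauth-enabled"`
	// GCPClientID is the OAuth 2.0 client ID issued by Google Cloud.
	GCPClientID string `koanf:"gcp-client-id"`
	// GCPClientSecret is the OAuth 2.0 client secret paired with
	// GCPClientID. It should be treated as a credential.
	GCPClientSecret string `koanf:"gcp-client-secret"`
	// GCPRedirectURL is the callback URL that Google redirects to after
	// the user signs in.
	GCPRedirectURL string `koanf:"gcp-redirect-url"`
}

func main() {
	// Placeholder values only; none of these come from the PR.
	cfg := Config{
		GCPOAuthEnabled: true,
		GCPClientID:     "example-client-id",
		GCPRedirectURL:  "https://headlamp.example.com/callback",
	}
	fmt.Printf("%+v\n", cfg)
}
```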
// When GCP OAuth is enabled, clear the auth info so users must authenticate via GCP OAuth
if config.gcpOAuthEnabled {
	context.AuthInfo = &api.AuthInfo{}

	logger.Log(logger.LevelInfo, nil, nil, "Added in-cluster context without service account auth (GCP OAuth enabled)")
}
The comment and log message could be clearer about the security implications. Consider explaining that this prevents using the service account token for authentication, ensuring all requests go through GCP OAuth for proper user attribution and RBAC.
// Check if GCP OAuth is enabled before auto-redirecting to token page.
// If GCP OAuth is enabled, we want to show the auth chooser so users can
// choose between Google Sign In and token authentication.
isGCPOAuthEnabled().then(gcpEnabled => {
  if (!gcpEnabled && !cancelledRef.current) {
    // GCP OAuth not enabled, so redirect to token page
    history.replace({
      pathname: generatePath(getClusterPrefixedPath('token'), {
        cluster: clusterName as string,
      }),
    });
  }
  // If GCP OAuth is enabled, stay on auth chooser to show both options
});
The promise returned by isGCPOAuthEnabled() is not being caught. Consider adding a .catch() handler to handle potential network errors gracefully, which could prevent the auth chooser from displaying properly in case of fetch failures.
// Check if GCP OAuth is enabled before auto-redirecting
isGCPOAuthEnabled().then(gcpEnabled => {
  if (!gcpEnabled && !cancelledRef.current) {
    history.replace({
      pathname: generatePath(getClusterPrefixedPath('token'), {
        cluster: clusterName as string,
      }),
    });
  }
});
Similar to the previous instance, this promise lacks error handling. Add a .catch() to handle potential errors from isGCPOAuthEnabled().
React.useEffect(() => {
  isGCPOAuthEnabled()
    .then(enabled => {
      setGcpOAuthEnabled(enabled);
    })
    .catch(error => {
      console.warn('Failed to check GCP OAuth status:', error);
      setGcpOAuthEnabled(false);
    });
}, []);
The useEffect hook should return a cleanup function to handle component unmounting during the async operation. Consider using an AbortController or a mounted flag to prevent state updates after unmount.
// Cache key prefix for refresh tokens.
refreshTokenCachePrefix = "gcp-refresh-" //nolint:gosec // G101: This is a cache key prefix, not a credential
While the nolint comment is appropriate, consider adding a brief explanation in the comment itself that this is only a key prefix used for cache lookups, not the actual token value being stored, to make the security rationale clearer for future reviewers.
"resolved": "https://registry.npmjs.org/@babel/core/-/core-7.28.3.tgz",
"integrity": "sha512-yDBHV9kQNcr2/sUr9jghVyz9C3Y5G2zUM2H2lo+9mKv4sFgbA8s8Z9t8D1jiTkGoO/NoIfKMyKWr4s6CN23ZwQ==",
"license": "MIT",
"peer": true,
The package-lock.json file has many peer: true additions throughout. While this appears to be an npm/package manager behavior change, ensure that all these peer dependencies are intentionally marked and that the package.json correctly specifies peer dependency relationships if needed.
Hello, we were expecting this PR to be included in the 0.40.0 release and noticed that it was later removed from the milestone. We have tested the change in our local environment and it works as expected. Could you please share why the PR was not accepted and why it was removed from the release? Also, should we expect an alternative or improved implementation in a future release?
This implementation adds Google Cloud Platform OAuth 2.0 authentication to Headlamp, providing a replacement for the deprecated Identity Service for GKE. Users can now authenticate with their Google Cloud accounts when accessing Headlamp deployed on GKE clusters.
Backend Changes
Frontend Changes
Key Features
Documentation
Testing
This implementation has been tested and verified to work with GKE clusters, including successful OAuth flow initiation and PKCE code challenge generation.