FEAT: Add OIDC debug logging for authentication troubleshooting #4411

Draft
Sagar-6203620715 wants to merge 1 commit into kubernetes-sigs:main from Sagar-6203620715:feat/oidc-debug-logging-3576

Conversation

@Sagar-6203620715

Summary

Adds comprehensive debug logging for the OIDC authentication flow to help troubleshoot authentication loops and token refresh failures.

Root Cause Identified

Users experiencing OIDC authentication loops (successful IdP login → immediate re-login on every UI action) had no diagnostic visibility. Existing logs showed only generic errors such as "failed to refresh token", with no context about:

  • Which user is attempting login
  • Token lifecycle state (exchange, verify, refresh)
  • Provider endpoints being contacted
  • Token expiry/validity status

This made troubleshooting OIDC issues across different providers (Azure AD, Keycloak, Authelia, GKE) nearly impossible.

Fix Applied

Adds opt-in debug logging to trace the complete OIDC authentication flow:

Changes

  • Add LevelDebug to logger - New debug level for verbose logging
  • Add --oidc-debug flag - Enable OIDC flow debugging (off by default)
  • Instrument OIDC callback handler - 6 debug checkpoints:
    • Callback received
    • State validated
    • Token exchange successful (with metadata)
    • ID token verified (with subject/issuer/expiry)
    • User claims extracted (email/name)
    • Login completed (redirect info)
  • Instrument token refresh flow - 4 debug checkpoints:
    • Refresh initiated
    • Provider contacted
    • New token obtained
    • Refresh completed/failed
  • Enhance error messages - Add issuer URLs, endpoints, token types to errors
  • Sanitize sensitive data - Log user identity without exposing tokens
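The opt-in behaviour described in the list above can be sketched as follows. This is a minimal illustration, not the code in this PR: `debugLine` is a hypothetical helper, and the project's real logger package has its own API. It only shows the idea that a checkpoint produces output when the debug switch is on and nothing otherwise.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// debugLine builds one JSON log line for a debug checkpoint.
// It returns ok=false and no output when debug logging is disabled,
// so nothing extra is logged by default.
// NOTE: hypothetical helper for illustration, not the project's logger API.
func debugLine(enabled bool, msg string, fields map[string]string) (line string, ok bool) {
	if !enabled {
		return "", false
	}
	entry := map[string]string{}
	for k, v := range fields {
		entry[k] = v
	}
	// Set these last so caller-supplied fields cannot overwrite them.
	entry["level"] = "debug"
	entry["message"] = msg
	b, err := json.Marshal(entry)
	if err != nil {
		return "", false
	}
	return string(b), true
}

func main() {
	fields := map[string]string{"endpoint": "/oidc-callback"}
	if line, ok := debugLine(true, "OIDC callback received", fields); ok {
		fmt.Println(line)
	}
	if _, ok := debugLine(false, "Refresh initiated", nil); !ok {
		fmt.Println("debug disabled: nothing logged")
	}
}
```

Keeping the enabled check inside one helper means each checkpoint stays a single call, and the default-off behaviour cannot be forgotten at an individual call site.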

Related Issue

Fixes #3576

Testing

Enable Debug Logging

# Via environment variable
export HEADLAMP_CONFIG_OIDC_DEBUG=true
./headlamp-server

# Or via flag
./headlamp-server --oidc-debug
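For illustration, one way a boolean setting could be resolved from both sources is sketched below. The flag name and environment variable come from this description; the `resolveOIDCDebug` function and the rule that an explicit flag wins over the environment are assumptions for the sketch, and this is not how Headlamp's config package actually loads values.

```go
package main

import (
	"flag"
	"fmt"
	"strconv"
)

// resolveOIDCDebug decides whether OIDC debug logging is on.
// ASSUMPTION: an explicitly passed flag wins; otherwise the environment
// value is parsed; otherwise the setting defaults to false.
func resolveOIDCDebug(args []string, envValue string) (bool, error) {
	fs := flag.NewFlagSet("headlamp-server", flag.ContinueOnError)
	debug := fs.Bool("oidc-debug", false, "enable OIDC flow debug logging")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	// Visit only reports flags that were actually set on the command line.
	explicit := false
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "oidc-debug" {
			explicit = true
		}
	})
	if explicit {
		return *debug, nil
	}
	if envValue == "" {
		return false, nil
	}
	return strconv.ParseBool(envValue)
}

func main() {
	// A real binary would pass os.Args[1:] and
	// os.Getenv("HEADLAMP_CONFIG_OIDC_DEBUG") here.
	enabled, err := resolveOIDCDebug([]string{"--oidc-debug"}, "")
	if err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	fmt.Println("oidc debug enabled:", enabled)
}
```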

Expected Debug Output

When OIDC login occurs, logs will show:

{"level":"debug","message":"OIDC callback received","endpoint":"/oidc-callback"}
{"level":"debug","message":"OIDC state validated successfully","state":"abc12345..."}
{"level":"debug","message":"Token exchange successful","token_type":"id_token","has_refresh":"true","expires_in":"1h0m0s"}
{"level":"debug","message":"ID Token verified successfully","subject":"user@example.com","issuer":"https://idp.example.com"}
{"level":"debug","message":"User claims extracted successfully","user_email":"user@example.com","user_name":"John Doe"}
{"level":"debug","message":"OIDC login completed, redirecting to UI","cluster":"production"}
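The sample output shows the state value shortened to "abc12345...". A minimal sketch of such shortening is below; `truncateForLog` is a hypothetical name, and the choice to mask values that are not longer than the kept prefix is an assumption rather than something this PR specifies.

```go
package main

import "fmt"

// truncateForLog keeps the first `keep` characters of a sensitive value and
// replaces the rest with "...". Values that are not longer than the prefix
// are masked completely so they are never logged in full.
// NOTE: hypothetical helper for illustration.
func truncateForLog(value string, keep int) string {
	runes := []rune(value)
	if keep <= 0 || len(runes) <= keep {
		return "..."
	}
	return string(runes[:keep]) + "..."
}

func main() {
	fmt.Println(truncateForLog("abc12345-rest-of-the-state-value", 8)) // prints "abc12345..."
}
```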

Build & Run

make backend
./backend/headlamp-server --oidc-debug

Impact

  • Zero breaking changes - Debug logging off by default
  • No token exposure - Token values are never logged, only metadata; user email/name appear in logs only when debug is enabled
  • Solves auth loop debugging - Operators can now trace OIDC flow end-to-end
  • Works across all OIDC providers - Azure AD, Keycloak, Authelia, GKE, etc.

Notes for Reviewers

  • All debug logging is conditional on config.oidcDebug flag
  • Token values are never logged (only metadata like expiry, type, has_refresh)
  • User identity (email/name) only logged in debug mode
  • Feature is backwards compatible - no changes to existing behavior when debug is off
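The "metadata only" rule in these notes could look roughly like the sketch below. `tokenInfo` is a simplified, hypothetical stand-in for an OAuth2 token (the real code works with `*oauth2.Token`), and `tokenMetadata`, `demoToken` and `demoNow` are illustrative names. The field names mirror the sample debug output.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// tokenInfo is a simplified, hypothetical stand-in for an OAuth2 token.
type tokenInfo struct {
	AccessToken  string
	RefreshToken string
	TokenType    string
	Expiry       time.Time
}

// tokenMetadata returns only values that are safe to log. The secret
// strings are inspected (is there a refresh token?) but never copied.
func tokenMetadata(t tokenInfo, now time.Time) map[string]string {
	return map[string]string{
		"token_type":  t.TokenType,
		"has_refresh": strconv.FormatBool(t.RefreshToken != ""),
		"expires_in":  t.Expiry.Sub(now).Round(time.Second).String(),
	}
}

// Fixed demo values so the example is deterministic.
var (
	demoNow   = time.Date(2026, 1, 21, 12, 0, 0, 0, time.UTC)
	demoToken = tokenInfo{
		AccessToken:  "secret-access-token",
		RefreshToken: "secret-refresh-token",
		TokenType:    "id_token",
		Expiry:       demoNow.Add(time.Hour),
	}
)

func main() {
	meta := tokenMetadata(demoToken, demoNow)
	fmt.Println(meta["token_type"], meta["has_refresh"], meta["expires_in"])
}
```

Building the loggable map in one place makes it easier for a reviewer to confirm that no token value can reach the logs.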

- Add LevelDebug to logger for verbose logging
- Add --oidc-debug flag to enable OIDC flow debugging
- Instrument OIDC callback handler with 6 debug checkpoints
- Instrument token refresh flow with 4 debug checkpoints
- Log user identity (email/name) and token metadata
- Enhance error messages with issuer/endpoint context

Fixes kubernetes-sigs#3576
@k8s-ci-robot k8s-ci-robot added the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Jan 21, 2026
@k8s-ci-robot

Keywords which can automatically close issues and at(@) or hashtag(#) mentions are not allowed in commit messages.

The list of commits with invalid commit messages:

  • 16ba7b9 feat: Add OIDC debug logging for authentication troubleshooting
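A contributor could run a rough local check for this rule before pushing. The sketch below is an approximation under stated assumptions: the regular expressions are illustrative and are not the patterns the Prow plugin uses, and `commitMessageProblems` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// ASSUMPTION: these patterns approximate the rule quoted above; they are
// not copied from Prow.
var (
	closingKeyword = regexp.MustCompile(`(?i)\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)[\s:]+[\w./-]*#\d+`)
	atMention      = regexp.MustCompile(`(?:^|\s)@[\w-]+`)
	issueReference = regexp.MustCompile(`#\d+`)
)

// commitMessageProblems lists which of the three checks a message trips.
func commitMessageProblems(msg string) []string {
	var problems []string
	if closingKeyword.MatchString(msg) {
		problems = append(problems, "issue-closing keyword")
	}
	if atMention.MatchString(msg) {
		problems = append(problems, "@ mention")
	}
	if issueReference.MatchString(msg) {
		problems = append(problems, "# reference")
	}
	return problems
}

func main() {
	for _, msg := range []string{
		"backend: Add OIDC debug logging",
		"Fixes kubernetes-sigs#3576",
	} {
		fmt.Printf("%q -> %v\n", msg, commitMessageProblems(msg))
	}
}
```

With a check like this, the issue reference would stay in the PR description (where "Fixes #3576" is allowed) and be left out of the commit message itself.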
Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Sagar-6203620715
Once this PR has been reviewed and has the lgtm label, please assign joaquimrocha for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 21, 2026
@illume illume requested a review from Copilot January 21, 2026 12:50
@illume

illume commented Jan 21, 2026

Thanks for this.

Please check that the commit messages match the style we use. Also, please check that it compiles and passes checks locally; see the backend part of the contributing docs.

Copilot AI left a comment

Pull request overview

This pull request adds comprehensive debug logging capabilities for OIDC authentication flows to help troubleshoot authentication loops and token refresh failures. The changes introduce a new LevelDebug log level, add an --oidc-debug command-line flag, and instrument key points in the OIDC callback and token refresh flows with detailed debug logging.

Changes:

  • Added LevelDebug constant and handler to the logger package
  • Added OidcDebug configuration field and --oidc-debug flag
  • Modified RefreshAndCacheNewToken to accept a debug flag and log token refresh operations
  • Added debug logging checkpoints throughout the OIDC callback handler and token refresh flow

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File                          Description
backend/pkg/logger/logger.go  Adds LevelDebug constant and debug case to logging switch
backend/pkg/config/config.go  Adds OidcDebug configuration field and --oidc-debug flag
backend/pkg/auth/auth.go      Adds debugEnabled parameter to RefreshAndCacheNewToken and logs token refresh initiation
backend/cmd/headlamp.go       Adds oidcDebug field and implements debug logging at 10+ checkpoints in OIDC flows

  idToken, err := oauthConfig.Verifier.Verify(oauthConfig.Ctx, rawUserToken)
  if err != nil {
- 	logger.Log(logger.LevelError, nil, err, "failed to verify ID Token")
+ 	logger.Log(logger.LevelError, nil, err, "failed to verify ID Token signature")

Copilot AI Jan 21, 2026

Critical syntax error: Line 879 has a complete logger.Log call, but lines 880-881 contain orphaned code that appears to be part of an incomplete map literal and duplicate logger.Log call. This will cause a compilation error. The code should either be:

  • Remove lines 880-881 entirely if the enhanced error logging was meant to replace line 879, OR
  • Fix line 879 to include the map parameter like: logger.Log(logger.LevelError, map[string]string{"issuer": oidcAuthConfig.IdpIssuerURL}, err, "failed to verify ID Token signature")
Suggested change
logger.Log(logger.LevelError, nil, err, "failed to verify ID Token signature")
logger.Log(logger.LevelError, map[string]string{

Copilot uses AI. Check for mistakes.
Comment on lines +249 to 252
func RefreshAndCacheNewToken(ctx context.Context, oidcAuthConfig *kubeconfig.OidcConfig, debugEnabled bool,
	cache cache.Cache[interface{}],
	tokenType, token, issuerURL string,
) (*oauth2.Token, error) {

Copilot AI Jan 21, 2026

The function signature for RefreshAndCacheNewToken has been changed to add a debugEnabled bool parameter, but the test calls in auth_test.go have not been updated. This will cause compilation failures. All test calls to this function need to be updated to include the debugEnabled parameter (likely passing false for tests).
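The direct fix is to update every caller, including the tests, to pass the new argument. As an alternative design sketch (not what this PR does), a variadic option keeps existing call sites compiling. `refreshToken`, `RefreshOption` and `WithDebug` are hypothetical names, and the function is a simplified stand-in that returns the debug messages it would log instead of performing a real refresh.

```go
package main

import "fmt"

// refreshOptions holds optional behaviour for the stand-in refresh function.
type refreshOptions struct {
	debug bool
}

// RefreshOption mutates refreshOptions; callers pass zero or more of them.
type RefreshOption func(*refreshOptions)

// WithDebug turns debug checkpoints on or off for a single call.
func WithDebug(enabled bool) RefreshOption {
	return func(o *refreshOptions) { o.debug = enabled }
}

// refreshToken is a simplified, hypothetical stand-in for a token refresh.
// It returns the debug messages it would emit so the effect is observable.
func refreshToken(tokenType, issuerURL string, opts ...RefreshOption) []string {
	var cfg refreshOptions
	for _, opt := range opts {
		opt(&cfg)
	}
	if !cfg.debug {
		return nil
	}
	return []string{
		"Refresh initiated: token_type=" + tokenType,
		"Provider contacted: issuer=" + issuerURL,
	}
}

func main() {
	// An existing call site compiles unchanged and logs nothing.
	fmt.Println(len(refreshToken("id_token", "https://idp.example.com")))
	// A new call site opts in explicitly.
	fmt.Println(len(refreshToken("id_token", "https://idp.example.com", WithDebug(true))))
}
```

The trade-off is a slightly larger API surface in exchange for not touching callers that do not care about debugging.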

Copilot uses AI. Check for mistakes.

@illume illume left a comment

See previous comments.

@illume illume marked this pull request as draft January 21, 2026 13:00
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 21, 2026
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 7, 2026
@k8s-ci-robot

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Development

Successfully merging this pull request may close these issues.

Debug mode and verbose logging for SSO OIDC

3 participants