feat(bff): introduce support of multiple auth methods (internal, user_token) #918

Merged
6 commits merged on Apr 17, 2025

Member

@ederign ederign commented Mar 29, 2025

Description

This PR introduces a pluggable kubernetes client authentication mechanism for the Model Registry UI BFF, enabling support for two modes:

  • internal (default and current one) — uses the credentials of the backend service:

    • In-cluster: pod service account
    • Local dev: current kubeconfig user context
    • Identity is inferred from kubeflow-userid and kubeflow-groups headers
  • user_token — uses a user-provided token from the Authorization: Bearer <your-token> header

    • Used to create a SelfSubjectAccessReview
    • Identity is the token itself

To support this, I introduced a KubernetesClientFactory abstraction that cleanly encapsulates all logic related to Kubernetes client creation. This avoids leaking the client instance throughout the codebase, enables per-request instantiation for token-based clients, and simplifies mocking in tests. The result is a more modular, secure, and testable architecture.

File-by-file breakdown of significant changes

Makefile

  • Added AUTH_METHOD ?= internal default var
  • Passed --auth-method flag to go run ./cmd in run target

README.md

  • Clarified authentication methods and updated cURL examples to reflect both kubeflow-userid and Authorization: Bearer headers
  • Explained the meaning of each mode to the developer

cmd/main.go

  • Added CLI flag --auth-method with validation (internal or user_token)
  • Hooked it into config.EnvConfig
  • Updated shutdown path to simplify logic
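
As a rough illustration of the flag validation described for cmd/main.go, the sketch below parses an --auth-method flag and rejects anything other than internal or user_token. The helper names (validateAuthMethod, parseAuthMethod) and the private flag.FlagSet are assumptions made for this example, not the PR's actual code.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// The two auth method values named in the PR description.
const (
	authMethodInternal  = "internal"
	authMethodUserToken = "user_token"
)

// validateAuthMethod is a hypothetical helper that accepts only the two
// documented modes.
func validateAuthMethod(m string) error {
	switch m {
	case authMethodInternal, authMethodUserToken:
		return nil
	default:
		return fmt.Errorf("invalid auth method %q: must be %q or %q",
			m, authMethodInternal, authMethodUserToken)
	}
}

// parseAuthMethod parses args with its own FlagSet so the example does not
// touch the process-wide flag state. The default is "internal".
func parseAuthMethod(args []string) (string, error) {
	fs := flag.NewFlagSet("bff", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the example output
	method := fs.String("auth-method", authMethodInternal,
		"authentication method (internal or user_token)")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if err := validateAuthMethod(*method); err != nil {
		return "", err
	}
	return *method, nil
}

func main() {
	for _, args := range [][]string{nil, {"--auth-method=user_token"}, {"--auth-method=bogus"}} {
		method, err := parseAuthMethod(args)
		fmt.Printf("args=%v method=%q err=%v\n", args, method, err)
	}
}
```

Validating once at startup means the rest of the program can treat the configured value as trusted.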

internal/api/app.go

  • Replaced single client instance with a KubernetesClientFactory
  • Factory is instantiated depending on:
    • Whether we are mocking K8s (envtest)
    • Chosen auth-method
  • Adjusted App struct to store the factory and optionally the envtest for cleanup in mock mode
  • All route handlers now use dynamically retrieved clients
  • Replaced PerformSARon... middleware with generalized RequireAccessToService and RequireListServiceAccessInNamespace

internal/api/middleware.go

  • Introduced InjectRequestIdentity middleware to create RequestIdentity from headers based on auth-method
  • Refactored all SAR authorization logic to work off a KubernetesClientFactory + per-request identity
  • Clean separation of concerns for:
    • AttachRESTClient — resolves and injects REST client
    • RequireAccessToService — enforces permission on a named K8s Service
    • RequireListServiceAccessInNamespace — enforces permission to list services in a namespace
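
To make the middleware idea more concrete, here is a minimal sketch of an identity-injecting middleware for the internal mode. It reads the kubeflow-userid and kubeflow-groups headers and stores a RequestIdentity in the request context. The function names, the context key, the comma-separated group format, and the 400 status on a missing user are all assumptions for illustration. The PR's InjectRequestIdentity may differ.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// RequestIdentity mirrors the struct quoted in the review thread.
type RequestIdentity struct {
	UserID string
	Groups []string
	Token  string
}

// identityKey is a private context key type (hypothetical).
type identityKey struct{}

// identityFromInternalHeaders builds an identity from the two header values.
// Assumption: groups arrive as a comma-separated list.
func identityFromInternalHeaders(userID, groups string) (*RequestIdentity, error) {
	if strings.TrimSpace(userID) == "" {
		return nil, errors.New("missing kubeflow-userid header")
	}
	id := &RequestIdentity{UserID: userID}
	for _, g := range strings.Split(groups, ",") {
		if g = strings.TrimSpace(g); g != "" {
			id.Groups = append(id.Groups, g)
		}
	}
	return id, nil
}

// injectRequestIdentity attaches the identity to the request context so
// later handlers do not parse headers themselves.
func injectRequestIdentity(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id, err := identityFromInternalHeaders(
			r.Header.Get("kubeflow-userid"), r.Header.Get("kubeflow-groups"))
		if err != nil {
			// Status code chosen for illustration only.
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		ctx := context.WithValue(r.Context(), identityKey{}, id)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

func main() {
	h := injectRequestIdentity(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id, _ := r.Context().Value(identityKey{}).(*RequestIdentity)
		fmt.Fprintf(w, "user=%s groups=%v", id.UserID, id.Groups)
	}))
	req := httptest.NewRequest(http.MethodGet, "/api/v1/model_registry?namespace=kubeflow", nil)
	req.Header.Set("kubeflow-userid", "user@example.com")
	req.Header.Set("kubeflow-groups", "team-a, team-b")
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	fmt.Println(rec.Code, rec.Body.String())
}
```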

internal/api/errors.go

  • Updated error type references to use mrserver.HTTPError (instead of integrations)
  • Centralized error handling and serialization

internal/api/*.go (handler files)

  • Updated handlers to use new interfaces: mrserver.HTTPClientInterface
  • Replaced old identity extraction logic with context-based RequestIdentity
  • Cleaned up duplicate logic for user/group extraction

internal/api/*_test.go

  • Updated all tests to:
    • Use kubernetesMockedStaticClientFactory instead of raw client
    • Provide a RequestIdentity struct directly
    • Maintain correctness for both valid and forbidden paths

internal/integrations/kubernetes

  • Introduced KubernetesClientFactory interface with two implementations:
    • StaticClientFactory (our old client) — for internal auth (shared client, impersonation support)
    • TokenClientFactory — creates new clients per token
  • Separated InternalKubernetesClient (our old client) and TokenKubernetesClient logic for SAR vs. SelfSAR
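
The factory split above can be pictured with a small, dependency-free sketch. Only the names KubernetesClientFactory, StaticClientFactory, TokenClientFactory, and RequestIdentity come from the PR. The KubernetesClient interface, the GetClient method, and newFactory are hypothetical stand-ins, and no real Kubernetes client is created.

```go
package main

import (
	"errors"
	"fmt"
)

// RequestIdentity mirrors the struct quoted in the review thread.
type RequestIdentity struct {
	UserID string
	Groups []string
	Token  string
}

// KubernetesClient is a hypothetical, minimal client interface.
type KubernetesClient interface {
	Describe() string
}

// KubernetesClientFactory hides how a client is obtained for a request.
type KubernetesClientFactory interface {
	GetClient(id RequestIdentity) (KubernetesClient, error)
}

type sharedClient struct{ name string }

func (c *sharedClient) Describe() string { return c.name }

type tokenClient struct{ token string }

func (c *tokenClient) Describe() string {
	return fmt.Sprintf("per-request client (token of %d bytes)", len(c.token))
}

// StaticClientFactory always hands out the same shared client (internal mode).
type StaticClientFactory struct{ client *sharedClient }

func (f *StaticClientFactory) GetClient(RequestIdentity) (KubernetesClient, error) {
	return f.client, nil
}

// TokenClientFactory builds a fresh client for every token (user_token mode).
type TokenClientFactory struct{}

func (TokenClientFactory) GetClient(id RequestIdentity) (KubernetesClient, error) {
	if id.Token == "" {
		return nil, errors.New("request identity carries no token")
	}
	return &tokenClient{token: id.Token}, nil
}

// newFactory is the single place that looks at the auth method.
func newFactory(authMethod string) (KubernetesClientFactory, error) {
	switch authMethod {
	case "internal":
		return &StaticClientFactory{client: &sharedClient{name: "shared service-account client"}}, nil
	case "user_token":
		return TokenClientFactory{}, nil
	default:
		return nil, fmt.Errorf("unsupported auth method %q", authMethod)
	}
}

func main() {
	for _, method := range []string{"internal", "user_token"} {
		factory, err := newFactory(method)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		client, err := factory.GetClient(RequestIdentity{UserID: "user@example.com", Token: "FAKE_CLUSTER_ADMIN_TOKEN"})
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", method, client.Describe())
	}
}
```

Handlers that depend only on the factory interface can be tested with a mocked factory, which is the testability benefit the description mentions.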

internal/integrations/kubernetes/k8mocks

  • SetupEnvTest() now supports both client modes (internal & token)
  • Adds ability to simulate SSAR and SAR scenarios for testing

How Has This Been Tested?

This is how I tested it. My suggestion for anyone reviewing this is to test it carefully, as it touches a crucial part of our BFF.

AuthMethodInternal = "internal" (default)

First, I made sure that I didn't break anything with AuthMethodInternal = "internal" (the default):

  1. Local mocked development
make run MOCK_K8S_CLIENT=true MOCK_MR_CLIENT=true
✅ curl -i -H "kubeflow-userid: user@example.com" "localhost:4000/api/v1/model_registry?namespace=kubeflow"
❌ 403: curl -i -H "kubeflow-userid: doraNonAdmin@example.com" "localhost:4000/api/v1/model_registry?namespace=kubeflow"
✅ curl -i -H "kubeflow-userid: doraNonAdmin@example.com" "localhost:4000/api/v1/model_registry?namespace=dora-namespace"
  2. Run the front-end and do a quick sanity check
    e.g. user@example.com should be able to see all namespaces (cluster admin on envtest)

  3. On a Kubeflow installation, change the deployment to use image: quay.io/ederignatowicz/model-registry-ui-auth:latest. I've built this image to test it. You will see "Starting Model Registry" in the logs.

Screenshot 2025-03-29 at 1 36 34 PM

A word of warning: if you have an old installation, make sure to update the healthcheck path to api/healthcheck instead of api/v1/healthcheck.

Do a quick sanity check there just to make sure I didn't break anything! :)

AUTH_METHOD=user_token

Let's try the new auth mode.

  1. Local mocked development
make run MOCK_K8S_CLIENT=true MOCK_MR_CLIENT=true AUTH_METHOD=user_token
❌ curl -i -H "kubeflow-userid: user@example.com" "localhost:4000/api/v1/model_registry?namespace=kubeflow"
❌ curl -i -H "Authorization: Bearer $TOKEN" "localhost:4000/api/v1/model_registry?namespace=kubeflow"
The token for user@example.com is FAKE_CLUSTER_ADMIN_TOKEN
✅ curl -i -H "Authorization: Bearer FAKE_CLUSTER_ADMIN_TOKEN" "localhost:4000/api/v1/model_registry?namespace=kubeflow"
FAKE_DORA_TOKEN is the token for doraNonAdmin@example.com, who has no access to the kubeflow namespace
❌ curl -i -H "Authorization: Bearer FAKE_DORA_TOKEN" "localhost:4000/api/v1/model_registry?namespace=kubeflow"
✅ curl -i -H "Authorization: Bearer FAKE_DORA_TOKEN" "localhost:4000/api/v1/model_registry?namespace=dora-namespace"
  2. BFF connected to a Kubeflow cluster
    This works only against the BFF because our front end doesn't support tokens yet. STANDALONE_MODE=true enables the namespaces endpoint.

make run MOCK_K8S_CLIENT=false MOCK_MR_CLIENT=true AUTH_METHOD=user_token STANDALONE_MODE=true

1-) Create namespaces

kubectl create ns ns-dora
kubectl create ns ns-bella

2-) Create SAs

kubectl create sa admin-dora -n ns-dora
kubectl create sa limited-bella -n ns-bella

3-) Create roles

# rbac/ns-access.yaml
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: ns-reader
  namespace: ns-dora
rules:
  - apiGroups: [""]
    resources: ["namespaces"]
    verbs: ["get", "list"]

# rbac/service-access.yaml
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: service-reader
  namespace: ns-dora
rules:
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["get", "list"]

do the same for namespace ns-bella

4-) Apply them to the right namespaces:

kubectl apply -f ns-access.yaml -n ns-dora
kubectl apply -f ns-access.yaml -n ns-bella
kubectl apply -f service-access.yaml -n ns-bella
kubectl apply -f service-access.yaml -n ns-dora

5-) Create RoleBindings
Bind admin-dora to both namespaces

kubectl create rolebinding admin-ns-access1 \
  --role=ns-reader \
  --serviceaccount=ns-dora:admin-dora \
  -n ns-dora

kubectl create rolebinding admin-svc-access1 \
  --role=service-reader \
  --serviceaccount=ns-dora:admin-dora \
  -n ns-dora 

kubectl create rolebinding admin-ns-access-user1 \
  --role=ns-reader \
  --serviceaccount=ns-dora:admin-dora \
  -n ns-bella 
kubectl create rolebinding admin-svc-access-user1 \
  --role=service-reader \
  --serviceaccount=ns-dora:admin-dora \
  -n ns-bella 

Bind limited-bella to only ns-bella

kubectl create rolebinding limited-ns-access-user2 \
  --role=ns-reader \
  --serviceaccount=ns-bella:limited-bella \
  -n ns-bella 
kubectl create rolebinding limited-svc-access-user2 \
  --role=service-reader \
  --serviceaccount=ns-bella:limited-bella \
  -n ns-bella 

6-) Create Model Registry Services

# mr-service.yaml
kind: Service
apiVersion: v1
metadata:
  labels:
    app: model-registry-service
    app.kubernetes.io/component: model-registry
    app.kubernetes.io/instance: model-registry-service
    app.kubernetes.io/name: model-registry-service
    app.kubernetes.io/part-of: model-registry
    component: model-registry
  annotations:
    displayName: Kubeflow Model Registry
    description: An example model registry
  name: bella-user-registry
spec:
  selector:
    component: model-registry-server
  type: ClusterIP
  ports:
  - port: 8080
    protocol: TCP
    appProtocol: http
    name: http-api
  - port: 9090
    protocol: TCP
    appProtocol: grpc
    name: grpc-api

kubectl apply -f mr-service.yaml -n ns-dora

Change the namespace name and apply it to ns-bella as well.

7-) Get Tokens

kubectl create token admin-dora -n ns-dora --duration=24h > /tmp/admin-dora.token
ADMIN_DORA=$(cat /tmp/admin-dora.token)
echo $ADMIN_DORA

✅ curl -i -H "Authorization: Bearer $ADMIN_DORA" "localhost:4000/api/v1/model_registry?namespace=ns-dora"
✅ curl -i -H "Authorization: Bearer $ADMIN_DORA" "localhost:4000/api/v1/model_registry?namespace=ns-bella"
❌ curl -i -H "Authorization: Bearer $ADMIN_DORA" "localhost:4000/api/v1/model_registry?namespace=default"

kubectl create token limited-bella -n ns-bella --duration=24h > /tmp/limited-bella.token
BELLA=$(cat /tmp/limited-bella.token)
echo $BELLA

❌ curl -i -H "Authorization: Bearer $BELLA" "localhost:4000/api/v1/model_registry?namespace=default"
❌ curl -i -H "Authorization: Bearer $BELLA" "localhost:4000/api/v1/model_registry?namespace=ns-dora"
✅ curl -i -H "Authorization: Bearer $BELLA" "localhost:4000/api/v1/model_registry?namespace=ns-bella"

8-) Custom Headers

make run MOCK_K8S_CLIENT=false MOCK_MR_CLIENT=true AUTH_METHOD=user_token STANDALONE_MODE=true AUTH_TOKEN_HEADER=X-Forwarded-Access-Token AUTH_TOKEN_PREFIX=""
❌ curl -i -H "Authorization: Bearer $BELLA" "localhost:4000/api/v1/model_registry?namespace=ns-bella"
✅ curl -i -H "X-Forwarded-Access-Token: $BELLA" "localhost:4000/api/v1/model_registry?namespace=ns-bella"
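
The custom-header run above can be pictured with the following sketch of configurable token extraction. It assumes the token is read from one configured header and that an optional prefix must be present and is stripped. The tokenConfig type and the parseToken and extractToken helpers are hypothetical names for this example, not the PR's TokenClientFactory code.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"strings"
)

// tokenConfig stands in for the AUTH_TOKEN_HEADER / AUTH_TOKEN_PREFIX settings.
type tokenConfig struct {
	Header string // e.g. "Authorization" or "X-Forwarded-Access-Token"
	Prefix string // e.g. "Bearer " or "" for a raw token
}

// parseToken strips the configured prefix from a header value.
// An empty prefix means the header value is the token itself.
func parseToken(headerValue, prefix string) (string, error) {
	if headerValue == "" {
		return "", errors.New("missing token header")
	}
	if prefix != "" {
		if !strings.HasPrefix(headerValue, prefix) {
			return "", fmt.Errorf("header value does not start with %q", prefix)
		}
		headerValue = strings.TrimPrefix(headerValue, prefix)
	}
	token := strings.TrimSpace(headerValue)
	if token == "" {
		return "", errors.New("empty token")
	}
	return token, nil
}

// extractToken reads the configured header from the request headers.
func extractToken(h http.Header, cfg tokenConfig) (string, error) {
	return parseToken(h.Get(cfg.Header), cfg.Prefix)
}

func main() {
	defaults := tokenConfig{Header: "Authorization", Prefix: "Bearer "}
	custom := tokenConfig{Header: "X-Forwarded-Access-Token", Prefix: ""}

	h := http.Header{}
	h.Set("X-Forwarded-Access-Token", "FAKE_DORA_TOKEN")

	_, err := extractToken(h, defaults) // no Authorization header present
	fmt.Println("default config:", err)

	tok, err := extractToken(h, custom)
	fmt.Println("custom config:", tok, err)
}
```

With the custom configuration, a request carrying only an Authorization header is rejected because the configured header is absent, which matches the failing curl shown above.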

Merge criteria:

  • All the commits have been signed-off (To pass the DCO check)
  • The commits have meaningful messages; the author will squash them after approval or in case of manual merges will ask to merge with squash.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.
  • Code changes follow the kubeflow contribution guidelines.
  • For first time contributors: Please reach out to the Reviewers to ensure all tests are being run, ensuring the label ok-to-test has been added to the PR.

If you have UI changes

  • The developer has added tests or explained why testing cannot be added.
  • Included any necessary screenshots or gifs if it was a UI change.
  • Verify that UI/UX changes conform to the UX guidelines for Kubeflow.

@ederign
Member Author

ederign commented Mar 29, 2025

/assign @alexcreasy

@rareddy
Contributor

rareddy commented Mar 29, 2025

@dhirajsb pls review

ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
defer cancel()

for _, verb := range []string{"get", "list"} {
Contributor

This is copied from the old implementation, but I don't think we need both get and list when no name attribute is supplied to a SSAR.

Member Author

Christian, I've added to my list to double check with MR team, but I'm almost sure that they explicitly asked for this.

Comment on lines 126 to 129
if err != nil {
	kc.Logger.Warn("user is not allowed to list namespaces or failed to list namespaces")
	return []corev1.Namespace{}, nil
}
Contributor

log error instead? Also, do include the error in the log to help with debugging.

Why does this function suppress the error instead of sending it back to the caller?

Member Author

Fixed.

}
var identity *kubernetes.RequestIdentity

switch app.config.AuthMethod {
Contributor

I don't particularly like seeing a switch on app.config.AuthMethod (here or elsewhere) aside from when creating a single client. It defeats the purpose of abstracting the implementation details of the individual clients, which is where the different behaviors belong.

Member Author

Fixed.

return services, nil
}

func (kc *SharedClientLogic) GetServiceDetailsByName(sessionCtx context.Context, namespace string, serviceName string) (ServiceDetails, error) {
Contributor

It is inefficient to fetch all services only to find one by name, instead of fetching that one by name in the first place.

Member Author

Fixed

Comment on lines +11 to +15
type RequestIdentity struct {
	UserID string
	Groups []string
	Token  string
}
Contributor

Ideally each auth method should have its own type to avoid mixing every property together. But I can accept keeping these properties together, since they all relate to identity.

Member Author

We discussed this in a call and agreed that this type will be reused for MR API calls.

@ederign
Member Author

ederign commented Mar 31, 2025

@christianvogt thank you so much for the review, I believe I've addressed all our points.

@ederign ederign force-pushed the auth-token branch 2 times, most recently from 80e4980 to 0e6efb1 Compare March 31, 2025 20:25
@christianvogt
Contributor

Tested the PR as per the curl commands in the description.
Also successfully ran the UI & BFF locally in conjunction with the central dashboard.

/lgtm

@christianvogt: changing LGTM is restricted to collaborators

ederign added 4 commits April 11, 2025 07:19
…_token)

- Introduce auth-method flag (default: internal) to select auth strategy
- internal: uses pod service account (in-cluster) or current kubeconfig (local)
- user_token: uses bearer token from X-Forwarded-Access-Token header
- Implement separate SAR (internal) and SSAR (user_token) logic
- Clarify behavior and usage in README

Signed-off-by: Eder Ignatowicz <ignatowicz@gmail.com>
Signed-off-by: Eder Ignatowicz <ignatowicz@gmail.com>
Signed-off-by: Eder Ignatowicz <ignatowicz@gmail.com>
… reuse methods)

Signed-off-by: Eder Ignatowicz <ignatowicz@gmail.com>
Contributor

@lucferbux lucferbux left a comment

There are a couple of things I want to discuss first, since I don't fully understand the PR. Moving the conversation to other channels.

@@ -1,6 +1,5 @@
 package models

 type User struct {
-	UserID       string `json:"userId"`
-	ClusterAdmin bool   `json:"clusterAdmin"`
+	UserID string `json:"userId"`
Contributor

Suggested change
-	UserID string `json:"userId"`
+	UserID       string `json:"userId"`
+	ClusterAdmin bool   `json:"clusterAdmin"`

Member Author

@lucferbux fixed on 1a27427


if formattedUser == "" {
Contributor

We should still get both the ClusterAdmin check and the username

Member Author

Fixed on 1a27427

if formattedUser == "" {
	// if we are using token-based auth, we still need to implement how to
	// safely get the user from the token
	formattedUser = "unknown"
Contributor

You can get the user from a token with kubectl auth whoami, this calls /apis/authentication.k8s.io/v1/selfsubjectreviews, which returns values like:

ATTRIBUTE                                           VALUE
Username                                            kubernetes-admin
Groups                                              [kubeadm:cluster-admins system:authenticated]
Extra: authentication.kubernetes.io/credential-id   [X509SHA256=230423670e4531f1d6b8b5a8a9680954b3bb95b35353019a1944b29d5ad03148]
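
For illustration, the same lookup can be sketched in Go with a plain HTTP POST to the selfsubjectreviews endpoint mentioned above. This is a standard-library-only sketch, not the PR's implementation: production code would more likely use client-go and the cluster CA. The whoAmI and demoWhoAmI helpers are hypothetical, and the demo talks to a fake in-process server that stands in for the API server and returns canned data for the mock token used in the PR description.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

const reviewPath = "/apis/authentication.k8s.io/v1/selfsubjectreviews"

// selfSubjectReview models only the response fields this sketch reads.
type selfSubjectReview struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Status     struct {
		UserInfo struct {
			Username string   `json:"username"`
			Groups   []string `json:"groups"`
		} `json:"userInfo"`
	} `json:"status"`
}

// whoAmI posts an empty SelfSubjectReview authenticated with the bearer
// token and returns the username and groups the server reports.
func whoAmI(ctx context.Context, hc *http.Client, apiServer, token string) (string, []string, error) {
	payload, err := json.Marshal(map[string]string{
		"apiVersion": "authentication.k8s.io/v1",
		"kind":       "SelfSubjectReview",
	})
	if err != nil {
		return "", nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, apiServer+reviewPath, bytes.NewReader(payload))
	if err != nil {
		return "", nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	resp, err := hc.Do(req)
	if err != nil {
		return "", nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return "", nil, fmt.Errorf("selfsubjectreview failed: %s", resp.Status)
	}
	var review selfSubjectReview
	if err := json.NewDecoder(resp.Body).Decode(&review); err != nil {
		return "", nil, err
	}
	return review.Status.UserInfo.Username, review.Status.UserInfo.Groups, nil
}

// demoWhoAmI runs whoAmI against a fake server (NOT a real API server)
// that recognizes only FAKE_DORA_TOKEN.
func demoWhoAmI(token string) (string, []string, error) {
	fake := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost || r.URL.Path != reviewPath ||
			r.Header.Get("Authorization") != "Bearer FAKE_DORA_TOKEN" {
			http.Error(w, "Unauthorized", http.StatusUnauthorized)
			return
		}
		var review selfSubjectReview
		review.APIVersion, review.Kind = "authentication.k8s.io/v1", "SelfSubjectReview"
		review.Status.UserInfo.Username = "doraNonAdmin@example.com"
		review.Status.UserInfo.Groups = []string{"system:authenticated"}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusCreated)
		_ = json.NewEncoder(w).Encode(review)
	}))
	defer fake.Close()
	return whoAmI(context.Background(), fake.Client(), fake.URL, token)
}

func main() {
	name, groups, err := demoWhoAmI("FAKE_DORA_TOKEN")
	fmt.Println(name, groups, err)
}
```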

Member Author

Thanks for the pointers Lucas! I fixed it in 1a27427

Contributor

Yes, I can confirm it is working now:

Screenshot 2025-04-16 at 11 51 20

…ract user name in token case

Signed-off-by: Eder Ignatowicz <ignatowicz@gmail.com>
@lucferbux
Contributor

Just commenting here: @alexcreasy @christianvogt, if @ederign changes the token to Bearer ... it's OK for me. I'll be out the rest of the week. The main issue is already fixed, so if you can take a look, I'm in favor of approving it.

- Introduced `AuthTokenHeader` and `AuthTokenPrefix` fields to EnvConfig
- Default token extraction uses `Authorization` header with `Bearer ` prefix
- Updated TokenClientFactory to dynamically parse token using configured header and prefix
- Added new CLI flags: `--auth-token-header` and `--auth-token-prefix`
- Updated Makefile to support overriding header and prefix via `AUTH_TOKEN_HEADER` and `AUTH_TOKEN_PREFIX`
- Improved error messages and testability of token parsing logic
- Added Ginkgo unit tests for TokenClientFactory.ExtractRequestIdentity with and without prefixes
- Cleaned up README:
  - Replaced all `X-Forwarded-Access-Token` references with `Authorization: Bearer`
  - Documented how to override token header and prefix via CLI, env, or Makefile

Signed-off-by: Eder Ignatowicz <ignatowicz@gmail.com>
@christianvogt
Contributor

Re-tested following the curl commands successfully.
/lgtm

@christianvogt: changing LGTM is restricted to collaborators

Contributor

@alexcreasy alexcreasy left a comment

I've just identified one small change, otherwise this looks all good!

}

- allowed, err := app.kubernetesClient.PerformSARonSpecificService(user, userGroups, namespace, modelRegistryID)
+ allowed, err := client.CanAccessServiceInNamespace(r.Context(), identity, namespace, serviceName)

if err != nil {
app.forbiddenResponse(w, r, "failed to perform SAR: %v")
Contributor

It's good security practice not to return any additional information to the user for an authentication response as you can leak clues to the internal architecture of the system, that could lead to CWEs like a response discrepancy.

It's probably a good idea to alter the forbiddenResponse function to not write the error message to the http response and just log it, whilst returning simply 403 forbidden.
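
One possible shape for that suggestion is sketched below: the underlying error is logged server-side and the client only ever sees a bare 403. The signature, the slog logger parameter, and the probeForbidden helper are assumptions for this example. The project's real forbiddenResponse may look different.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"log/slog"
	"net/http"
	"net/http/httptest"
	"strings"
)

// forbiddenResponse logs the cause for operators but writes only a generic
// "Forbidden" body, so no internal detail reaches the caller.
func forbiddenResponse(w http.ResponseWriter, r *http.Request, logger *slog.Logger, cause error) {
	logger.Warn("access denied", "method", r.Method, "path", r.URL.Path, "error", cause)
	http.Error(w, http.StatusText(http.StatusForbidden), http.StatusForbidden)
}

// probeForbidden is a hypothetical helper that exercises forbiddenResponse
// and reports the status code and whether the detail leaked into the body.
func probeForbidden(detail string) (status int, leaked bool) {
	logger := slog.New(slog.NewTextHandler(io.Discard, nil)) // discard logs in the demo
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/api/v1/model_registry", nil)
	forbiddenResponse(rec, req, logger, errors.New(detail))
	return rec.Code, strings.Contains(rec.Body.String(), detail)
}

func main() {
	status, leaked := probeForbidden("failed to perform SAR: connection refused")
	fmt.Println("status:", status, "detail leaked:", leaked)
}
```

Keeping the response body identical for every denial also avoids giving callers a way to tell different failure causes apart.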

Member Author

I'll fix it in a follow-up PR!

Member Author

Thank you for the review!

@alexcreasy
Contributor

/lgtm

@google-oss-prow google-oss-prow bot added the lgtm label Apr 17, 2025
@alexcreasy
Contributor

/approve

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alexcreasy

@google-oss-prow google-oss-prow bot merged commit 340947a into kubeflow:main Apr 17, 2025
18 checks passed
5 participants