Changes from 52 commits
Commits (55)
3a1a1f6
add provider as "test"
Mar 25, 2026
192c530
Revert "add provider as "test""
Mar 25, 2026
580f9df
Merge remote-tracking branch 'upstream/main'
Mar 25, 2026
75b4a49
Merge remote-tracking branch 'upstream/main'
Mar 25, 2026
cb63f60
Merge remote-tracking branch 'upstream/main'
Mar 26, 2026
8b3c6c1
Merge remote-tracking branch 'upstream/main'
Mar 26, 2026
5694f11
Merge remote-tracking branch 'upstream/main'
Mar 30, 2026
8b4443f
Merge remote-tracking branch 'upstream/main'
Mar 31, 2026
4959fb7
Merge remote-tracking branch 'upstream/main'
Apr 1, 2026
a27b033
Merge remote-tracking branch 'upstream/main'
Apr 1, 2026
67695fd
Merge remote-tracking branch 'upstream/main'
Apr 1, 2026
45bd2cb
Merge remote-tracking branch 'upstream/main'
Apr 2, 2026
0ccc5eb
Merge remote-tracking branch 'upstream/main'
Apr 2, 2026
e42ad0a
Merge remote-tracking branch 'upstream/main'
Apr 7, 2026
2a2a1e4
Merge remote-tracking branch 'upstream/main'
Apr 7, 2026
f439e9f
Merge remote-tracking branch 'upstream/main'
Apr 8, 2026
c099a19
Merge remote-tracking branch 'upstream/main'
Apr 8, 2026
6f2c694
Merge remote-tracking branch 'upstream/main'
Apr 9, 2026
db1a715
Merge remote-tracking branch 'upstream/main'
Apr 9, 2026
e6be668
Merge remote-tracking branch 'upstream/main'
Apr 9, 2026
010360a
Merge remote-tracking branch 'upstream/main'
Apr 10, 2026
7969fb2
Merge remote-tracking branch 'upstream/main'
Apr 10, 2026
925364c
Merge remote-tracking branch 'upstream/main'
Apr 10, 2026
f0509d1
Merge remote-tracking branch 'upstream/main'
Apr 10, 2026
9723ba3
Merge remote-tracking branch 'upstream/main'
Apr 10, 2026
f554f27
Merge remote-tracking branch 'upstream/main'
Apr 11, 2026
487897a
Merge remote-tracking branch 'upstream/main'
Apr 11, 2026
8c1e001
Merge remote-tracking branch 'upstream/main'
Apr 13, 2026
4bb4753
Merge remote-tracking branch 'upstream/main'
Apr 13, 2026
c0537af
Merge remote-tracking branch 'upstream/main'
Apr 13, 2026
fddd44c
Merge remote-tracking branch 'upstream/main'
Apr 14, 2026
35579fa
Merge remote-tracking branch 'upstream/main'
Apr 14, 2026
f829dc7
Merge remote-tracking branch 'upstream/main'
Apr 14, 2026
d3e6368
Merge remote-tracking branch 'upstream/main'
Apr 14, 2026
527a703
Merge remote-tracking branch 'upstream/main'
Apr 15, 2026
5c4b78e
Merge remote-tracking branch 'upstream/main'
Apr 15, 2026
9f8fa45
Merge remote-tracking branch 'upstream/main'
Apr 15, 2026
3ea597b
Merge remote-tracking branch 'upstream/main'
Apr 15, 2026
1147a3b
Merge remote-tracking branch 'upstream/main'
Apr 16, 2026
0d8c6b5
Merge remote-tracking branch 'upstream/main'
Apr 20, 2026
c18a43e
Merge remote-tracking branch 'upstream/main'
Apr 21, 2026
9d9a764
Create cached_admin_checker.go
Apr 21, 2026
6af9da2
add tests
Apr 21, 2026
cac09c5
change main && improve tests
Apr 21, 2026
03a8e73
switch to using `k8s.io/utils/clock/testing`
Apr 21, 2026
f27c5a8
Merge remote-tracking branch 'upstream/main'
Apr 21, 2026
7361ec8
Merge branch 'main' into yt-sad-admin-cache
Apr 21, 2026
79e4466
fix linter && address codeRabbit review
Apr 21, 2026
4723d8f
fix lint
Apr 21, 2026
6d65fe7
Merge remote-tracking branch 'upstream/main'
Apr 22, 2026
6fb985c
Merge branch 'main' into yt-sad-admin-cache
Apr 22, 2026
eb97650
use `asymmetric caching`
Apr 22, 2026
fdac5f1
add sarCacheMaxSize ** change the handler to address codeRabbit review
Apr 22, 2026
7113161
Merge remote-tracking branch 'upstream/main'
Apr 22, 2026
9939a59
Merge branch 'main' into yt-sad-admin-cache
Apr 22, 2026
2 changes: 2 additions & 0 deletions maas-api/cmd/main.go
@@ -15,6 +15,7 @@ import (

"github.com/gin-contrib/cors"
"github.com/gin-gonic/gin"
"github.com/prometheus/client_golang/prometheus/promhttp"

"github.com/opendatahub-io/models-as-a-service/maas-api/internal/api_keys"
"github.com/opendatahub-io/models-as-a-service/maas-api/internal/config"
@@ -139,6 +140,7 @@ func initStore(ctx context.Context, log *logger.Logger, cfg *config.Config) (api

func registerHandlers(ctx context.Context, log *logger.Logger, router *gin.Engine, cfg *config.Config, cluster *config.ClusterConfig, store api_keys.MetadataStore) error {
router.GET("/health", handlers.NewHealthHandler().HealthCheck)
router.GET("/metrics", gin.WrapH(promhttp.Handler()))

Comment on lines 141 to 144
coderabbitai bot, Apr 21, 2026

⚠️ Potential issue | 🟠 Major

Gate /metrics; it is currently public on the API router.

CWE-200. Any caller that can reach this API can scrape process/runtime metrics and SAR cache hit/miss activity. Exploit scenario: a tenant or external client polls /metrics to infer admin-check traffic patterns and service internals. Put this route on an internal-only listener or require a Prometheus scrape credential.

Remediation sketch
-	router.GET("/metrics", gin.WrapH(promhttp.Handler()))
+	metricsRoutes := router.Group("/metrics", metricsAuthMiddleware(cfg.MetricsBearerToken))
+	metricsRoutes.GET("", gin.WrapH(promhttp.Handler()))
// assumes imports: crypto/subtle, net/http, strings
func metricsAuthMiddleware(expected string) gin.HandlerFunc {
	return func(c *gin.Context) {
		token := strings.TrimPrefix(c.GetHeader("Authorization"), "Bearer ")
		if expected == "" || subtle.ConstantTimeCompare([]byte(token), []byte(expected)) != 1 {
			c.AbortWithStatus(http.StatusForbidden)
			return
		}
		c.Next()
	}
}

As per coding guidelines, “REVIEW PRIORITIES: 1. Security vulnerabilities.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@maas-api/cmd/main.go` around lines 141 - 144, The /metrics endpoint is
currently public in registerHandlers; restrict access by adding an authorization
or internal-only guard: implement a metricsAuthMiddleware (as sketched) that
reads a configured scrape token (from cfg or cluster) and performs a
constant-time compare, then register the metrics route with that middleware
(e.g., router.GET("/metrics", metricsAuthMiddleware(cfg.PromToken),
gin.WrapH(promhttp.Handler()))), or alternatively move the route off the public
router onto an internal-only listener; ensure the token/config key is plumbed
from config.Config/cluster and return 403 on failure.

yu-teo (author):

This finding doesn't need a fix. The /metrics endpoint is placed alongside /health on the main router, but in production this service runs inside a Kubernetes cluster where access is controlled at the network level via NetworkPolicy — the same pattern already used for the /internal/v1/* routes (lines 178-182). Those internal routes have no auth middleware either; the comment on line 177 says // Internal routes (no auth required - called by Authorino / CronJob).

Adding a scrape token would complicate the Prometheus scrape config for no real security gain — the network boundary is the access control layer. This is the standard pattern in Kubernetes-native services.

coderabbitai bot:

@yu-teo, understood — NetworkPolicy as the enforcement boundary for internal/metrics endpoints is a standard Kubernetes-native pattern, and the existing /internal/v1/* routes already establish that precedent in this codebase. I'll note this for future reviews.

if !cluster.StartAndWaitForSync(ctx.Done()) {
return errors.New("failed to sync informer caches")
4 changes: 2 additions & 2 deletions maas-api/go.mod
@@ -12,6 +12,8 @@ require (
github.com/kserve/kserve v0.0.0-20251121160314-57d83d202f36
github.com/lib/pq v1.10.9
github.com/openai/openai-go/v2 v2.3.1
github.com/prometheus/client_golang v1.23.2
github.com/prometheus/client_model v0.6.2
github.com/stretchr/testify v1.11.1
go.uber.org/zap v1.27.0
golang.org/x/sync v0.19.0
@@ -92,8 +94,6 @@ require (
github.com/pkg/browser v0.0.0-20240102092130-5ac0b6a4141c // indirect
github.com/planetscale/vtprotobuf v0.6.1-0.20240319094008-0393e58bdf10 // indirect
github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
github.com/prometheus/client_golang v1.23.2 // indirect
github.com/prometheus/client_model v0.6.2 // indirect
github.com/prometheus/common v0.66.1 // indirect
github.com/prometheus/procfs v0.17.0 // indirect
github.com/spf13/pflag v1.0.10 // indirect
62 changes: 42 additions & 20 deletions maas-api/internal/api_keys/handler.go
@@ -26,7 +26,7 @@ const (
// The SARAdminChecker implementation uses Kubernetes SubjectAccessReview
// to check if the user can create maasauthpolicies (RBAC-based admin detection).
type AdminChecker interface {
IsAdmin(ctx context.Context, user *token.UserContext) bool
IsAdmin(ctx context.Context, user *token.UserContext) (bool, error)
}

type Handler struct {
@@ -70,22 +70,19 @@ func (h *Handler) getUserContext(c *gin.Context) *token.UserContext {
// isAdmin checks if the user has admin privileges via SubjectAccessReview.
// Admin is determined by RBAC: can user create maasauthpolicies in the configured MaaS namespace?
// Returns true if the user has admin RBAC permissions, false otherwise.
func (h *Handler) isAdmin(ctx context.Context, user *token.UserContext) bool {
func (h *Handler) isAdmin(ctx context.Context, user *token.UserContext) (bool, error) {
if h == nil || user == nil {
return false
return false, nil
}
return h.adminChecker.IsAdmin(ctx, user)
}

// isAuthorizedForKey checks if the user is authorized to access the API key.
// User is authorized if they own the key or are an admin.
func (h *Handler) isAuthorizedForKey(ctx context.Context, user *token.UserContext, keyOwner string) bool {
// Check if user owns the key
func (h *Handler) isAuthorizedForKey(ctx context.Context, user *token.UserContext, keyOwner string) (bool, error) {
if user.Username == keyOwner {
return true
return true, nil
}

// Check if user is admin
return h.isAdmin(ctx, user)
}

@@ -117,7 +114,13 @@ func (h *Handler) GetAPIKey(c *gin.Context) {
}

// Check authorization - user must own the key or be admin
if !h.isAuthorizedForKey(c.Request.Context(), user, tok.Username) {
authorized, authErr := h.isAuthorizedForKey(c.Request.Context(), user, tok.Username)
if authErr != nil {
h.logger.Error("Failed to check admin status", "error", authErr)
c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to check authorization"})
return
}
if !authorized {
yu-teo marked this conversation as resolved.
h.logger.Warn("Unauthorized API key access attempt",
"requestingUser", user.Username,
"keyOwner", tok.Username,
@@ -292,7 +295,13 @@ func (h *Handler) RevokeAPIKey(c *gin.Context) {
}

// Check authorization - user must own the key or be admin
if !h.isAuthorizedForKey(c.Request.Context(), user, keyMetadata.Username) {
authorized, authErr := h.isAuthorizedForKey(c.Request.Context(), user, keyMetadata.Username)
if authErr != nil {
h.logger.Error("Failed to check admin status", "error", authErr)
c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to check authorization"})
return
}
if !authorized {
h.logger.Warn("Unauthorized API key revocation attempt",
"requestingUser", user.Username,
"keyOwner", keyMetadata.Username,
@@ -370,7 +379,12 @@ func (h *Handler) SearchAPIKeys(c *gin.Context) {
}

// Determine target username for filtering
isAdmin := h.isAdmin(c.Request.Context(), user)
isAdmin, adminErr := h.isAdmin(c.Request.Context(), user)
if adminErr != nil {
h.logger.Error("Failed to check admin status", "error", adminErr)
c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to check authorization"})
return
}
targetUsername := req.Filters.Username

if !isAdmin {
@@ -483,15 +497,23 @@ func (h *Handler) BulkRevokeAPIKeys(c *gin.Context) {
}

// Authorization: users can revoke own keys, admins can revoke any user's keys
if req.Username != user.Username && !h.isAdmin(c.Request.Context(), user) {
h.logger.Warn("Unauthorized bulk revoke attempt",
"requestingUser", user.Username,
"targetUser", req.Username,
)
c.JSON(http.StatusForbidden, gin.H{
"error": "Access denied: you can only bulk revoke your own API keys",
})
return
if req.Username != user.Username {
isAdmin, adminErr := h.isAdmin(c.Request.Context(), user)
if adminErr != nil {
h.logger.Error("Failed to check admin status", "error", adminErr)
c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to check authorization"})
return
}
if !isAdmin {
h.logger.Warn("Unauthorized bulk revoke attempt",
"requestingUser", user.Username,
"targetUser", req.Username,
)
c.JSON(http.StatusForbidden, gin.H{
"error": "Access denied: you can only bulk revoke your own API keys",
})
return
}
}

// Perform bulk revocation
20 changes: 13 additions & 7 deletions maas-api/internal/api_keys/handler_test.go
@@ -75,16 +75,16 @@ func newMockAdminChecker() *mockAdminChecker {
}
}

func (m *mockAdminChecker) IsAdmin(_ context.Context, user *token.UserContext) bool {
func (m *mockAdminChecker) IsAdmin(_ context.Context, user *token.UserContext) (bool, error) {
if user == nil {
return false
return false, nil
}
for _, userGroup := range user.Groups {
if slices.Contains(m.adminGroups, userGroup) {
return true
return true, nil
}
}
return false
return false, nil
}

// executeSearchRequest is a test helper that executes a search request and returns the parsed response.
@@ -114,17 +114,23 @@ func TestIsAuthorizedForKey(t *testing.T) {

t.Run("OwnerCanAccess", func(t *testing.T) {
user := &token.UserContext{Username: "alice", Groups: []string{"users"}}
assert.True(t, h.isAuthorizedForKey(ctx, user, "alice"))
result, err := h.isAuthorizedForKey(ctx, user, "alice")
require.NoError(t, err)
assert.True(t, result)
})

t.Run("NonOwnerCannotAccess", func(t *testing.T) {
user := &token.UserContext{Username: "bob", Groups: []string{"users"}}
assert.False(t, h.isAuthorizedForKey(ctx, user, "alice"))
result, err := h.isAuthorizedForKey(ctx, user, "alice")
require.NoError(t, err)
assert.False(t, result)
})

t.Run("AdminCanAccessAnyKey", func(t *testing.T) {
admin := &token.UserContext{Username: "admin", Groups: []string{"admin-users"}}
assert.True(t, h.isAuthorizedForKey(ctx, admin, "alice"))
result, err := h.isAuthorizedForKey(ctx, admin, "alice")
require.NoError(t, err)
assert.True(t, result)
})
}

155 changes: 155 additions & 0 deletions maas-api/internal/auth/cached_admin_checker.go
@@ -0,0 +1,155 @@
package auth

import (
"context"
"errors"
"slices"
"strings"
"sync"
"time"

"github.com/prometheus/client_golang/prometheus"
"k8s.io/utils/clock"

"github.com/opendatahub-io/models-as-a-service/maas-api/internal/token"
)

type adminChecker interface {
IsAdmin(ctx context.Context, user *token.UserContext) (bool, error)
}

type cacheEntry struct {
isAdmin bool
expiresAt time.Time
}

type CachedAdminChecker struct {
delegate adminChecker
ttl time.Duration
negativeTTL time.Duration
clock clock.Clock

mu sync.RWMutex
cache map[string]cacheEntry
yu-teo marked this conversation as resolved.

hits prometheus.Counter
misses prometheus.Counter
}

func NewCachedAdminChecker(delegate adminChecker, ttl time.Duration, negativeTTL time.Duration, reg prometheus.Registerer, clk clock.Clock) *CachedAdminChecker {
if delegate == nil {
panic("delegate cannot be nil for CachedAdminChecker")
}
if ttl <= 0 {
panic("ttl must be positive for CachedAdminChecker")
}
if negativeTTL <= 0 {
panic("negativeTTL must be positive for CachedAdminChecker")
}
if reg == nil {
reg = prometheus.DefaultRegisterer
}
if clk == nil {
clk = clock.RealClock{}
}

return &CachedAdminChecker{
delegate: delegate,
ttl: ttl,
negativeTTL: negativeTTL,
clock: clk,
cache: make(map[string]cacheEntry),
hits: registerOrReuseCounter(reg, prometheus.NewCounter(prometheus.CounterOpts{
Name: "sar_cache_hits_total",
Help: "Total number of SAR admin check cache hits.",
})),
misses: registerOrReuseCounter(reg, prometheus.NewCounter(prometheus.CounterOpts{
Name: "sar_cache_misses_total",
Help: "Total number of SAR admin check cache misses.",
})),
}
}

func (c *CachedAdminChecker) IsAdmin(ctx context.Context, user *token.UserContext) (bool, error) {
if c == nil || user == nil || user.Username == "" {
return false, nil
}
if ctx.Err() != nil {
return false, ctx.Err()
}
coderabbitai[bot] marked this conversation as resolved.

key := cacheKey(user)
now := c.clock.Now()

c.mu.RLock()
entry, ok := c.cache[key]
c.mu.RUnlock()

if ok && now.Before(entry.expiresAt) {
c.hits.Inc()
return entry.isAdmin, nil
}

result, err := c.delegate.IsAdmin(ctx, user)

if err != nil || ctx.Err() != nil {
c.misses.Inc()
if err != nil {
return false, err
}
return false, ctx.Err()
}

ttl := c.ttl
if !result {
ttl = c.negativeTTL
}

c.mu.Lock()
c.cache[key] = cacheEntry{
isAdmin: result,
expiresAt: now.Add(ttl),
}
c.evictExpiredLocked(now)
c.mu.Unlock()
Comment on lines +89 to +121
coderabbitai bot, Apr 21, 2026

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Verify x/sync is available and whether duplicate-miss suppression already exists elsewhere.
rg -nP 'golang\.org/x/sync|singleflight' -C3
rg -nP 'delegate\.IsAdmin\s*\(|type CachedAdminChecker' -C3

Repository: opendatahub-io/models-as-a-service

Length of output: 5962


🏁 Script executed:

cat -n maas-api/internal/auth/cached_admin_checker.go

Repository: opendatahub-io/models-as-a-service

Length of output: 3420


Use singleflight to suppress duplicate cache misses for the same key.

The RLock is released at line 75 before the delegate call at line 82. Concurrent requests for the same key with cache misses or expired entries will all call c.delegate.IsAdmin(), issuing multiple Kubernetes SAR requests instead of coalescing them. This defeats the stated load-shedding goal under bursty traffic.

Remediation sketch

Add golang.org/x/sync/singleflight (already in dependencies at v0.19.0) to suppress duplicate in-flight requests:

 type CachedAdminChecker struct {
 	delegate adminChecker
 	ttl      time.Duration
 	now      func() time.Time
+	inflight singleflight.Group
 
 	mu    sync.RWMutex
 	cache map[string]cacheEntry
-	result := c.delegate.IsAdmin(ctx, user)
+	value, _, _ := c.inflight.Do(key, func() (any, error) {
+		// Re-check after waiting for any earlier caller.
+		now := c.now()
+		c.mu.RLock()
+		entry, ok := c.cache[key]
+		c.mu.RUnlock()
+		if ok && now.Before(entry.expiresAt) {
+			c.hits.Inc()
+			return entry.isAdmin, nil
+		}
+
+		result := c.delegate.IsAdmin(ctx, user)
+		c.mu.Lock()
+		c.cache[key] = cacheEntry{isAdmin: result, expiresAt: now.Add(c.ttl)}
+		c.evictExpiredLocked(now)
+		c.mu.Unlock()
+		c.misses.Inc()
+		return result, nil
+	})
+	return value.(bool)
-
-	c.mu.Lock()
-	c.cache[key] = cacheEntry{
-		isAdmin:   result,
-		expiresAt: now.Add(c.ttl),
-	}
-	c.evictExpiredLocked(now)
-	c.mu.Unlock()
-
-	c.misses.Inc()
-	return result
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@maas-api/internal/auth/cached_admin_checker.go` around lines 73 - 90, Replace
the current "check cache -> release RLock -> call c.delegate.IsAdmin -> write
cache" pattern with singleflight to coalesce duplicate inflight misses: add a
golang.org/x/sync/singleflight.Group field (e.g., c.sf) to the struct, and in
IsAdmin use the existing read-locked fast path unchanged; on miss call
c.sf.Do(key, func() (interface{}, error) { res := c.delegate.IsAdmin(ctx, user);
return res, nil }) to ensure only one delegate.IsAdmin runs per key, then in the
Do caller convert the returned interface{} to bool, acquire c.mu.Lock(), write
cache[key] = cacheEntry{isAdmin: result, expiresAt: now.Add(c.ttl)}, call
c.evictExpiredLocked(now) and unlock; keep c.hits.Inc() only on cache hit and
return the boolean result from the singleflight result. Ensure you capture "now"
consistently and use the same key/user values when calling c.delegate.IsAdmin
inside the singleflight closure.

yu-teo (author):

This finding doesn't need a fix. The current behavior (two concurrent misses for the same user both make SAR calls) was a deliberate design choice, not an oversight.
The duplicate-call window is narrow — it only occurs on the first request for a user or the instant after TTL expiry. Once either call completes and writes the cache, all subsequent requests hit the fast path. SAR calls are lightweight local Kubernetes API calls, not expensive external requests.
Adding singleflight would introduce:

  • An interface{} type assertion in the hot path
  • Context cancellation semantics where one caller's cancelled context could affect waiters
  • More code complexity for a scenario that self-resolves within milliseconds

coderabbitai bot:

@yu-teo, understood — the trade-off is well-reasoned. The narrow duplicate-call window, the lightweight nature of local Kubernetes SAR calls, and the added complexity of singleflight (especially around context cancellation propagation to waiters) are all valid justifications for keeping the current design. I'll record this for future reviews.




c.misses.Inc()
return result, nil
}

//nolint:ireturn,nonamedreturns // Prometheus counters are inherently interface-typed; named returns clarify which is which.
func (c *CachedAdminChecker) Metrics() (hits, misses prometheus.Counter) {
return c.hits, c.misses
}

func (c *CachedAdminChecker) evictExpiredLocked(now time.Time) {
for k, v := range c.cache {
if now.After(v.expiresAt) {
delete(c.cache, k)
}
}
}

//nolint:ireturn // prometheus.Counter is the canonical type for counters.
func registerOrReuseCounter(reg prometheus.Registerer, c prometheus.Counter) prometheus.Counter {
err := reg.Register(c)
if err == nil {
return c
}
var are prometheus.AlreadyRegisteredError
if errors.As(err, &are) {
existing, ok := are.ExistingCollector.(prometheus.Counter)
if !ok {
panic("existing collector is not a Counter")
}
return existing
}
panic(err)
}

func cacheKey(user *token.UserContext) string {
sorted := make([]string, len(user.Groups))
copy(sorted, user.Groups)
slices.Sort(sorted)
return user.Username + "\x00" + strings.Join(sorted, "\x00")
Comment on lines +157 to +161
coderabbitai bot, Apr 21, 2026

⚠️ Potential issue | 🔴 Critical

Make the authorization cache key unambiguous.

CWE-20 / CWE-863. NUL-delimited concatenation can collide if token-derived usernames or groups contain \x00; e.g. username a with groups ["b","c"] collides with username a\x00b and groups ["c"]. An attacker controlling token claims could receive another principal’s cached admin result. Use length-prefixing or structured encoding.

Remediation sketch
 import (
 	"context"
+	"strconv"
 	"slices"
 	"strings"
 	"sync"
 	"time"
@@
 func cacheKey(user *token.UserContext) string {
 	sorted := make([]string, len(user.Groups))
 	copy(sorted, user.Groups)
 	slices.Sort(sorted)
-	return user.Username + "\x00" + strings.Join(sorted, "\x00")
+
+	var b strings.Builder
+	writePart := func(value string) {
+		b.WriteString(strconv.Itoa(len(value)))
+		b.WriteByte(':')
+		b.WriteString(value)
+		b.WriteByte('|')
+	}
+	writePart(user.Username)
+	for _, group := range sorted {
+		writePart(group)
+	}
+	return b.String()
 }

As per coding guidelines, “REVIEW PRIORITIES: 1. Security vulnerabilities.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@maas-api/internal/auth/cached_admin_checker.go` around lines 112 - 116, The
cacheKey implementation in function cacheKey currently concatenates
user.Username and user.Groups with NULs which can be ambiguous if
username/groups contain NULs; change to an unambiguous, length-prefixed or
structured encoding: build the key from token.UserContext by prefixing each
component with its length (e.g., "<len>|<username>" then for each group
"<len>|<group>") or serialize a stable structure (e.g., JSON or gob) of
{username, groups} after sorting groups; update cacheKey to sort user.Groups (as
it already does), then produce and return the length-prefixed or serialized
string to ensure no collisions between different combinations of username and
groups.

yu-teo (author):

This finding doesn't need a fix. Kubernetes usernames and group names cannot contain NUL bytes — they are validated by the API server and follow DNS/RBAC naming conventions (alphanumeric, -, ., @, :). A NUL separator is unambiguous for this domain.
Adding length-prefixed encoding or JSON serialization would add complexity and allocations to a hot path for a collision scenario that cannot occur with valid Kubernetes identities.

coderabbitai bot:

@yu-teo, that's a reasonable rebuttal for the specific deployment context. I'll concede the finding here.

One nuance worth noting for the record: groups, like users, are represented by a string, and that string has no format requirements, other than that the prefix system: is reserved — the Kubernetes API server itself imposes no formal character-set restriction on user/group strings produced by authenticator modules (OIDC, webhook, etc.). The DNS/RBAC naming conventions apply to Kubernetes object names, not to authentication identity strings.

That said, if the identity providers in this deployment are known to produce well-constrained strings (OpenShift OAuth, etc.), the NUL separator is indeed safe in practice, and the complexity trade-off of length-prefixed encoding is not justified. I'll drop the finding.



}
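The collision the reviewer describes is easy to reproduce. The sketch below contrasts a NUL-joined key (mirroring the PR's `cacheKey`) with a length-prefixed variant; both helper functions are illustrative, not part of the PR:

```go
package main

import (
	"fmt"
	"slices"
	"strconv"
	"strings"
)

// nulKey mirrors the PR's cacheKey: username and sorted groups joined by NUL.
func nulKey(username string, groups []string) string {
	sorted := slices.Clone(groups)
	slices.Sort(sorted)
	return username + "\x00" + strings.Join(sorted, "\x00")
}

// lenKey length-prefixes each component, so no input can forge a separator.
func lenKey(username string, groups []string) string {
	sorted := slices.Clone(groups)
	slices.Sort(sorted)
	var b strings.Builder
	for _, part := range append([]string{username}, sorted...) {
		b.WriteString(strconv.Itoa(len(part)))
		b.WriteByte(':')
		b.WriteString(part)
	}
	return b.String()
}

func main() {
	// Two distinct principals the NUL-joined key cannot tell apart:
	// username "a" with groups ["b","c"] vs username "a\x00b" with ["c"].
	fmt.Println("nul keys collide:",
		nulKey("a", []string{"b", "c"}) == nulKey("a\x00b", []string{"c"})) // true
	fmt.Println("len keys collide:",
		lenKey("a", []string{"b", "c"}) == lenKey("a\x00b", []string{"c"})) // false
}
```

As the thread concludes, the collision requires an identity string containing a NUL byte, which the deployment's identity providers are assumed not to produce; the sketch only shows what the length-prefixed encoding buys if that assumption ever fails.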