Conversation

locnguyen1986 (Contributor):

Implements a caching system using Valkey (primary), with Redis as a fallback and a NoOp implementation for graceful degradation.
Optimizes performance for:

  • UserService.FindByPublicID (called on every authenticated request)
  • Inference Model Registry
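
A minimal sketch of the cache interface this design implies, inferred from the Set and GetWithFallback calls quoted later in this thread (sketch only; the PR's literal definition may differ):

```go
// Sketch: the cache abstraction behind the Valkey/Redis/NoOp backends.
type CacheService interface {
	// Set stores a value under key with a TTL. Quoted early in this
	// review with value as `any`; later revised to take a string.
	Set(ctx context.Context, key string, value any, expiration time.Duration) error
	// GetWithFallback reads key into dest, invoking fallback on a miss
	// and caching its result with the given expiration.
	GetWithFallback(ctx context.Context, key string, dest any, fallback func() (any, error), expiration time.Duration) error
}
```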


```go
// Check every 5 minutes instead of every minute
ctab.AddJob("*/5 * * * *", func() {
	cs.CheckInferenceModels(ctx)
})
```
Collaborator:
That means 9-10 minutes of downtime.

Contributor Author:
I may add another mechanism to handle this cron soon, but let me roll back to every minute for now. -))

Collaborator:
BTW, I used to use a mechanism called "ticker" for health checks.
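
For reference, a minimal sketch of that mechanism using Go's standard time.Ticker (the function name, interval, and check callback here are placeholders, not the project's actual values):

```go
// Sketch: a periodic health check driven by time.Ticker instead of cron.
// Stops cleanly when ctx is cancelled.
func runHealthCheck(ctx context.Context, interval time.Duration, check func(context.Context)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			check(ctx)
		}
	}
}
```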

Contributor Author:
Just checked: Ticker is good too and simple to run, but I think cron is fine as well, and we can centralize the jobs in one place. Moved back to the every-minute scan.

Comment on lines 72 to 77
```go
for id, model1 := range map1 {
	model2, exists := map2[id]
	if !exists || model1.Object != model2.Object || model1.OwnedBy != model2.OwnedBy {
		return false
	}
}
```
Collaborator:
You should compare the maps in the reverse direction:

```go
for id, model2 := range map2 {
```

Consider renaming the variables for clarity.
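
For reference, a common Go idiom that covers both directions with a single loop is to compare lengths first, then iterate one map. A sketch using the field names from the hunk above (the PR author ultimately switched to a simpler comparison, as noted below):

```go
// Sketch: equal lengths plus every key of map1 present and equal in
// map2 implies the key sets match, so one loop suffices.
func modelsEqual(map1, map2 map[string]inferencemodel.Model) bool {
	if len(map1) != len(map2) {
		return false
	}
	for id, model1 := range map1 {
		model2, exists := map2[id]
		if !exists || model1.Object != model2.Object || model1.OwnedBy != model2.OwnedBy {
			return false
		}
	}
	return true
}
```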

Collaborator:
(but somehow I feel map1 and map2 are good enough)

Contributor Author:
I switched to comparing only the model name with the service name; it's clearer.

Comment on lines 33 to 36
```go
CACHE_URL      string
CACHE_PASSWORD string
CACHE_DB       string
CACHE_TYPE     string // "valkey" (primary) or "redis" (alternative)
```
Collaborator:
Remove CACHE_TYPE and specify CACHE_REDIS_... or CACHE_VALKEY_... instead.

Contributor Author:
Now we use REDIS_-prefixed variables only.

Comment on lines 11 to 26
```go
cacheType := strings.ToLower(environment_variables.EnvironmentVariables.CACHE_TYPE)

// Default to Valkey if no cache type is specified
if cacheType == "" {
	cacheType = "valkey"
}

switch cacheType {
case "redis":
	return NewRedisCacheService()
case "valkey":
	return NewValkeyCacheService()
default:
	// Fallback to Valkey for unknown types
	return NewValkeyCacheService()
}
```
Collaborator:
Can we use just one specific service for caching? Supporting several will soon become a maintenance nightmare.

Contributor Author:
Now we use Redis only.

```go
address, password, db, err := parseValkeyURL(valkeyURL)
if err != nil {
	// Return a no-op implementation for graceful degradation
	return &NoOpCacheService{}
}
```
Collaborator:
Panic here instead of silently degrading.
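
What the reviewer is asking for, sketched (fail fast on misconfiguration rather than degrading to a no-op cache; the error message wording is an assumption):

```go
address, password, db, err := parseValkeyURL(valkeyURL)
if err != nil {
	// Fail fast: a bad cache URL is a deployment/configuration error,
	// not a condition to silently degrade around.
	panic(fmt.Sprintf("invalid cache URL %q: %v", valkeyURL, err))
}
```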

Comment on lines 101 to 118
```go
func (r *InferenceModelRegistry) SetModels(ctx context.Context, serviceName string, models []inferencemodel.Model) {
	// Check if models have actually changed to avoid unnecessary cache operations
	if !r.hasModelsChanged(serviceName, models) {
		return // No changes, skip cache update
	}

	r.endpointToModels[serviceName] = functional.Map(models, func(model inferencemodel.Model) string {
		r.modelsDetail[model.ID] = model
		return model.ID
	})
	r.rebuild()

	// Invalidate cache after setting models
	r.invalidateCache(ctx)

	// Populate cache with new registry data
	r.populateCache(ctx)
}
```
Collaborator:
Race condition: if we have n pods, each pod will rebuild the cache when a user adds a new model.
With poor timing, the pods will enter an endless rebuild loop (line 114 + line 117).

Contributor Author:
A lock has been added.
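
A minimal sketch of how such a distributed lock might look with go-redsync, which the thread later mentions by name (the function and lock key names here are assumptions):

```go
import (
	"github.com/go-redsync/redsync/v4"
	"github.com/go-redsync/redsync/v4/redis/goredis/v9"
	goredislib "github.com/redis/go-redis/v9"
)

// Sketch: only one pod at a time may rebuild the registry cache.
func rebuildWithLock(client *goredislib.Client, rebuild func() error) error {
	rs := redsync.New(goredis.NewPool(client))
	mutex := rs.NewMutex("lock:registry-rebuild") // assumed key name
	if err := mutex.Lock(); err != nil {
		return err // another pod holds the lock; skip this rebuild
	}
	defer mutex.Unlock()
	return rebuild()
}
```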

Comment on lines 67 to 74
```go
func (r *RedisCacheService) Set(ctx context.Context, key string, value any, expiration time.Duration) error {
	jsonValue, err := json.Marshal(value)
	if err != nil {
		return fmt.Errorf("failed to marshal value: %w", err)
	}

	return r.client.Set(ctx, key, jsonValue, expiration).Err()
}
```
Collaborator:
json.Marshal(value) should not be used in a general-purpose Set function.

Contributor Author:
You are right. I was considering a type check, but no: just leave the argument as a string, and the calling function will handle the conversion.
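
The revised shape that exchange implies, sketched (the actual signature merged in the PR may differ):

```go
// Sketch: the cache layer stays type-agnostic; callers marshal values
// themselves before writing.
func (r *RedisCacheService) Set(ctx context.Context, key, value string, expiration time.Duration) error {
	return r.client.Set(ctx, key, value, expiration).Err()
}
```

This matches the call sites quoted later in the thread, e.g. r.cache.Set(ctx, cache.ModelsCacheKey, string(modelsJSON), r.cacheExpiry).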

Comment on lines 157 to 160
```go
err := r.cache.GetWithFallback(ctx, cache.RegistryModelEndpointsKey, &modelToEndpoints, func() (any, error) {
	// Cache miss, return from memory
	return r.modelToEndpoints, nil
}, r.cacheExpiry)
```
Collaborator:
Consider using a sorted set or list for the model cache. Should we use LRange?
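
A sketch of that list idea with go-redis (the key name is an assumption): store the model IDs as a Redis list and read them back with LRANGE, instead of one JSON blob.

```go
// Sketch: replace the JSON blob with a Redis list of model IDs. The
// delete-then-push runs in a transaction pipeline so readers never see
// a half-written list.
func storeModelIDs(ctx context.Context, client *redis.Client, ids []string) error {
	pipe := client.TxPipeline()
	pipe.Del(ctx, "registry:model-ids")
	for _, id := range ids {
		pipe.RPush(ctx, "registry:model-ids", id)
	}
	_, err := pipe.Exec(ctx)
	return err
}

func loadModelIDs(ctx context.Context, client *redis.Client) ([]string, error) {
	return client.LRange(ctx, "registry:model-ids", 0, -1).Result()
}
```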

Contributor Author:
I will introduce new entities for models soon; we may have different providers.

Comment on lines 84 to 88
```go
existingModels, exists := r.endpointToModels[serviceName]
if !exists {
	// Service doesn't exist, so it's a change
	return len(newModels) > 0
}
```
Collaborator:
We should retrieve it from Redis here when comparing models.

  • The local cache (endpointToModels) is simply syntactic sugar for accessing the data.

Contributor Author:
The local variables have been removed.

```go
}

// NewRedisCacheService creates a new Redis cache service
func NewRedisCacheService() CacheService {
```
Collaborator:
Return (CacheService, error) instead.

Contributor Author:
I throw a panic instead :D :D

```go
// Cache TTL constants
const (
	// ModelsCacheTTL is the TTL for cached models list
	ModelsCacheTTL = 10 * time.Minute
)
```
Collaborator:
If we apply a TTL to the vLLM models:

  • the expiration will cause downtime for /chat/completions whenever we get models from the cache;
  • the worst-case downtime is almost 2 minutes.

Collaborator:
By the way, can we add back model verification in /chat/completions and return an appropriate response if the model is not provided?
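
A sketch of such a check, using plain net/http for illustration (the project's actual handler and framework are not shown in this thread, so all names here are assumptions):

```go
// Sketch: validate the requested model before proxying the completion,
// returning an explicit error response instead of failing downstream.
func verifyModel(w http.ResponseWriter, modelID string, knownModel func(string) bool) bool {
	if modelID == "" {
		http.Error(w, `{"error":"model is required"}`, http.StatusBadRequest)
		return false
	}
	if !knownModel(modelID) {
		http.Error(w, `{"error":"unknown model"}`, http.StatusNotFound)
		return false
	}
	return true
}
```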

Contributor Author:
This cache TTL may be used when the cron fails, or in another context. The downtime should be < 2 minutes with the cron running.

Collaborator:
The TTL here will definitely introduce downtime. Why do we have to deliberately introduce downtime to handle cases that may not exist?

Collaborator:
Sorry, I checked your latest implementation. Basically, you refresh (extend the TTL of) the cache every minute. It's good to go.
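
What is being approved here, sketched against the cron call quoted at the top of this thread: each tick re-fetches the models and writes them back with an expiration, which resets the keys' TTL, so the expiry only elapses if the cron itself stops.

```go
// Sketch: the every-minute job keeps cache entries alive by rewriting
// them; Set with an expiration resets each key's TTL.
ctab.AddJob("* * * * *", func() {
	cs.CheckInferenceModels(ctx)
})
```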

Contributor Author:
:| I see, we actually have different contexts here.
ModelsCacheTTL is used by JanInferenceProvider, which is not called anywhere in the current system; I may use it later when combining models.
The cron fetches the models and saves them into the registry, and we load models via the Registry, not from JanInferenceProvider.
But let me keep this PR clean (YAGNI): let's remove this cache from JanInferenceProvider. If I need it, I will add it back.

Comment on lines +245 to +252
```go
for _, model := range janModelResp.Data {
	models = append(models, inferencemodel.Model{
		ID:      model.ID,
		Object:  model.Object,
		Created: model.Created,
		OwnedBy: model.OwnedBy,
	})
}
```
Collaborator:
janModelResp.Data is []inferencemodel.Model. Why do we need to copy it again?

Contributor Author:
They will diverge soon: one is the model type returned by the inference client, and the other will be our local model type.

Comment on lines 255 to 274
```go
if len(models) > 0 {
	modelsJSON, _ := json.Marshal(models)
	r.cache.Set(ctx, cache.ModelsCacheKey, string(modelsJSON), r.cacheExpiry)

	// Store service models mapping
	serviceCacheKey := cache.RegistryEndpointModelsKey + ":" + sanitizeKeyPart(r.janClient.BaseURL)
	modelIDs := functional.Map(models, func(model inferencemodel.Model) string {
		return model.ID
	})
	modelIDsJSON, _ := json.Marshal(modelIDs)
	r.cache.Set(ctx, serviceCacheKey, string(modelIDsJSON), r.cacheExpiry)

	// Build model-to-endpoints mapping
	modelToEndpoints := make(map[string][]string)
	for _, model := range models {
		modelToEndpoints[model.ID] = append(modelToEndpoints[model.ID], r.janClient.BaseURL)
	}
	modelToEndpointsJSON, _ := json.Marshal(modelToEndpoints)
	r.cache.Set(ctx, cache.RegistryModelEndpointsKey, string(modelToEndpointsJSON), r.cacheExpiry)
}
```
Collaborator:
If you are writing multiple cache entries from a single source of truth, you will need a lock.
Can we change the plan and store everything in a single slot?
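
A sketch of that single-slot idea (the snapshot type and key name are assumptions, not the PR's actual names): pack the whole registry state into one JSON value under one key, so a single Set is atomic and no cross-key lock is needed.

```go
// Sketch: one atomic write for the whole registry snapshot.
type registrySnapshot struct {
	EndpointToModels map[string][]string `json:"endpoint_to_models"`
	ModelToEndpoints map[string][]string `json:"model_to_endpoints"`
}

func (r *InferenceModelRegistry) storeSnapshot(ctx context.Context, snap registrySnapshot) error {
	payload, err := json.Marshal(snap)
	if err != nil {
		return err
	}
	// "registry:snapshot" is an assumed key, not one from the PR.
	return r.cache.Set(ctx, "registry:snapshot", string(payload), r.cacheExpiry)
}
```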

Contributor Author:
The cache structure will change soon:
application -> organization -> project -> user

Comment on lines 31 to 39
```go
if err != nil {
	logger.GetLogger().Error(fmt.Sprintf("Failed to parse Redis URL: %v", err))
	// Fallback to default configuration
	opts = &redis.Options{
		Addr:     "localhost:6379",
		Password: "",
		DB:       0,
	}
}
```
Collaborator:
Panic here instead of falling back to a default configuration.

```go
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

if err := client.Ping(ctx).Err(); err != nil {
```
Collaborator:
Panic here as well if the ping fails.

```go
var updateErr error

// Execute with lock using go-redsync
err := cache.WithLock(s.cache, lockKey, func() error {
```
Collaborator:
Do not add locks everywhere. Use optimistic locking here: we just need to cache the user instance and ignore the error.

Collaborator:
You can pick one of these cache strategies here:

  • delete the cache entry on update;
  • update the cache with a single instruction.
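
A sketch of the first option, delete-on-update (the Update and Delete method names and the key format are assumptions): after a successful DB write, drop the cached entry and let the next read repopulate it; a cache-side failure is logged and ignored, so no distributed lock is needed.

```go
// Sketch: invalidate on write; the next FindByPublicID repopulates.
func (s *UserService) UpdateUser(ctx context.Context, u *User) error {
	if err := s.userrepo.Update(ctx, u); err != nil {
		return err
	}
	// Delete and the "user:" key prefix are assumed names for this sketch.
	if err := s.cache.Delete(ctx, "user:"+u.PublicID); err != nil {
		logger.GetLogger().Error(fmt.Sprintf("cache invalidation failed: %v", err))
	}
	return nil
}
```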


```go
type UserService struct {
	userrepo UserRepository
	cache    cache.CacheService
```
Collaborator:
We used to store the cache in the repository, but I feel it's not mandatory.

  • The user service should not be aware of changes in the user repository (persistence layer).

Collaborator:
UserCacheTTL is now stored in the Cache package, which may become a large file and violate the SRP soon. The cache is similar to an RDB, so we can wrap it with a service/repository.

locnguyen1986 merged commit 55814f9 into main on Sep 29, 2025 (1 check passed).
locnguyen1986 deleted the feat/170/add-redis-support-models-loading branch on September 29, 2025 at 02:11.