
fix(proxy): replace panic with graceful error handling in getRepo #2419

Closed
Storm1289 wants to merge 1 commit into kubeflow:main from Storm1289:fix-proxy-getrepo-panic

Conversation

Contributor

@Storm1289 Storm1289 commented Mar 17, 2026

This PR addresses a reliability issue in cmd/proxy.go where getRepo would panic if it failed to retrieve a repository from the repoSet. Panicking on expected configuration or connection errors can cause the entire proxy server process to crash abruptly in production, leaving little room for graceful degradation or clear logging of the root cause up the initialization chain.

Changes Made

1. Refactored getRepo signature

  • Before: func getRepo[T any](repoSet datastore.RepoSet) T
  • After: func getRepo[T any](repoSet datastore.RepoSet) (T, error)

2. Error handling instead of panic

  • Before:
  panic(fmt.Sprintf("unable to get repository: %v", err))
  • After:
  var zero T
  return zero, fmt.Errorf("unable to get repository: %w", err)

3. Return value updated

  • Before: return repo.(T)
  • After: return repo.(T), nil

4. Refactored newModelRegistryService — eager repo resolution with error checks

Previously, all 14 getRepo calls were passed inline as arguments directly into core.NewModelRegistryService(...), meaning any failure would panic
mid-call with no recovery path:

  // Before — all 14 repos resolved inline, panic on any failure
  modelRegistryService := core.NewModelRegistryService(
      getRepo[models.ArtifactRepository](repoSet),
      getRepo[models.ModelArtifactRepository](repoSet),
      getRepo[models.DocArtifactRepository](repoSet),
      getRepo[models.RegisteredModelRepository](repoSet),
      getRepo[models.ModelVersionRepository](repoSet),
      getRepo[models.ServingEnvironmentRepository](repoSet),
      getRepo[models.InferenceServiceRepository](repoSet),
      getRepo[models.ServeModelRepository](repoSet),
      getRepo[models.ExperimentRepository](repoSet),
      getRepo[models.ExperimentRunRepository](repoSet),
      getRepo[models.DataSetRepository](repoSet),
      getRepo[models.MetricRepository](repoSet),
      getRepo[models.ParameterRepository](repoSet),
      getRepo[models.MetricHistoryRepository](repoSet),
      repoSet.TypeMap(),
  )

Now each repository is resolved individually with an immediate error check. core.NewModelRegistryService is only called once all 14 repos are confirmed
healthy:

  // After — each repo resolved and checked before proceeding
  dataSetRepo, err := getRepo[models.DataSetRepository](repoSet)
  if err != nil {
      return nil, err
  }
  metricRepo, err := getRepo[models.MetricRepository](repoSet)
  if err != nil {
      return nil, err
  }
  parameterRepo, err := getRepo[models.ParameterRepository](repoSet)
  if err != nil {
      return nil, err
  }
  metricHistoryRepo, err := getRepo[models.MetricHistoryRepository](repoSet)
  if err != nil {
      return nil, err
  }
  // ... (same pattern for all 14 repos)

  modelRegistryService := core.NewModelRegistryService(
      artifactRepo,
      modelArtifactRepo,
      docArtifactRepo,
      registeredModelRepo,
      modelVersionRepo,
      servingEnvRepo,
      inferenceServiceRepo,
      serveModelRepo,
      experimentRepo,
      experimentRunRepo,
      dataSetRepo,
      metricRepo,
      parameterRepo,
      metricHistoryRepo,
      repoSet.TypeMap(),
  )
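Putting the pieces of changes 1–3 together, the refactored helper looks roughly like the sketch below. This is a self-contained illustration, not the actual cmd/proxy.go code: the real getRepo resolves a repository from datastore.RepoSet by its type parameter alone, whereas this sketch uses a hypothetical string-keyed RepoSet and Repository method for simplicity.

```go
package main

import "fmt"

// Minimal stand-in for datastore.RepoSet — just enough surface to show the
// pattern. The string key is a simplification; the real code resolves by
// the type parameter alone.
type RepoSet struct {
	repos map[string]any
}

func (s RepoSet) Repository(name string) (any, error) {
	repo, ok := s.repos[name]
	if !ok {
		return nil, fmt.Errorf("no repository registered for %q", name)
	}
	return repo, nil
}

// getRepo mirrors the refactor: on failure it returns the zero value of T
// plus a %w-wrapped error instead of panicking.
func getRepo[T any](repoSet RepoSet, name string) (T, error) {
	var zero T
	repo, err := repoSet.Repository(name)
	if err != nil {
		return zero, fmt.Errorf("unable to get repository: %w", err)
	}
	typed, ok := repo.(T)
	if !ok {
		return zero, fmt.Errorf("repository %q has unexpected type %T", name, repo)
	}
	return typed, nil
}

type MetricRepository interface{ Kind() string }

type metricRepo struct{}

func (metricRepo) Kind() string { return "metric" }

func main() {
	set := RepoSet{repos: map[string]any{"metric": metricRepo{}}}

	if r, err := getRepo[MetricRepository](set, "metric"); err == nil {
		fmt.Println("resolved:", r.Kind())
	}

	// A missing repository now surfaces as a returnable error, not a panic.
	if _, err := getRepo[MetricRepository](set, "parameter"); err != nil {
		fmt.Println("error:", err)
	}
}
```

Each caller then handles the error at its own level, which is what makes the per-repository `if err != nil { return nil, err }` checks above possible.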

Why This Matters

  • Any datastore connection or configuration problem now surfaces as a clean, returnable error instead of a process-killing panic.
  • Errors are wrapped with %w, making them inspectable via errors.Is / errors.As up the call stack.
  • The service is only constructed when all dependencies are confirmed available, preventing partial initialization states.

This commit addresses a reliability issue in cmd/proxy.go where getRepo
would panic if it failed to retrieve a repository from the repoSet.
Panicking on expected configuration or connection errors can cause the
entire proxy server process to crash abruptly in production.

Changes Made:
- Refactored getRepo to return (T, error) instead of panicking.
- Refactored newModelRegistryService to extract each of the 14
repositories individually, aggressively checking for errors after each
lookup and returning the error back to the caller.

Signed-off-by: divakarsharma2934 <divakarsharma2934@gmail.com>
@google-oss-prow
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign rareddy for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pboyd
Member

pboyd commented Mar 17, 2026

There's a new policy for AI-generated PRs coming: kubeflow/website#4336. Some of the finer points are still being debated, but there's general consensus that AI-generated code should be marked as such.

If this was AI-generated, can you update the commit message with Co-authored-by: [Agent Name] as requested in the new policy?

@Storm1289
Contributor Author

Hi @pboyd, I will look into this.

Member

@pboyd pboyd left a comment


I'm not sure this solves a real problem. If getRepo fails we have a programming logic error, which should only happen during development. And in development, the stacktrace is helpful. Whether it's an error or a panic, we can't recover from the error and the service should stop. Personally, I'd prefer the stacktrace, but if we need a friendlier message for some reason, that's fine with me.

Did you encounter this problem in the wild? If so, there's a serious bug that we need to address.

@Storm1289
Contributor Author

You're right, this was not encountered in the wild. I identified it during a code review but didn't fully consider that getRepo failing is a programming logic error rather than a runtime issue. The stacktrace from the panic is indeed more useful here. I'll close this PR.
Thank you for the feedback

@Storm1289 Storm1289 closed this Mar 17, 2026
@pboyd
Member

pboyd commented Mar 17, 2026

OK, @divakarsharma2934-a11y, thanks for the PR anyway.

If you'd like to contribute, we have a few "good first issues" (I realized we had run out, but I just tagged a couple more). Also, feel free to reach out on the CNCF slack (#kubeflow-model-registry) if something isn't clear.

@Storm1289 Storm1289 deleted the fix-proxy-getrepo-panic branch March 18, 2026 22:58
