fix(proxy): replace panic with graceful error handling in getRepo#2419
fix(proxy): replace panic with graceful error handling in getRepo#2419Storm1289 wants to merge 1 commit intokubeflow:mainfrom
Conversation
This commit addresses a reliability issue in cmd/proxy.go where getRepo would panic if it failed to retrieve a repository from the repoSet. Panicking on expected configuration or connection errors can cause the entire proxy server process to crash abruptly in production. Changes Made: - Refactored getRepo to return (T, error) instead of panicking. - Refactored newModelRegistryService to extract each of the 14 repositories individually, aggressively checking for errors after each lookup and returning the error back to the caller. Signed-off-by: divakarsharma2934 <divakarsharma2934@gmail.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
There's a new policy for AI-generated PRs coming: kubeflow/website#4336 Some of the finer points are still being debated, but there's general consensus that AI-generated code should be marked as such. If this was AI-generated, can you update the commit message with |
|
hi @pboyd |
pboyd
left a comment
There was a problem hiding this comment.
I'm not sure this solves a real problem. If getRepo fails we have a programming logic error, which should only happen during development. And in development, the stacktrace is helpful. Whether it's an error or a panic, we can't recover from the error and the service should stop. Personally, I'd prefer the stacktrace, but if we need a friendlier message for some reason, that's fine with me.
Did you encounter this problem in the wild? If so, there's a serious bug that we need to address.
|
You're right this was not encountered in the wild. I identified it during a code review but didn't fully consider that getRepo failing is a programming logic error rather than a runtime issue. The stacktrace from the panic is indeed more useful here. I'll close this PR. |
|
OK, @divakarsharma2934-a11y, thanks for the PR anyway. If you'd like to contribute, we have a few "good first issues" (I realized we had run out, but I just tagged a couple more). Also, feel free to reach out on the CNCF slack (#kubeflow-model-registry) if something isn't clear. |
This PR addresses a reliability issue in cmd/proxy.go where
getRepowouldpanicif it failed to retrieve a repository from therepoSet. Panicking on expected configuration or connection errors can cause the entire proxy server process to crash abruptly in production, leaving little room for graceful degradation or clear logging of the root cause up the initialization chain.Changes Made
1. Refactored
getReposignaturefunc getRepo[T any](repoSet datastore.RepoSet) Tfunc getRepo[T any](repoSet datastore.RepoSet) (T, error)2. Error handling instead of panic
3. Return value updated
return repo.(T)return repo.(T), nil4. Refactored
newModelRegistryService— eager repo resolution with error checksPreviously, all 14
getRepocalls were passed inline as arguments directly intocore.NewModelRegistryService(...), meaning any failure would panicmid-call with no recovery path:
Now each repository is resolved individually with an immediate error check.
core.NewModelRegistryServiceis only called once all 14 repos are confirmedhealthy:
Why This Matters
%w, making them inspectable viaerrors.Is/errors.Asup the call stack.