Handle nsx-proxy 403 response during manager recovery #1377
Open
zhengxiexie wants to merge 1 commit into vmware-tanzu:main
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #1377 +/- ##
==========================================
+ Coverage 76.65% 76.66% +0.01%
==========================================
Files 151 151
Lines 21102 21120 +18
==========================================
+ Hits 16175 16191 +16
- Misses 3773 3774 +1
- Partials 1154 1155 +1
When NSX manager is recovering, nsx-proxy may return 403 Forbidden with an HTML body instead of JSON. Previously this was treated as a generic manager error, causing the operator to keep retrying the same endpoint.

Add NsxProxyForbiddenError as a ground trigger so the endpoint is marked DOWN and failover to a healthy endpoint occurs. Also prefer the VIP endpoint in selectEndpoint and fall back to individual managers only when the VIP is DOWN.

Bug: 3638737
Change-Id: I6007fa3bc80acdb9fa13d317c616b78b3d833fa8
✨ What's Changed
NSX Proxy 403 Forbidden Handling
- Add `NsxProxyForbiddenError` type for detecting nsx-proxy HTML 403 responses during manager recovery
- Register `NsxProxyForbiddenError` as a ground trigger to enable endpoint failover
- Detect non-JSON 403 responses in `InitErrorFromResponse` and return the new error type
- Add `truncateBodyForLogging` helper for safe body preview in logs

VIP Endpoint Priority
- Prefer the VIP endpoint (`endpoints[0]`) in `selectEndpoint` when it is not DOWN

Implementation Details
- `NsxProxyForbiddenError` struct embedding `managerErrorImpl` in `errors.go`
- `CreateNsxProxyForbiddenError(host, bodyPreview)` constructor
- `groundTriggers` extended to include `"NsxProxyForbiddenError"` alongside `ConnectionError` and `Timeout`
- In `InitErrorFromResponse`, when JSON parsing fails for a 403 response with a non-empty body, return `NsxProxyForbiddenError` instead of `GeneralManagerError`
- In `selectEndpoint`, check the VIP first; iterate only `endpoints[1:]` for fallback selection

🎯 Motivation
When NSX manager is recovering, nsx-proxy may return `403 Forbidden` with an HTML body instead of JSON. The operator's `InitErrorFromResponse` treated this as a `GeneralManagerError`, which is not a ground trigger, so the operator kept retrying the same unhealthy endpoint without failover.

Issue: VPC cleanup hangs during supervisor disable because the operator is stuck on the recovering manager endpoint.
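As a rough illustration of the response shape in question, the Go sketch below flags a 403 whose non-empty body does not parse as a JSON object and builds an error carrying a truncated body preview. All identifiers here (`isProxyForbidden`, `truncateBody`, `proxyForbiddenError`, `classifyForbidden`) are hypothetical, and the 64-byte preview limit is an assumption; the real `InitErrorFromResponse` and `truncateBodyForLogging` may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// truncateBody returns at most max bytes of body for logging, appending
// "..." when the body was cut. It truncates by bytes, so a multi-byte
// UTF-8 character at the boundary may be split.
func truncateBody(body []byte, max int) string {
	if len(body) <= max {
		return string(body)
	}
	return string(body[:max]) + "..."
}

// isProxyForbidden reports whether the response looks like the nsx-proxy
// case: status 403, a non-empty body, and a body that is not a JSON object.
func isProxyForbidden(status int, body []byte) bool {
	if status != http.StatusForbidden || len(body) == 0 {
		return false
	}
	var payload map[string]interface{}
	return json.Unmarshal(body, &payload) != nil
}

// proxyForbiddenError is a hypothetical stand-in for NsxProxyForbiddenError.
type proxyForbiddenError struct {
	host        string
	bodyPreview string
}

func (e *proxyForbiddenError) Error() string {
	return fmt.Sprintf("nsx-proxy on %s returned 403 with non-JSON body: %q", e.host, e.bodyPreview)
}

// classifyForbidden returns a proxyForbiddenError for the nsx-proxy case
// and nil otherwise; every other response is left to the existing paths,
// which this sketch does not model.
func classifyForbidden(host string, status int, body []byte) error {
	if !isProxyForbidden(status, body) {
		return nil
	}
	return &proxyForbiddenError{host: host, bodyPreview: truncateBody(body, 64)}
}
```

A JSON 403 body, an empty 403 body, and a non-403 HTML body all fall through untouched in this sketch, so only the HTML-on-403 case is reclassified.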
Solution:
- Non-JSON 403 response → `NsxProxyForbiddenError` (ground trigger) → endpoint marked DOWN → failover
- Prefer the VIP in `selectEndpoint` so traffic routes through the VIP when available, falling back to individual managers only when the VIP is DOWN

✅ Testing
Unit Tests
- All unit tests pass (`make test` ✅)
- `TestCreateFunc` - Validates `CreateNsxProxyForbiddenError` constructor via reflection
- `TestShouldGroundPoint` - Confirms `NsxProxyForbiddenError` triggers endpoint failover
- `TestInitErrorFromResponse_403NonJSON` - 5 test cases, with expected results:
  - `NsxProxyForbiddenError` (ground trigger ✅, not retriable ✅)
  - `NsxProxyForbiddenError`
  - `BadXSRFToken` (existing behavior preserved)
  - `nil` (no error)
  - no `NsxProxyForbiddenError` (only 403 triggers)
- `TestTruncateBodyForLogging` - Short, exact-length, and long body truncation

🔄 Backward Compatibility
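This change is fully backward compatible:
- Existing JSON 403 handling is preserved (e.g. `BadXSRFToken`)
- Only non-JSON 403 responses change classification (previously `GeneralManagerError`, now `NsxProxyForbiddenError`)
- `selectEndpoint` behavior is unchanged when only one endpoint exists; VIP priority only applies to multi-endpoint configurations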
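
The VIP-first selection can be sketched as follows. This is a simplified illustration, not the PR's `selectEndpoint`: the `endpoint` type is hypothetical, and picking the first non-DOWN manager from `endpoints[1:]` is an assumption, since the actual fallback policy among managers is not described here.

```go
package main

// endpoint is a hypothetical endpoint record; by the convention described
// in this PR, index 0 of the slice is the VIP.
type endpoint struct {
	host string
	down bool
}

// selectEndpoint returns the VIP (endpoints[0]) when it is not DOWN.
// Otherwise it returns the first manager in endpoints[1:] that is not
// DOWN, or nil when nothing is available.
func selectEndpoint(endpoints []*endpoint) *endpoint {
	if len(endpoints) == 0 {
		return nil
	}
	if vip := endpoints[0]; !vip.down {
		return vip
	}
	// Assumption: first-available fallback among individual managers.
	for _, ep := range endpoints[1:] {
		if !ep.down {
			return ep
		}
	}
	return nil
}
```

With a single endpoint, `endpoints[1:]` is empty, so the fallback loop never runs and only that endpoint's own state decides the result.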