[ARO-11484] Fix fixetcd GA #4034
base: master
Mostly LGTM, and it was thoughtful to add an E2E test. I made some small suggestions and have one other thing to point out.
There's a part of the user story that I don't see addressed in the PR: "I want to add a conditional check for when the node IP address remains the same, and delete the existing etcd Pod if it's in a crashloop". Is that part of the story still needed?
@kimorris27 Thank you for taking a look!
I don't think it's actually needed. You might be able to reproduce it by running the e2e test. When you get a 200 from the fixetcd API, the etcd pod should be in CrashLoopBackOff.
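For anyone checking the pod state by hand, here is a minimal sketch of the crash-loop check in Go. It is an illustration only, not code from this PR: the struct types are local stand-ins that mirror the relevant fields of the Kubernetes pod status, and `isCrashLooping` is a hypothetical helper name. A real check would presumably use the `k8s.io/api/core/v1` types instead.

```go
package main

import "fmt"

// Local stand-ins for the Kubernetes pod status types (assumption: a real
// check would use k8s.io/api/core/v1 rather than these structs).
type ContainerStateWaiting struct {
	Reason string
}

type ContainerState struct {
	Waiting *ContainerStateWaiting
}

type ContainerStatus struct {
	Name         string
	RestartCount int32
	State        ContainerState
}

type PodStatus struct {
	ContainerStatuses []ContainerStatus
}

// isCrashLooping (hypothetical helper) reports whether any container in the
// pod is currently waiting with the reason "CrashLoopBackOff".
func isCrashLooping(status PodStatus) bool {
	for _, cs := range status.ContainerStatuses {
		if cs.State.Waiting != nil && cs.State.Waiting.Reason == "CrashLoopBackOff" {
			return true
		}
	}
	return false
}

func main() {
	// Example status for an etcd pod whose container keeps restarting.
	etcd := PodStatus{ContainerStatuses: []ContainerStatus{
		{
			Name:         "etcd",
			RestartCount: 7,
			State:        ContainerState{Waiting: &ContainerStateWaiting{Reason: "CrashLoopBackOff"}},
		},
	}}
	fmt.Println("etcd crash looping:", isCrashLooping(etcd))
}
```

The check looks only at the waiting reason, so a container that is running between restarts would not be reported; polling over a short window may be needed in practice.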
LGTM given the responses to my original comments. The only issue I see now is that it looks like some unit tests in pkg/frontend need to be updated to reflect the changes to fixetcd.go, but maybe someone else will need to pick up this work first?
Have you tested this with a dev cluster?
Please rebase the pull request.
* pkg/env: log MSI data-plane interactions
  Signed-off-by: Steve Kuznetsov <[email protected]>
* clustermsi: send identityIDs, not delegatedResources
  Signed-off-by: Steve Kuznetsov <[email protected]>
* go.mod: bump Azure/msi-dataplane
  Signed-off-by: Steve Kuznetsov <[email protected]>
* *: update to use new Azure/msi-dataplane
  Signed-off-by: Steve Kuznetsov <[email protected]>
* *: use policy.Policy construct for logging
  Previously, we were substituting the entire http transport just to add some client-side middleware. The Azure SDK for Go ships with default transport parameters that we lost by doing this; furthermore, we were not using the per-call policy options to model our middleware, as the SDK expected.
  Signed-off-by: Steve Kuznetsov <[email protected]>
* clustermsi: be smarter about logging responses
  Signed-off-by: Steve Kuznetsov <[email protected]>
---------
Signed-off-by: Steve Kuznetsov <[email protected]>
Signed-off-by: Steve Kuznetsov <[email protected]>
* Makefile: bump golangci-lint
  Signed-off-by: Steve Kuznetsov <[email protected]>
* test/e2e: fixup bad Ginkgo assertions
  ginkgo-linter: wrong error assertion. Consider using `Eventually(func(ctx context.Context) error { return project.VerifyProjectIsReady(ctx) }).WithContext(ctx).WithTimeout(DefaultEventuallyTimeout).Should(Succeed())` instead (ginkgolinter)
  Signed-off-by: Steve Kuznetsov <[email protected]>
* pkg/api: remove unnecessary nil checks
  S1009: should omit nil check; len() for []string is defined as zero (gosimple)
  Signed-off-by: Steve Kuznetsov <[email protected]>
* pkg/util: fix ineffectual compiler directive
  SA9009: ineffectual compiler directive due to extraneous space: "// go:generate mockgen -destination=../mocks/$GOPACKAGE/client.go github.com/Azure/msi-dataplane/pkg/dataplane Client" (staticcheck)
  Signed-off-by: Steve Kuznetsov <[email protected]>
* *: fix non-constant format strings
  printf: non-constant format string in call to (*github.com/sirupsen/logrus.Entry).Warnf (govet)
  Signed-off-by: Steve Kuznetsov <[email protected]>
* cmd/migrate: add a program to refactor NewCloudError
  Signed-off-by: Steve Kuznetsov <[email protected]>
* *: run the program to refactor NewCloudError
  Signed-off-by: Steve Kuznetsov <[email protected]>
* pkg/api: refactor NewCloudError to take a static message
  Signed-off-by: Steve Kuznetsov <[email protected]>
* Revert "cmd/migrate: add a program to refactor NewCloudError"
  This reverts commit 8aa74d1f5d1f006d840e025bf025ce728cced0c4.
  Signed-off-by: Steve Kuznetsov <[email protected]>
* pkg/frontend: fix error handling in fixetcd
  Signed-off-by: Steve Kuznetsov <[email protected]>
---------
Signed-off-by: Steve Kuznetsov <[email protected]>
…r Azure Resources output if cluster uses preconfiguredNSG (#4099)
* extended appendAzureNetworkResources() to fetch BYO NSGs
* fix existing tests, create BYO NSG tests
* fixed expected RG in mock
* fixed test
* reduced some nested conditions
* added validation for a subnet with no NSG
* reuse kv for des if already exists
* advice what to do with invalid DES
* Inverse the if logic
* inverse also if logic to manage only errors
* NameAvailable nil is an error
* add debug log in cas of retry
* type, and replace debug by info
* remote code for testing...
…testing locally is important
Which issue this PR addresses:
Fixes https://issues.redhat.com/browse/ARO-11484
What this PR does / why we need it:
This PR fixes the fixetcd GA.
I also added an E2E test, but it takes a long time and can't run in parallel, so I made it a regression test. It doesn't run by default in CI.
The E2E test is intended to ensure the master replacement SOP is valid, because I couldn't reproduce the etcd issue 100% of the time.
I think this is a sufficient solution until we find a reliably reproducible scenario.
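To illustrate what "doesn't run by default" can look like, here is a minimal sketch of an opt-in gate in Go. It is not the mechanism used in this PR: the environment variable name `RUN_REGRESSION_TESTS` and the helper `regressionEnabled` are hypothetical, chosen only to show the idea of a slow, serial test that runs only when explicitly requested.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// regressionEnvVar is a hypothetical variable name, not one defined by this PR.
const regressionEnvVar = "RUN_REGRESSION_TESTS"

// regressionEnabled (hypothetical helper) reports whether opt-in regression
// tests should run. It takes a lookup function so it can be exercised
// without touching the real environment.
func regressionEnabled(lookup func(string) string) bool {
	v := strings.ToLower(strings.TrimSpace(lookup(regressionEnvVar)))
	return v == "1" || v == "true"
}

func main() {
	if !regressionEnabled(os.Getenv) {
		fmt.Println("skipping fixetcd regression test (opt-in only)")
		return
	}
	fmt.Println("running fixetcd regression test")
}
```

With a gate like this, a default CI run skips the test, while a dedicated job can set the variable to exercise the master replacement and fixetcd flow.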
Test plan for issue:
e2e
Is there any documentation that needs to be updated for this PR?
no
How do you know this will function as expected in production?
e2e, master replacement & fixetcd GA