base: master
MGMT-19704: Separate pod for monitor #8492
Conversation
@maorfr: This pull request references MGMT-19704, which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Walkthrough: Adds an `Options.StartMonitors` flag to conditionally start background monitors and alter leader-election/migration startup behavior; adds an `assisted-monitors` Deployment and ServiceMonitor, plus new resource parameters and `START_MONITORS` env wiring in the OpenShift templates.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: maorfr

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
Force-pushed from 9657937 to 91ceb5f (compare)
Currently we run the API server and the background monitors in the same processes and pods. This makes it more difficult to understand the performance characteristics of those two things, and it also means that when the monitors need to be restarted the in-flight API requests need to be aborted. To improve that, this moves the monitors to a different pod. This is done while trying to minimize code changes:

- A new `START_MONITORS` environment variable is added to control whether the monitors are started. This is set to `false` in the existing pods, and to `true` in the new ones.
- The leader election used for the monitors is only started in the new pods, i.e., when `START_MONITORS` is `true`.
- The leader election used for applying migrations is only started in the old pods, i.e., when `START_MONITORS` is `false`.

The new pods are started by a new `assisted-monitors` deployment that is almost identical to the existing deployment; it even starts the API listener, but it doesn't have an Envoy sidecar and the API port isn't exposed.

In the future the logic of `main.go` should probably also be separated: one program for the API server and another for the monitors, but that would require much larger changes that we don't want to do now.

Related: https://issues.redhat.com/browse/MGMT-19704

Signed-off-by: Juan Hernandez <[email protected]>
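For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of how such a flag can be declared and consumed with envconfig. Only the `START_MONITORS` name and the `Options.StartMonitors` field come from the PR; the default value, the prefix, and everything else are assumptions for illustration.

```go
package main

import (
	"log"

	"github.com/kelseyhightower/envconfig"
)

// Options mirrors the shape described in the PR: a StartMonitors field
// wired to the START_MONITORS environment variable. Defaulting to true
// is an assumption based on the review note about backward compatibility;
// the real struct in cmd/main.go has many more fields.
var Options struct {
	StartMonitors bool `envconfig:"START_MONITORS" default:"true"`
}

func main() {
	// "myapp" is a placeholder prefix; envconfig also falls back to the
	// bare START_MONITORS variable when MYAPP_START_MONITORS is unset.
	if err := envconfig.Process("myapp", &Options); err != nil {
		log.Fatal(err.Error())
	}
	if Options.StartMonitors {
		log.Println("monitor pod: starting background monitors")
	} else {
		log.Println("API pod: monitors disabled")
	}
}
```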
Force-pushed from 91ceb5f to 22689de (compare)
Actionable comments posted: 1
🧹 Nitpick comments (1)
cmd/main.go (1)
639-653: Consider adding `clusterEventsUploader` to health monitoring.

The `clusterEventsUploader` (line 634) is started unconditionally but not added to `backgroundThreads`, meaning its health is not monitored via the health endpoint. If this thread fails, the failure won't be detected by health checks, potentially affecting observability. If health monitoring is desired for the events uploader, apply this diff:
```diff
+	backgroundThreads = append(backgroundThreads, clusterEventsUploader)
+
 	if Options.StartMonitors {
 		clusterStateMonitor := thread.New(
```
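For context on what gets appended to `backgroundThreads`, the sketch below shows a generic periodic-worker shape. This is hypothetical: the repository's real `thread` package may expose a different API; the sketch only illustrates the kind of object being health-tracked.

```go
// Package thread: a hypothetical periodic worker, illustrating the kind
// of object appended to backgroundThreads; not the repository's real type.
package thread

import "time"

type Thread struct {
	name     string
	interval time.Duration
	run      func()
	stop     chan struct{}
}

func New(name string, interval time.Duration, run func()) *Thread {
	return &Thread{name: name, interval: interval, run: run, stop: make(chan struct{})}
}

// Start launches the worker loop, invoking run on every tick until Stop.
func (t *Thread) Start() {
	go func() {
		ticker := time.NewTicker(t.interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				t.run()
			case <-t.stop:
				return
			}
		}
	}()
}

func (t *Thread) Stop() { close(t.stop) }
```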
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (1)
cmd/main.go (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
- Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
cmd/main.go
🧬 Code graph analysis (1)
cmd/main.go (2)
internal/cluster/cluster.go (1)
`Config` (147-158)

pkg/app/middleware.go (1)

`WithHealthMiddleware` (39-59)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Red Hat Konflux / assisted-service-rhel9-acm-ds-main-on-pull-request
- GitHub Check: Red Hat Konflux / assisted-service-saas-main-on-pull-request
🔇 Additional comments (2)
cmd/main.go (2)
187-188: LGTM!

The `StartMonitors` field is correctly added with an appropriate default value that maintains backward compatibility.
789-790: Verify health check behavior with empty `backgroundThreads`.

When `StartMonitors=false` (API server pods), `backgroundThreads` will be empty. Based on the middleware implementation, an empty thread list means health checks will always return OK without validating any background processes. Confirm this is the intended behavior for separating API and monitor concerns.
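To make the reviewer's point concrete, here is a hypothetical health handler over a list of background threads. This is not the repository's `WithHealthMiddleware` (which lives in pkg/app/middleware.go); it only demonstrates why an empty slice yields an always-OK endpoint.

```go
package app

import "net/http"

// HealthChecker is a hypothetical interface standing in for whatever
// the real middleware inspects on each background thread.
type HealthChecker interface {
	Alive() bool
}

func healthHandler(threads []HealthChecker) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		for _, t := range threads {
			if !t.Alive() {
				http.Error(w, "background thread failed", http.StatusInternalServerError)
				return
			}
		}
		// With an empty slice the loop body never runs, so an API-only
		// pod (StartMonitors=false) always reports OK here.
		w.WriteHeader(http.StatusOK)
	}
}
```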
Codecov Report

❌ Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master    #8492      +/-   ##
==========================================
- Coverage   43.49%   43.48%   -0.02%
==========================================
  Files         411      411
  Lines       71076    71085       +9
==========================================
- Hits        30917    30913       -4
- Misses      37406    37417      +11
- Partials     2753     2755       +2
```
@maorfr: The following tests failed; say `/retest` to rerun all failed tests or `/retest-required` to rerun all mandatory failed tests:

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
testing #7896
Currently we run the API server and the background monitors in the same processes and pods. This makes it more difficult to understand the performance characteristics of those two things, and it also means that when the monitors need to be restarted the in-flight API requests need to be aborted. To improve that, this moves the monitors to a different pod. This is done while trying to minimize code changes:

- A new `START_MONITORS` environment variable is added to control whether the monitors are started. This is set to `false` in the existing pods, and to `true` in the new ones.
- The leader election used for the monitors is only started in the new pods, i.e., when `START_MONITORS` is `true`.
- The leader election used for applying migrations is only started in the old pods, i.e., when `START_MONITORS` is `false`.

The new pods are started by a new `assisted-monitors` deployment that is almost identical to the existing deployment; it even starts the API listener, but it doesn't have an Envoy sidecar and the API port isn't exposed. A sketch of the resulting election split follows this description.

In the future the logic of `main.go` should probably also be separated: one program for the API server and another for the monitors, but that would require much larger changes that we don't want to do now.

Related: https://issues.redhat.com/browse/MGMT-19704
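As a rough sketch of that election split (the function names below are hypothetical, not code from the PR), each pod runs exactly one of the two leader elections depending on its role:

```go
package main

import "log"

// Hypothetical stand-ins for the two leader elections; the PR's real
// election code and lease names are not shown here.
func runMonitorElection()   { log.Println("acquire monitors lease, then start monitors") }
func runMigrationElection() { log.Println("acquire migration lease, then apply migrations") }

// startElections picks the election matching the pod's role: monitors
// elect in the new assisted-monitors pods, migrations in the old API pods.
func startElections(startMonitors bool) {
	if startMonitors {
		runMonitorElection()
	} else {
		runMigrationElection()
	}
}

func main() {
	// In the real service this flag comes from START_MONITORS.
	startElections(false)
}
```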
List all the issues related to this PR
What environments does this code impact?
How was this code tested?
Checklist
… (docs, README, etc)

Reviewers Checklist