Avoid log spam about cluster node failure detection by each primary #2010
base: unstable
Signed-off-by: Harkrishn Patro <[email protected]>
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##           unstable    #2010    +/-   ##
============================================
- Coverage     71.01%   71.01%   -0.01%
============================================
  Files           123      123
  Lines         66033    66113     +80
============================================
+ Hits          46892    46948     +56
- Misses        19141    19165     +24
@@ -2409,13 +2409,13 @@ void clusterProcessGossipSection(clusterMsg *hdr, clusterLink *link) {
     if (sender) {
         if (flags & (CLUSTER_NODE_FAIL | CLUSTER_NODE_PFAIL)) {
             if (clusterNodeIsVotingPrimary(sender) && clusterNodeAddFailureReport(node, sender)) {
-                serverLog(LL_NOTICE, "Node %.40s (%s) reported node %.40s (%s) as not reachable.", sender->name,
+                serverLog(LL_VERBOSE, "Node %.40s (%s) reported node %.40s (%s) as not reachable.", sender->name,
Should we make the level Warning? I am wondering if users depend on this log for any debugging.
Also, just out of curiosity, does changing log severity come under breaking change?
I think it might be helpful in case of small clusters but unsure how valuable it is to log it for each primary in a large cluster setup.
My suggestion is to log the state periodically every few seconds to debug better. #2011
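To illustrate what periodic logging could look like, here is a minimal sketch. It is not taken from #2011, and the struct and function names (`failureReportSummary`, `summaryRecord`) are hypothetical, not existing valkey APIs. The idea is to count failure reports as they arrive and emit at most one summary line per interval instead of one line per reporting primary.

```c
#include <stdio.h>

/* Hypothetical helper (not part of valkey): batches failure reports so that
 * at most one summary line is produced per interval. */
typedef struct {
    long long last_log_ms; /* time the last summary was emitted */
    long long interval_ms; /* minimum time between two summaries */
    int pending;           /* reports received since the last summary */
} failureReportSummary;

/* Record one failure report at time now_ms.
 * Returns the number of reports to summarize if a summary is due now,
 * or 0 if the report was only counted. */
int summaryRecord(failureReportSummary *s, long long now_ms) {
    s->pending++;
    if (now_ms - s->last_log_ms < s->interval_ms) return 0;
    int n = s->pending;
    s->pending = 0;
    s->last_log_ms = now_ms;
    return n;
}

/* Example caller: prints a single summary line when one is due. */
void summaryReport(failureReportSummary *s, long long now_ms) {
    int n = summaryRecord(s, now_ms);
    if (n > 0) printf("%d failure report(s) received since last summary\n", n);
}
```

With a 5000 ms interval, a burst of reports from many primaries inside the same window would collapse into one line, which is the property that matters for large clusters.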
@sarthakaggarwal97 Btw warning would increase the severity of logging. I want to reduce it.
ack! #2011 makes sense to me!
I don't feel strongly either way, so I would appreciate input from @enjoy-binbin once he is back from vacation.
After each primary detects and gossips a node failure or recovery, we log the failure detection/recovery at NOTICE level. This can spam the server log, and the logging is quite expensive on EC2 burstable instance types. I would prefer that we roll it back to VERBOSE level.
Change was introduced in #633
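
For context on why the level matters, here is a small sketch of how a verbosity threshold filters messages. It assumes the usual ordering debug < verbose < notice < warning and a configured log level of notice. The constants and the `wouldLog` helper are local stand-ins for illustration, not the server's actual definitions.

```c
#include <stdio.h>

/* Local stand-ins for the log levels, ordered from least to most severe.
 * These are assumptions for illustration, not copied from the server headers. */
enum { LL_DEBUG = 0, LL_VERBOSE = 1, LL_NOTICE = 2, LL_WARNING = 3 };

/* Hypothetical helper: a message is written only if its level is at least
 * as severe as the configured verbosity threshold. */
int wouldLog(int configured_verbosity, int message_level) {
    return message_level >= configured_verbosity;
}

/* Prints whether a message at each level passes the given threshold. */
void printFilterTable(int configured_verbosity) {
    const char *names[] = {"debug", "verbose", "notice", "warning"};
    for (int level = LL_DEBUG; level <= LL_WARNING; level++) {
        printf("%-8s %s\n", names[level],
               wouldLog(configured_verbosity, level) ? "logged" : "dropped");
    }
}
```

Under these assumptions, a server configured at notice drops the VERBOSE message from this change, while an operator who lowers the threshold to verbose still sees it for debugging.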