DocDB: Reduce spammy logs in cdc service, tx manager, and rocksdb operations #29350
base: master
f7097f8 to 940de07
auto get_stream_metadata = GetStream(stream_id, RefreshStreamMapOption::kIfInitiatedState);
if (!get_stream_metadata.ok()) {
  LOG(WARNING) << "Read invalid stream id: " << stream_id << " for tablet " << tablet_id << ": "
We should NOT be seeing many of these.
In the past hour, we've seen this log 1.7M times. The volume scales with the number of tablets, of which we currently have around 21k. Not all of those tablets are being CDC'd, but many thousands are.
Do you have recommendations for what we should look into, given how frequently we are seeing this log?
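For a sense of scale, the figures quoted above can be turned into average rates. This is only a back-of-the-envelope sketch: the helper names are hypothetical, and the per-tablet figure spreads the lines evenly across all 21k tablets, which understates the rate for the tablets that are actually being CDC'd.

```cpp
// Hypothetical helpers for rough log-rate estimates; not part of YugabyteDB.

// Average number of log lines per second for a given hourly count.
double LinesPerSecond(double lines_per_hour) {
  return lines_per_hour / 3600.0;
}

// Average number of log lines per tablet per hour, assuming the lines are
// spread evenly across every tablet (a simplifying assumption).
double LinesPerTabletPerHour(double lines_per_hour, double tablet_count) {
  return lines_per_hour / tablet_count;
}
```

With 1.7M lines per hour and 21k tablets, this works out to roughly 472 lines per second overall and about 81 lines per tablet per hour.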
    return;
  }
- YB_LOG_EVERY_N_SECS(WARNING, 1) << "No local transaction status tablet found";
+ YB_LOG_EVERY_N_SECS(WARNING, 10) << "No local transaction status tablet found";
Setting to 10s is fine. But if you see this, it means you don't have enough transaction status tablets and should take action to create more soon.
All of our transaction status tables have between 200 and 450 tablets. What scaling factor should we consider for these tablets?
What
Reduces overly spammy logs whose volume grows with cluster size. In a cluster with hundreds of tservers, some of these messages were logged millions of times per hour.
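To illustrate the effect of time-based throttling, here is a minimal sketch of a throttle that lets a message through at most once per interval. It is not the implementation of `YB_LOG_EVERY_N_SECS`: `LogThrottle` and `CountEmitted` are hypothetical names, and the sketch assumes one throttle per call site in a single process, with the clock passed in explicitly so the behavior is easy to check.

```cpp
#include <chrono>

// Hypothetical helper, not part of YugabyteDB: allows a message through at
// most once per `interval` and suppresses everything in between.
class LogThrottle {
 public:
  using Clock = std::chrono::steady_clock;

  explicit LogThrottle(std::chrono::seconds interval) : interval_(interval) {}

  // Returns true if the caller should emit its message at time `now`.
  bool ShouldLog(Clock::time_point now) {
    if (has_emitted_ && now - last_emit_ < interval_) {
      return false;  // Still inside the quiet period after the last emit.
    }
    has_emitted_ = true;
    last_emit_ = now;
    return true;
  }

 private:
  std::chrono::seconds interval_;
  bool has_emitted_ = false;
  Clock::time_point last_emit_{};
};

// Simulates one call site that is hit `calls_per_sec` times every second for
// `duration_secs` seconds, and returns how many lines would be emitted.
int CountEmitted(int interval_secs, int calls_per_sec, int duration_secs) {
  LogThrottle throttle{std::chrono::seconds(interval_secs)};
  const LogThrottle::Clock::time_point start{};  // Arbitrary simulated origin.
  int emitted = 0;
  for (int s = 0; s < duration_secs; ++s) {
    for (int c = 0; c < calls_per_sec; ++c) {
      const auto now = start + std::chrono::seconds(s) +
                       std::chrono::milliseconds(c * 1000 / calls_per_sec);
      if (throttle.ShouldLog(now)) {
        ++emitted;
      }
    }
  }
  return emitted;
}
```

Under this model, a call site hit 100 times per second for a minute emits 60 lines with a 1-second interval (`CountEmitted(1, 100, 60)`) and 6 lines with a 10-second interval (`CountEmitted(10, 100, 60)`). The emitted volume depends only on the interval and the number of throttled call sites and processes, not on how often the underlying condition occurs.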