Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Log stack traces on data nodes before they are cleared for transport #125732
base: main
Are you sure you want to change the base?
Log stack traces on data nodes before they are cleared for transport #125732
Changes from 3 commits
fdaba56
79a06aa
e38d20c
6cb677f
78d58a4
b34afc1
9f527eb
8eca23f
93eddcc
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hopefully we adjust log level and this is no longer needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the number of shards is already randomized if you call
createIndex
. SeeESIntegTestCase#numberOfShards
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could probably fold this test into the errortrace true test? You could randomly set the flag to true, otherwise not, the result should be the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(done)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see where we are logging the trace? It seems we are only logging that its actually removed.
I am thinking we should log the exception as a
WARN
if:However, I defer to @javanna here. He has much better context around what we are trying to achieve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing
e
to the logger here outputs the stack trace. Maybe my log message isn't clear... think something like "search failed with exception:" would be better?Interesting point on
WARN
- do you think aWARN
log per shard for the same underlying error is acceptable here? Luca and I decided it would be difficult to dedupe these at the moment. It might become easier with batched query execution.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see.
I think that this shouldn't be debug. Maybe debug for all exceptions. But we should log
WARN
for things that are 5xx, indicating the failure.I think adding a "Clearing stack trace before...." is unnecessary. Logging isn't just for our debugging, but also for users.
I am not sure indicating that the trace is being removed is overly useful here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the log message to be more clear for users, and raised the level
WARN
for 5XX. Thanks for the idea there - I think it makes sense for this new message to log at the same level as rest suppressed.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like the best place to add the logging indeed, because we ensure that we do the additional logging exclusively for the cases where we suppress the stack trace, before doing so.
If we log this at debug, we are not going to see it with the default log level, are we? I think we should use warn instead at least?
The error message looks a little misleading also, all we are interested in is the error itself, so I would log the same that we'd get on the coord node, but this time we'd get the stacktrace.
There's a couple more aspects that deserve attention I think:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the log message to be more clear for users and raised the level to
WARN
on the same condition that the rest suppressed logger logs atWARN
.if (header == false) {
.error_trace=false
. Then again rest.suppressed is only one log line. But I imagine removing any of these logs would count as a breaking change (?), as alerts out there (like our own) might rely on them.