-
Notifications
You must be signed in to change notification settings - Fork 827
Log duplicated activity events #6813
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Log duplicated activity events #6813
Conversation
@@ -2253,6 +2261,65 @@ func (e *mutableStateBuilder) logDataInconsistency() { | |||
tag.WorkflowRunID(runID), | |||
) | |||
} | |||
func (e *mutableStateBuilder) logDuplicatedActivityEvents() { | |||
type activityTaskUniqueEventParams struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't compare events directly. Using this struct to dereference attributes that can guarantee the activity uniqueness
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this will work, equality in maps is being compared by pointer, not by value ( I think, let's confirm I'm not lying). Let's start with a unit test to validate this is working as expected. I think this is will probably not catch duplicates in its current state, unless you squash it into something that's more comparable like a string.
Ensuring that log is called is a bit annoying, so changing the check to just return a bool is one easy solution. eg:
isDuplicated := e.checkDuplicatedActivityEvents()
if isDuplicated{
e.logger.Error("Duplicate activity task event found",
tag.WorkflowDomainName(e.GetDomainEntry().GetInfo().Name),
tag.WorkflowID(e.GetExecutionInfo().WorkflowID),
tag.WorkflowRunID(e.GetExecutionInfo().RunID),
tag.WorkflowScheduleID(scheduledEventID),
tag.WorkflowEventType(event.GetEventType().String())
)
}
})
func (e *mutableStateBuilder) checkDuplicatedActivityEvents() bool {
... // check if the fields are duplicated
}
For the duplicated check, my vague memory was that you were looking to check if the scheduledEventID was duplicated? That's just a primitive type, I think that's possible to just do a map check on? I would start there to begin with, though if you want to check other fields, there's a few solutions, but i'm not entirely sure what it means for the attempt to be duplicated, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually seems I am a bit wrong and clearly don't understand map keys well enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, you're correct and I'm wrong (https://go.dev/ref/spec#Comparison_operators), (demonstrated here)
Struct types are comparable if all their field types are comparable. Two struct values are equal if their corresponding non-blank field values are equal. The fields are compared in source order, and comparison stops as soon as two field values differ (or all fields have been compared).
So I take back what I said, but would agree with Tim's point about a unit test, both to check and to also catch NPE problems when dereferencing
@@ -2253,6 +2261,65 @@ func (e *mutableStateBuilder) logDataInconsistency() { | |||
tag.WorkflowRunID(runID), | |||
) | |||
} | |||
func (e *mutableStateBuilder) logDuplicatedActivityEvents() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could we have some unit test on those cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'll add that to show how it works
What changed?
Added logs to capture duplicated activity events
Why?
There were some cases where started and completed activity events were duplicated, causing the workflows to fail. Couldn't reproduce it locally, so adding logs to try to identify it earlier and debug it.
How did you test it?
Tested locally
Potential risks
Release notes
Documentation Changes