fix: WorkflowTaskSets size bloat for large workflows#16075
Conversation
0fa410b to
1bdcf97
Compare
terrytangyuan
left a comment
There was a problem hiding this comment.
Could you add some tests to exercise this change?
Sure, I added a test to verify that compressed workflows now fail explicitly with a clear error instead of silently proceeding with invalid taskset cleanup logic. I also was considering adding a large stress/e2e test for the executor plugin around the 2MB limit, but I’m not sure it’s a good idea. |
d71d2d5 to
0bd9e8c
Compare
|
@terrytangyuan
|
terrytangyuan
left a comment
There was a problem hiding this comment.
@Joibel Can you take another look here?
| } | ||
|
|
||
| func (t *Then) ExpectWorkflowCompressed() *Then { | ||
| ctx := logging.TestContext(t.t.Context()) |
There was a problem hiding this comment.
Follow ExpectWorkflow, ExpectWorkflowName and call expectWorkflow(name, func) instead of hand rolling this.
There was a problem hiding this comment.
Good point. I considered using ExpectWorkflow as well, but it dehydrates the workflow here before the function check, so it doesn't work for this particular check.
| return err | ||
| } | ||
| if last == nil || fi.ModTime() != last.ModTime() || fi.Size() != last.Size() { | ||
| if last == nil || !fi.ModTime().Equal(last.ModTime()) || fi.Size() != last.Size() { |
There was a problem hiding this comment.
Is this relevant? It looks like a good change, but perhaps should be separate - is there something here that tickled this problem?
Having said that, this is better so happy to retain the change.
There was a problem hiding this comment.
This is not directly related to the fix. I noticed it while working on this area and changed it as a small cleanup.
I thought it might not be worth opening a separate PR and running the full e2e pipeline just for this small change. That said, if you prefer to keep the PR strictly focused, I can remove it.
35fd092 to
0f501b7
Compare
0f501b7 to
27b0f6e
Compare
removeCompletedTaskSetStatus was executed after workflow dehydration, causing completed task set nodes to be skipped during cleanup. This change: This change: - moves cleanup before dehydration Signed-off-by: arpechenin <arpechenin@avito.ru>
27b0f6e to
682545a
Compare
Signed-off-by: arpechenin <arpechenin@avito.ru>
682545a to
eea539d
Compare
|
@Joibel could you please take another look? I’ve addressed your review comments. |
Motivation
removeCompletedTaskSetStatus was executed after workflow dehydration, causing completed task set nodes to be skipped during cleanup and leading to WorkflowTaskSets size bloat. This is because getDeleteTaskAndNodePatch relies on in-memory status.nodes, which are cleared during workflow compression.
As a result, large compressed workflows could continuously accumulate completed task set data and eventually fail with errors such as:
This change:
Fixes #TODO
Verification
Verified the changes on:
Confirmed that completed task sets are cleaned up correctly and workflows no longer fail with oversized patch requests.
AI