-
Notifications
You must be signed in to change notification settings - Fork 2.3k
fix(ui): prevent race conditions after move row by only replacing state for latest change #8961
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix(ui): prevent race conditions after move row by only replacing state for latest change #8961
Conversation
5e4ce72
to
015c17f
Compare
96c1675
to
a562670
Compare
As mentioned in the PR description, i've set up minimal reproducible example at https://github.com/dpnolte/payload-blocks-drag-issue/tree/main . |
I'm struggling to replicate your issue on the latest beta. I've followed the steps and the exact configuration and I can't see any data loss when moving items around. I tried variations of this and made sure to return the items to the original position but it all looks fine |
Thanks for looking into it. The video that I recorded was on the latest beta, see also the code here: https://github.com/dpnolte/payload-blocks-drag-issue/blob/main/package.json. The issue becomes more prevalent when using Perhaps it has more to do with quickly dragging the blocks around and the right throttlign. That is, it helps to expand all blocks first, drag the blocks around multiple times (like 20x times) and quickly with throttling to 3g to replicate the issue. This is a bug in the category that most of the time it will work fine. But our users already experienced this a couple of times. And it has then serious impact because it loses their changes with autosave. If you want, I can show it to you in a video call? |
Thanks for the follow up video, managed to reproduce! Will check the PR and see if it's the root cause |
Great! Thanks for looking into it!
…On Wed, 6 Nov 2024 at 21:10, Paul ***@***.***> wrote:
Thanks for the follow up video, managed to reproduce! Will check the PR
and see if it's the root cause
—
Reply to this email directly, view it on GitHub
<#8961 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AANVUDJRD3P5JETC6ZER6SDZ7JZVDAVCNFSM6AAAAABQ5ZH7ZGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDINRQGY4DEMJQG4>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
I can confirm the PR works, though I'll put it on ice for a few days until we the upcoming RSC PR Is merged, since it will touch many files including this one, it may also fix the problem in another way. I'll revisit this at that point, thank you for your patience |
All right, I can imagine you want to let the assumingly big PR land first :) Could you please link the PR for RSC into this conversation so that it can be traced? |
This is the PR #8364 , we're expecting to merge it very soon so one of the other reasons we've put some PRs on hold is to avoid merge conflicts |
@paulpopus the PR has ben merged for a while now. But the issue still persist since there respective code has not changed. Could we please get this PR merged in the so that we can get this issue resolved? |
@dpnolte Following the same video I haven't been able to reproduce this on the latest of v3. The data does eventually show as expected, it's just a bit slow in doing that versus what it did before with never showing up |
@paulpopus I've just reproduced it again with payload 3.5.0. Please see the video below. There is no code change so I also don't expected it to fixed? Screen.Recording.2024-12-11.at.21.34.08.mov |
Just also reproduced on payload 3.6.0. The code has not changed. I really hope we can get this merged now please. Screen.Recording.2024-12-11.at.21.52.29.mov |
Thank you for checking again. We're looking into this ASAP |
We're going to need some tests written for this to prevent future regression, @dpnolte would be able to do this? |
I could have a go at it. Hopefully, I'll find the time soon. I think it will be tricky to create the right preconditions to test the race condition properly. Namely, at the edit page, the Edit's view abort controller in the on change handler often prevents the race condition by aborting the previous form state call (see relevant code). It could therefore be better to not test this with a default edit page, but with a custom view that uses the Form component with a custom onChange handler as property. It would circumvent the race condition prevention that the Edit's abort controller provides. Does that make sense? Any recommendations perhaps to setup such a custom view (perhaps there is a similar test already that I can refer to)? Thanks! |
@dpnolte I'm unable to reproduce this no matter how hard I try. I have your repo pulled down and running locally. Still no luck. I've been playing with this for quite some time, following your instructions as exactly as possible. I even bumped down to v3.5.0 to ensure that #9892 wasn't the fix for this. Here's a screen recording from my end: Untitled.mp4Regardless, thank you for your diligence on this issue so far. I really appreciate all the detail you've provided. It's very helpful. Hopefully we can achieve a consistent reproduction here and get this one squared away. |
@leon-marzahn your issue sounds more like #10070, which is that your blocks are stuck in an infinite loading state when adding new rows. This is fixed with #10175. This particular issue, while similar, describes how rearranging blocks can randomly misplace the block state. Can anyone confirm that this issue still exists? I'm so far unable to recreate this, see my comment here. |
Thanks @jacobsfletch for delving into it and a belated merry christmas! I understand this is hard to reproduce. And admittedly, it now way harder to reproduce than the beta version we were on previously. So things are definitely improving 🚀 I've made quite some effort now to showcase the issue. I hope you appreciate it and we can finally land this. First, I would like to point out. That I could still reproduce this issue even with the latest Payload's 3.11.0 version. I could only reproduce it though manually if I drag the blocks when they are expanded: Screen.Recording.2024-12-25.at.10.59.16.movAnother thing what would have helped if every block type was different in the example repo. But even then, the reproducing steps are difficult because the race condition is caused by out of order form state processing that is caused by dragging blocks in the time window that the race condition can occur. This is the time window that the onChange handler is not yet processed and the next Small diagram to illustrate what I'm trying to save above: EDIT: please see comment below for a smaller test case as reproducible that is part of this PR. To be able to reproduce the issue everytime and to showcase the issue, I've created an automated e2e test in my payload fork that recreates the right pre-conditions for this race condition to occur. Please checkout branch This test drags the blocks around randomly and with random time wait time in between. The following video demonstrates the test with first fix applied and then without fix: Screen.Recording.2024-12-28.at.14.34.22.movThe relevant e2e test can be found here: You can run the test yourself by checking out that branch. This branch is rebased on latest of main (at the moment of writing) and includes the fix that is included with this PR. To run the test yourself, please follow the following steps:
I've run this test countless times. In my results, the test fails every time as some of the form fields have missing values. If you run this test in headed mode, be careful not to move the mouse inside browser as that affects the test. To review that the fix solves this issue, uncomment the lines 668-670 in @payloadcms/ui's Form component. All in all, with your improvements of Payload, the probability of this race condition happening is getting lower and lower. But it can still happen and the likelihood goes up if you have a page with many blocks. It also increases if you deploy it on vercel/serverless since the get form state request could experience a cold start that delays the on change rquest handling. And, unfortunately, if it happens, the impact is high because the data can become lost and the wrong form state can be stored with an autosave. This has happened to some of our users :( Please let me know if anything is unclear. As a next step, if you want, I could also add the test to this PR but then simplified so that it takes less time to run. |
a562670
to
069c086
Compare
@jacobsfletch I've tried to find the smallest reproducible test case for you. I've refined the test quite a bit and have managed to get a reproducible test case without 3g network emulation and only 3 drags. I've added it to this PR as an e2e test, called "should be able to drag blocks during get form state without race condition". You can find it in the fixup commit. It is the smallest test case that I could find to reproduce the issue. The key part of the test is that it perform a drag at the moment the form state is requested. To reproduce the issue, please follow the following steps:
Just in case I've also recorded it in a video: Screen.Recording.2024-12-30.at.09.28.35.movNote: I've also added an e2e test case for dragging polymorphic blocks around since I couldn't find such a test yet. And I thought it would be a nice addition? |
@paulpopus @jacobsfletch could you please have a look? With this merged PR, the probability of this happening went up again since it now more often to be set to be changed. |
…te for the latest change
…ing state for the latest change
069c086
to
38e982f
Compare
I've just checked again by running the e2e test without the |
What?
Race conditions occur when dragging blocks around with the result the wrong data is stored in the block.
To replicate this issue:
To illustrate this, please watch the following video:
https://github.com/user-attachments/assets/671b0488-16ef-4ab0-b52c-289d6986efe9
At the start of the video, each block the properly field values stored.. After the dragging with throttling, the top two blocks contain no values anymore. And the bottom block contains incorrect values; these incorrect values are actually originally from a different block.
Code used in video (just an empty Payload project with
Fruits
collection added):https://github.com/dpnolte/payload-blocks-drag-issue
Why?
This can have serious impact by loosing the canges you've made. We've seen this happening a couple of times in our setup. This happens more often if pages contain more blocks since many blocks will be moved in a longer list and when a block contains additional async fetches to get more data. So it would be really nice to have this fixed.
How?
The underlying cause seems to be that the execute on change processes any handlers as promises. And when the on change handlers are processed, the state is updated again with
REPLACE_STATE
to incorporate any form data from the server (e.g., validation). Since the on change handlers are async, other form state change can be processed which updates the form fields in the mean time. It seems that this can happen at least by dispatching multipleMOVE_ROW
in short timespan. The consqeuence is that when changing the order of the blocks withMOVE_ROW
in the race condition time window, the replace state will overwrite the contents of the blocks based on the incorrect positioning of the blocks .To fix this, this PR checks if the form state has not changed after the async change handlers are processed. If the form state is still the same, it will continue as it does now and dispatch a
REPLACE_STATE
. If not, it will early exit and noREPLACE_STATE
will be dispatched to prevent setting a new state that is derived from an outdated source. Since the form fields have been changed, the onchange event will be triggered again so it will dispatchREPLACE_STATE
to merge server form state into it.