Skip to content

Make task executor lifecycled to platform readiness. #2715

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 3, 2025

Conversation

wizjany
Copy link
Collaborator

@wizjany wizjany commented Feb 17, 2025

Closes #2459.

Note that the platform list (now map) was/is not threadsafe, which in theory might be problematic if platforms throw ready/unready in multiple threads. Ready/unready should be fixed by #2571 if this is an issue.

@wizjany wizjany marked this pull request as ready for review March 2, 2025 17:23
@wizjany wizjany requested a review from a team as a code owner March 2, 2025 17:23
@wizjany
Copy link
Collaborator Author

wizjany commented Mar 2, 2025

This is now tested on Fabric, Neoforge, and Bukkit; client and server.

Notably, the schematic tasks don't actually listen to interrupts, so a long-running schematic load in single player will actually run to completion when closing a world and loading back into it (or another). However, the running schematic task doesn't affect the new world as the actor/session/etc. is no longer in use. It also doesn't prevent server shutdowns as the executor is no longer waiting for task completion.

Looking at the original comments, it seems there was some concern about a timer thread, but it doesn't seem to be causing any issues. I verified that this change does actually fix the issue by commenting out the executor shutdown call and watching the server not shut down.

@wizjany wizjany merged commit f67f649 into version/7.3.x Mar 3, 2025
5 checks passed
@wizjany wizjany deleted the fix/executor-shutdown branch March 3, 2025 14:23
@Fallen-Breath
Copy link

Since #2459 is locked, I'm posting more information on #2459 here

There are 2 bugs in worldedit that made #2459 happen

  1. WorldEdit.getInstance().getExecutorService() does not shutdown on server stop. This might block the server for up to 60s. This is what this PR fixes
  2. FutureProgressListener contains a non-daemon Timer thread. This might block the server forever

All attempts (#2460 / #2570 / #2592 / #2715) in this repository focus on the 1st bug, not the 2nd one

Believe it or not, the 2nd bug still exists in the latest release 7.3.11, and is still reproduce-able with the steps metioned in #2459 (comment). Here's the server log and the jstack output

So, this PR does not close #2459

Please always remember that arrogantly locking a conversation does not truly help solve the problem

@me4502
Copy link
Member

me4502 commented Mar 27, 2025

Since #2459 is locked, I'm posting more information on #2459 here

There are 2 bugs in worldedit that made #2459 happen

  1. WorldEdit.getInstance().getExecutorService() does not shutdown on server stop. This might block the server for up to 60s. This is what this PR fixes
  2. FutureProgressListener contains a non-daemon Timer thread. This might block the server forever

All attempts (#2460 / #2570 / #2592 / #2715) in this repository focus on the 1st bug, not the 2nd one

Believe it or not, the 2nd bug still exists in the latest release 7.3.11, and is still reproduce-able with the steps metioned in #2459 (comment). Here's the server log and the jstack output

So, this PR does not close #2459

Please always remember that arrogantly locking a conversation does not truly help solve the problem

The first merged fix originally did touch the second issue, but it caused so many major issues it was removed before it was even merged. Changing that to a daemon thread is not a safe fix, an alternative that actually works would be needed.

@Fallen-Breath
Copy link

The first merged fix originally did touch the second issue

True. But eventaully all of these PRs got ignored or choose to not focus on the 2nd one. Also it's definitely not meaning that "but it doesn't seem to be causing any issues". Please double check and verify the provided reproducle steps before closing #2459

@me4502
Copy link
Member

me4502 commented Mar 28, 2025

The first merged fix originally did touch the second issue

True. But eventaully all of these PRs got ignored or choose to not focus on the 2nd one. Also it's definitely not meaning that "but it doesn't seem to be causing any issues". Please double check and verify the provided reproducle steps before closing #2459

None of us have ever been able to reproduce the issue laid out in there, it seems to only affect a very small subset of users. We can’t verify that it’s fixed, we can only verify that things aren’t more broken than they were prior. And making that Timer a daemon does break things substantially.

@Fallen-Breath
Copy link

 but it caused so many major issues

Any example issue caused by daemonizing the timer thread? I haven't encountered any issue after making the Timer thread daemon

@Fallen-Breath
Copy link

Fallen-Breath commented Mar 28, 2025

None of us have ever been able to reproduce the issue laid out in there

If you don't know how to reproduce the issue with steps metioned in #2459 (comment) and #2715 (comment), here's a universal reproduce script that works on all x86 linux system with docker installed:

docker run --rm -w /worldedit --name worldedit2459 eclipse-temurin:21.0.6_7-jdk bash -c "$(cat <<EOF

mkdir mods
wget -q https://cdn.modrinth.com/data/1u6JkXh5/versions/bxlboAan/worldedit-mod-7.3.11.jar -O mods/worldedit-mod-7.3.11.jar
wget -q https://maven.fabricmc.net/net/fabricmc/fabric-installer/1.0.1/fabric-installer-1.0.1.jar
wget -q https://piston-data.mojang.com/v1/objects/4707d00eb834b446575d89a61a11b5d548d8c001/server.jar
java -jar fabric-installer-1.0.1.jar server -mcversion 1.21.4

echo "eula=true" > eula.txt
echo -e "searchitem stone\nstop\n" | java -Xmx1G -jar fabric-server-launch.jar

EOF
)"

This script will run MC 1.21.4 with latest fabricloader and WorldEdit 7.3.11 in the temurin 21 docker container. example output

After executing this script, you'll get a server that stucks on server-stop forever. You can inspect the java process stack with:

docker exec worldedit2459 bash -c 'jstack $(pidof java)'

There'll be a Timer-1 thread inside the jstack output. example output

@Fallen-Breath
Copy link

it seems to only affect a very small subset of users

I don't aggree

Any code-path that triggers class load of the FutureProgressListener class will trigger the timer issue (the timer is a static field)

e.g. with method com.sk89q.worldedit.command.util.AsyncCommandBuilder#buildAndExec

FutureProgressListener.addProgressListener(
future,
sender,
delayMessage,
workingMessage
);

Some example in-game way to trigger the timer stucking bug:

  1. Load / Save schematic files
  2. Use /searchitem, listchunks etc. commands

These are all common in-game usages. Or do you insist that loading/saving schematic files is a rare action that is only used by a very small subset of users

@wizjany
Copy link
Collaborator Author

wizjany commented Mar 28, 2025

To the contrary, the fact that saving/loading schematics is a rather common use case but only a few people have ever reported running into this issue is what implies it only affects a small subset of users. The fact that that code has been in use since 2018 (technically since 2015) but this was only reported a year ago also implies that something else (JVM changes? GC specific? server software specific?) might be causing the issue.

I did specifically mention in my comment above that I was aware of the discussion but wasn't able to reproduce the issue. While I'm not sure if daemonizing the timer thread would cause issues or not, it didn't seem necessary both historically and in testing, so I didn't do it.

@Fallen-Breath
Copy link

Okay, I give up on trying to get you to completely fix this issue. I don't have the time or energy to set up a series of environments from 2015 to 2025 and reproduce this issue one by one just to disprove the claim that "if it took this long for an issue to appear, then it must not be a bug in the code"

For anyone who comes across this later and is simply looking for a solution, you can check out this mod. It consists of just two simple mixins and works on all Fabric/Forge/NeoForge platforms since the 1.14 release in 2019

@wizjany
Copy link
Collaborator Author

wizjany commented Mar 28, 2025

You're either not reading what I said or purposefully misrepresenting it. I never said I'm against making the timer thread a daemon (I don't actually know that it causes issues, but the fact that making the timer non-static, like the one in sessionmanager, didn't fully fix the issue does seem strange), or that it's not "a bug in the code", just that it's been like that forever and isn't as easily reproducible as you're making it out to be, and that I didn't make that change here because I couldn't reproduce the issue.

@Fallen-Breath
Copy link

I never said I'm against making the timer thread a daemon

I also never said you have said that

I'm just tired of this. Whatever you guys think, just let it be. Thanks for maintaining WorldEdit for such a long time

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fabric with worldedit hangs when stopping during shutdown
3 participants