Skip to content

Conversation

@tringenbach
Copy link

Problem description

#2542

A task manager running the task TaskManager::createBackgroundTask("HighlightSourceCode", [this](auto &) { m_textHighlighter.highlightSourceCode(); }); runs on one thread while the main thread's render runs on the main thread, and they both call into pattern editor, and both access m_lines concurrently without locking.

Implementation description

I changed it to call getText() from the main thread and then pass that as a parameter to m_textHighlighter. highlightSourceCode(text);.

This is enough to keep it from crashing at startup when compiled with asan, but I don't think it is a complete fix. I figured I should get some feedback on the direction before changing too much.

Additional things

While I have some experience debugging threading issues and heap corruption, I haven't worked on ImHex before, so I'm probably missing a lot.

@paxcut
Copy link
Collaborator

paxcut commented Dec 7, 2025

First let me thank you for looking into this. I am pretty much in charge of the pattern editor developing and have been working on an update to bring code folding to he editor. A lot of the code dealing with m_lines (as it is called now) has changed so maybe you could try the new code on the mac to see if it still has problems there.While developing the pattern editor i faced heap corruption many times and develped ways to deal with it but only on windows mainly and a bit on linux.
I have run the code application verifier in windows and there is no evidence of heap corruption and no crashes that point to the pattern editor as being the cause.
the changes proposed here are small and in principle they shouldn't affect negatively the builds for windows and linux. I am very close to be able to commit the changes . I know ive been saying this but it should not take more than a few days.

@tringenbach
Copy link
Author

I am very close to be able to commit the changes . I know ive been saying this but it should not take more than a few days.

Sure. Just to clarify: you believe you've fixed this already as a part of a bunch bigger change (that adds code folding to the editor), but you haven't pushed up those changes, but when you do you'd like for me to test them?

Yes, I can do that.

I'm not sure I did a great job of explaining my changes. Basically, it looks like m_lines is being accessed from multiple threads concurrently without any kind of locking. I took the "make a copy" approach. (That just seemed like the least risky option. Adding locking can go really wrong if you don't know what you're doing.)

But that does mean more of a race condition. (It's a use-after-free because the vector gets reallocated on another thread, but that's down to both timing and the vector's previous state...) Not sure if application verifier would always catch that or not. I guess we'll see!

@paxcut
Copy link
Collaborator

paxcut commented Dec 7, 2025

I'll talk a bit about how things work now which is mostly how they use to work but with a few differences that could affect this.. One function of the text editor takes copies of the strings (one per line) stored in m_lines and uses them to create one long string to return. Meanwhile the text editor shares time with the console the debugger, the environment variables and many more components on the main thread while doing things like rendering, find replace it own syntax highlighting, and of course mouse keyboard input output. You can still make changes to m_lines while the function is copying the strings because the function only reads from the strings and never writes to them.
the highlighter is set to catch any out of bounds exception and treats them as a sign that it needs to stop what it is doing and return. thats is the error one expects from the main thread writing over values as they are read bu the highlighter and since the thread doesn't write any data to the data received i don't see how that can cause a crash to the main thread.

Even if you make changes to the text and those changes propagate back to the highlighter to the point where the highlighter can instantly take the new data and before the highlighter is finished processing the old data, that causes the thread process to instantly abort abort and terminate. the way the highlighter returns the colors is the inverse. the thread only writes to the data and the main thread only reads from it. This works well as long as the code doesn't have bugs that cause out of bounds exceptions or else it gets caught in a never ending cycle of tries and failures that will only change when the underlying data changes.

When i first wrote it he highlighter code use to crash a lot and it did so for a long time (months). so much so that I moved it back to the main thread cause i couldn't figure out why it kept crashing. Eventually i made my way to the wonderful world of heap corruption caused by concurrent read/writes done from different threads. I then learned about the application verifier which immediately showed the heap corruption and allowed me to fix all the crashes. i also used the address sanitizers on linux to check that the problems were indeed gone and that there was no more concurrent read/writes. Ever since i dont experience random crashes created on vector construction and none are reported as far as i know. The bugs that remained and i have fixed were easy to find but the crashes caused by corruption need special tools to detect them and microsofts tool on windows is a very good tool for that.

While I am open to the possibility that there could be still problems, they would have to be limited to the Mac builds to which I have no access and no practical way to test the builds, but there has been a very intensive effort to avoid exactly what you describe as being the problem and that also needs consideration. Also keep in mind that the change proposed is very simple and should be easy to see what effect if any has on the entire process.instead of spawning the thread inmediately when changes occur in the editor (that's the furthest distance in time that the editor has before it can do any more editing) the task manager would have to wait for the string to be created before spawning. whereas before the task manager would spawn the task which should have plenty of time (plenty for a computer) to read the text now they could be doing it at the same time since it doesnt matter anymore, i'll stop writing so i can go back to working on the code.

@tringenbach
Copy link
Author

@paxcut Thanks for the detailed explaination, I look forward to trying it out!

@paxcut
Copy link
Collaborator

paxcut commented Dec 12, 2025

Unfortunately is taking me longer than a few days to get things done (doesn't help that I keep getting involved in side quests as i call them) so I wanted to touch base about this one.

The problems that i ran into in the past were never related to the text editor sending text to the highlighter thread because thats a very fast operation compared to everything else thats going on. The biggest problem is the token vector which comes from a third thread but there are mechanisms (semaphore like) to ensure they are synchronized.
I also need to understand how the changes are fixing the problem since the text still has to travel from the main thread to the task and the task cannot just pull the text at any moment because it may be in the middle of being updated and that would also cause a concurrent access to the data.

@tringenbach
Copy link
Author

Unfortunately is taking me longer than a few days to get things done (doesn't help that I keep getting involved in side quests as i call them) so I wanted to touch base about this one.

That's fair. I definitely know how that is. This bug fix is itself a side quest for me lol. But contributing to a hex editor is fun, so that's fine.

I also need to understand how the changes are fixing the problem since the text still has to travel from the main thread to the task and the task cannot just pull the text at any moment because it may be in the middle of being updated and that would also cause a concurrent access to the data.

It fixes the problem by making a copy.

                    auto text = m_textEditor.get(provider).getText();
                    TaskManager::createBackgroundTask("HighlightSourceCode", [this, text](auto &) { m_textHighlighter.highlightSourceCode(text); });

Maybe that's a bit subtle, but what I'm doing is calling getText() from the main thread. The [text] captures a copy of the string. That captured copy ends up as a member of the compiler generated class for the closure / lambda, and lives as long as the closure itself does.

(Although I used auto text, so I suppose I could have screwed it up and it could really be a pointer. Hopefully not!)

I do consider this to be a race condition, and race condictions are often Heisenbugs.

And to be fair, I ran with Address Sanitizer (asan), but not with ThreadSanitizer.

@paxcut
Copy link
Collaborator

paxcut commented Dec 13, 2025

(I also have a hexpat script that was reliably crashing it before I rebuilt with asan

Can you possibly share that pattern so that I can investigate if it also crashes on windows as reliably as it did for you?

@tringenbach
Copy link
Author

Can you possibly share that pattern so that I can investigate if it also crashes on windows as reliably as it did for you?

Oh sorry, I actually opened separate issue, and you did already look at it: #2551

@paxcut
Copy link
Collaborator

paxcut commented Dec 13, 2025

I have tried the code you posted and besides some minor issues it doesn't appear to change anything. I was hopping to have some test that showed how the code changes make a difference. The pattern from that thread is crashing for reasons not related to the one described here so it cannot be used to verify that the problem has been solved. Any ideas on how to do that?,

@tringenbach
Copy link
Author

I was hopping to have some test that showed how the code changes make a difference. The pattern from that thread is crashing for reasons not related to the one described here so it cannot be used to verify that the problem has been solved. Any ideas on how to do that?

Do you mean automated test (unit, integration, etc) or just a manual test?

If you mean manually, I could pull down your branch and test it locally.

What happened was I hit the crash described in #2551 and I suspected heap corruption, so I recompiled with -DCMAKE_CXX_FLAGS="-fsanitize=address" (after asking how to do that on Discord).

Once I did that, it crashed before I could even run my pattern. I created #2542 with more details of the crash:

I built with asan, started it, created a new file, and inserted 20 bytes into it.

And then once I thought I had a fix I posted this PR.

If it's not crashing for you or giving you errors under the windows memory debugger you mentioned, then it's probably fixed.

If you want something automated, I think there's ways to run test suites under asan or tsan, or valgrind or helgrind. I haven't looked at ImHex's automated tests yet though, so I can't really comment further.

@paxcut
Copy link
Collaborator

paxcut commented Dec 17, 2025

One question: when you compile using asan you say you start imhex, open a new file and inserted 20 bytes and the crash occurs. Does the pattern editor have any text on it?

@tringenbach
Copy link
Author

One question: when you compile using asan you say you start imhex, open a new file and inserted 20 bytes and the crash occurs. Does the pattern editor have any text on it?

I didn't add anything but ImHex adds some examples in there now, so I believe that was in there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants