Skip to content

Credit system BP: fix crashes when unloading then reloading UE levels #1630

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

GhisBntly
Copy link

IsValid was a first attempt at fixing these crashes which were witnessed only in Shipping mode, ie. with no way of exactly knowing what was going on. But IsValid itself was apparently using CesiumCreditSystemBP after its deletion, so I ended up adding the object to the GC's root set.

@CesiumGS CesiumGS deleted a comment from GhisBntly Mar 14, 2025
@j9liu
Copy link
Contributor

j9liu commented Mar 14, 2025

Thanks @GhisBntly ! I wonder if this fixes #1413. We'll take a look when we get the chance.

@azrogers azrogers self-assigned this Mar 31, 2025
@kring kring added this to the May 2025 Release milestone Apr 1, 2025
@azrogers
Copy link
Contributor

azrogers commented Apr 1, 2025

@GhisBntly Apologies for taking so long to look at this. Do you have steps to reproduce the crash this is meant to fix? I've tried reloading the same level over and over again in a shipping build of the Unreal samples project, and I don't experience any issues no matter how many times I reload. Is there a specific dataset this happens with?

@GhisBntly
Copy link
Author

@azrogers , no worries about the delay 👍 I don't think the crash was related to our particular Bentley tilesets, we had the crash whatever the selected one. It is probably more specific to the way we instantiate the tilesets: to simplify/summarize, I mentioned reloading UE levels because I thought it was the root cause of the crash, but since you don't reproduce, I guess it is only indirectly related to reloading a level.
We actually instantiate the tilesets from the C++ code, ie. it is not present in a level that we (re)load. This is done after selecting the model to load from a list (the "Dashboard"), after which I think our main level is reloaded (with only non-tileset related stuff like lighting, UX BPs...) and the app creates the new tileset.
I would like to describe our application's workflow in more detail, in particular the wrap-up process when we go back to the Dashboard to choose a different model, but I will actually have to ask around or dig in the code because I didn't much contribute to this aspect of the app.

On the other hand, looking at AITwinCesiumCreditSystem::GetDefaultCreditSystem, I am wondering what is actually preventing Unreal's GC from collecting the static CesiumCreditSystemBP, since the bpLoader is destroyed as soon as it did its job to load the asset? The CesiumCreditSystemBP is then only used to spawn the pCreditSystem returned.
Well, the object is static, so I guess the assumption is that static UObject's have indefinite lifetime? I am not too sure about that, for example, while working on our plugin, I think I experienced unexpected static variables deletion when stopping the PIE.
If static objects were truly "immortal", there's no way we would have witnessed those crashes (unless we totally misunderstood their origin, or the Shipping optimizations misled us...)

@azrogers
Copy link
Contributor

azrogers commented Apr 2, 2025

@GhisBntly That's a very good question. On a cursory look over the code, I don't think I'm sure either. This PR might give us some more certainty there, but I'd like to get to the bottom of the credit system's behavior to figure out what's actually going on and going wrong here. @kring do you have any insight?

@kring
Copy link
Member

kring commented Apr 4, 2025

Hmm yeah that GetDefaultCreditSystem code is dodgy. It looks like CesiumCreditSystemBP should be eligible for garbage collection. And if it were to get collected, then nothing would set that field to nullptr and so we'd crash the next time we need to create an ACesiumCreditSystem.

I can't get this to happen, though, not even when gc.CollectGarbageEveryFrame 1 is set in the console. My guess is that UClass instances (which is what this is) are not eligible for garbage collection, at least in the Editor.

@GhisBntly perhaps you are unloading and reloading levels in a built game? And creating them at runtime such that GetDefaultCreditSystem needs to actually create a new ACesiumCreditSystem (versus the more common case where it returns one that already exists in the level)? It seems possible that the CesiumCreditSystemBP would get garbage collected in that case, and that would lead to a crash.

If that's what's happening, then calling AddToRoot on it, as you're doing in this PR, makes a lot of sense to me. We don't want to stop ACesiumCreditSystem from being garbage collected, but keeping the UClass for the credit system blueprint around forever seems fine to me.

@GhisBntly
Copy link
Author

Yes Kevin, that's exactly the situation you describe (unloading and reloading levels in a built game ...).
I just can't find reliable information stating what happens to static UObject-derived members of UObject-derived classes :-/
And I'm wary of UE GC policies, it looks like it behaves quite differently depending on the situation (Development vs. Shipping, in-edior vs. PIE vs. in-game ...) so finding things out through debugging is unreliable.

Some say that AddToRoot without a matching RemoveFromRoot can lead to crashes when exiting the app (https://forums.unrealengine.com/t/prevent-garbage-collecting-how-to-removefromroot/373148/6), and recommend passing the World as Outer (I don't think RF_Standalone is needed in our case). For us it would mean the CesiumCreditSystemBP would be disposed of when unloading the level, and the asset reloaded when needed in the next level, which is also acceptable.
I think I'll test that, if it can prevent the tiniest chance of a crash at exit...

@GhisBntly
Copy link
Author

Well, looks like CesiumCreditSystemBP already has an Outer of type Package, and can't really be Rename'd to change its Outer to the World without some cheating (using RF_ArchetypeObject), so probably not such a good idea...
Will do a quick test with UEngineSubsystem for the sake of learning, I mean, this issue probably doesn't deserve hours of our precious time but I'd like to know what to do in the future in a similar situation...

IsValid was a first attempt at fixing these crashes which were witnessed
only in Shipping mode, ie. with no way of exactly knowing what was going
on. But IsValid itself was apparently using CesiumCreditSystemBP after
its deletion, so I ended up adding the object to the GC's root set.
so that we don't have to manage its lifecycle and especially not
CesiumCreditSystemBP's lifecycle either, which was causing issues
when unloading then reloading levels in-game (Shipping).
@GhisBntly GhisBntly force-pushed the fix-CesiumCreditSystemBP-crash-after-unloading-level branch from 046f175 to 966815c Compare April 4, 2025 16:34
@GhisBntly
Copy link
Author

@kring @azrogers If you want to have a look, I pushed a new version with a loader inheriting from UEngineSubsystem, which seems to work from my initial (quick) testing.
I haven't been able yet to reproduce the crash with the current version of our code, to make sure the new fix still fixes it, though :-/ If I can't reproduce, I will have to rebuild the old version on which the crash was initially witnessed...

About the implementation you'll see it is much simpler, at least if I did not misunderstand the APIs:

  • Being managed as an Engine subsystem means there is no need for NewObject nor ConditionalBeginDestroy
  • Instead of storing CesiumCreditSystemBP, calling LoadSynchronous anytime you need the archetype sounds fine to me, since LoadSynchronous only actually loads if needed, so the 2nd and subsequent calls will only return the stored asset, right?
  • I removed the whole loader object that retrieves the blueprint class in its constructor comment because the BP loader was not doing anything in its constructor, so I guess the comment was outdated.

@GhisBntly
Copy link
Author

I have confirmed that the subsystem also fixes the crash. I don't see any regression in using this version instead of the static variable, but I may not have tested all that needs to be, so I'll let you decide how to proceed with this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants