Description
Sensei ran into an issue where ldk-net-tokio (or some other async context) calls into ChannelManager, which handles some message and then goes to update the ChannelMonitor. Because Sensei is using the new KVStorePersister interface to simplify its persistence it ends up in a sync context in an tokio worker thread thread and then has to block on a future to talk to a (possibly remote) database. It then returns control to tokio by blocking on the future, but at the time we're still holding the channel and ChainMonitor locks, causing most other future execution to block.
While tokio should probably be willing to poll the IO driver in the block_on call from Sensei, even if it did it could still decide to run a different future which ends up blocking on the mutex already held and we're back to a deadlock.
Really if you're storing a monitor update via some async future you should use the TemporaryFailure
return to do the monitor update asynchronously instead of blocking on IO with locks held, but we don't have good utilities for doing so today.
We need to (a) create a AsyncKVStorePersister (it has to either be runtime-specific because it needs the ability to spawn background tasks, as an alternative we could make the user provide a spawn
method which we can call), and (b) adapt the background processor to using it for async manager/graph/score persistence.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status
Activity
TheBlueMatt commentedon May 16, 2022
Oops missed that this is really just a specific instance of #1367, though one that we can fix easily.
tnull commentedon May 17, 2022
I'd be interested in picking this up soon to get some exposure to the Tokio/Persist side of things.
TheBlueMatt commentedon May 17, 2022
I think this is going to snowball into about 3-4 PRs and a large cleanup of the
ChannelMonitorUpdateErr
stuff, so I'd prefer to tackle this one if that's okay. There's a few places whereChannelMonitorUpdateErr::TemporaryFailure
requires a persistence before returning, which won't work with a fully-asyc persister, so that'll need updating first.tnull commentedon May 17, 2022
Alright, go ahead!
G8XSU commentedon Aug 10, 2022
Doubt:
I am assuming async KVStore will return a future.
So basically we move out of persistence blocking but we don't proceed through critical section of any channel state change(open/send/or anything) until future is completed right ?
TheBlueMatt commentedon Aug 10, 2022
Yep, basically. We actually already support this, but its a bit unsafe - monitor update functions can return
Err(ChannelMonitorUpdateRes::TemporaryFailure)
which puts a channel on ice and doesn't transmit channel state update functions until the monitor update completes. The intent here is to use that feature to implement the future, but we need to first make that feature a bit more robust.TheBlueMatt commentedon Aug 20, 2022
Once we do this, we should also update the docs on the updatestatus's in-progress state variant. #1106 (comment)
TheBlueMatt commentedon Aug 22, 2022
This has turned into a very large project :(. #1678 is the first step, but this is basically the 0.1 milestone, at this point, so I'm just gonna tag it as such.
TheBlueMatt commentedon Oct 13, 2024
While we intend to ship fixes for most known async persist issues in 0.1, it doesn't make sense to hold 0.1 for the async persistence keystore API as well. Instead we should target this in the following release.