Due to the way that locking was written into this toy LPM implementation,
the read-only code in lpm_deliver() obtains a *write* lock on the tree FIB.
The effect of this is the unnatural combination of first obtaining an RCU
read lock and then a FIB write lock in lpm_deliver(). This conflicts with
functions like local_newroute(), which first obtain a FIB write lock and
then wait for RCU readers to finish before flushing anchors. The problem
is that deadlock can occur with the following interleaving:
    thread1                                   thread2
    =======                                   =======
    local_newroute(): get write lock
                                              lpm_deliver(): get RCU read lock
    local_newroute(): wait for RCU readers
                                              lpm_deliver(): wait for write lock
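Notice that this deadlock cannot occur if lpm_deliver() gets the write
lock first. In that case local_newroute() blocks on the write lock before
it starts waiting for RCU readers, so lpm_deliver() can finish.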
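The interleaving above can be reproduced in a small userspace model. This is a minimal sketch with explicit assumptions. POSIX rwlocks stand in for both locks: the hypothetical `fib_lock` models the tree FIB lock, and `rcu_gate` models RCU. Holding its read side models rcu_read_lock(), and acquiring its write side models waiting for RCU readers. The thread functions are hypothetical stand-ins for local_newroute() and lpm_deliver(). Timed lock attempts replace the real indefinite waits, so the cycle can be observed without hanging the program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins, not the kernel API:
 *   fib_lock - the tree FIB rwlock
 *   rcu_gate - RCU: read lock ~ rcu_read_lock(), write lock ~ synchronize_rcu() */
static pthread_rwlock_t fib_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_rwlock_t rcu_gate = PTHREAD_RWLOCK_INITIALIZER;
static pthread_barrier_t step;   /* both first steps complete before either thread waits */
static bool t1_timed_out, t2_timed_out;

/* Absolute CLOCK_REALTIME deadline `ms` from now, as the timed lock calls expect. */
static struct timespec deadline_ms(long ms)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec += 1;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}

/* thread1: models local_newroute() */
static void *newroute_thread(void *arg)
{
    (void)arg;
    pthread_rwlock_wrlock(&fib_lock);                      /* get write lock */
    pthread_barrier_wait(&step);                           /* thread2 holds its RCU read lock */
    struct timespec ts = deadline_ms(200);
    int rc = pthread_rwlock_timedwrlock(&rcu_gate, &ts);   /* wait for RCU readers */
    t1_timed_out = (rc == ETIMEDOUT);
    if (rc == 0)
        pthread_rwlock_unlock(&rcu_gate);
    pthread_barrier_wait(&step);                           /* keep the lock until thread2 gives up */
    pthread_rwlock_unlock(&fib_lock);
    return NULL;
}

/* thread2: models lpm_deliver() */
static void *deliver_thread(void *arg)
{
    (void)arg;
    pthread_rwlock_rdlock(&rcu_gate);                      /* get RCU read lock */
    pthread_barrier_wait(&step);                           /* thread1 holds the write lock */
    struct timespec ts = deadline_ms(200);
    int rc = pthread_rwlock_timedwrlock(&fib_lock, &ts);   /* wait for write lock */
    t2_timed_out = (rc == ETIMEDOUT);
    if (rc == 0)
        pthread_rwlock_unlock(&fib_lock);
    pthread_barrier_wait(&step);                           /* stay a reader until thread1 gives up */
    pthread_rwlock_unlock(&rcu_gate);
    return NULL;
}

/* Returns true if each thread timed out waiting for a lock the other holds. */
bool run_original_interleaving(void)
{
    pthread_t t1, t2;
    int rc = pthread_barrier_init(&step, NULL, 2);
    assert(rc == 0);
    (void)rc;
    pthread_create(&t1, NULL, newroute_thread, NULL);
    pthread_create(&t2, NULL, deliver_thread, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_barrier_destroy(&step);
    return t1_timed_out && t2_timed_out;
}
```

run_original_interleaving() returns true because each thread is waiting on a lock the other holds. An rwlock is only a crude RCU stand-in, since taking its write side also blocks new readers. The model is meant to show the lock ordering, not RCU semantics.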
To fix this, we can duplicate the FIB entry whose anchor needs to be
flushed, replace the old entry with the duplicate, and then release the
lock. This allows writers of the tree, including the code in lpm_deliver(),
to proceed. We then wait for RCU synchronization before flushing the old entry's
anchor, since routing mechanism code (not involved with the LPM tree)
may still be reading that entry. Once all these readers are done, the old
entry can be reclaimed.
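The following userspace sketch models the fixed ordering. It uses POSIX rwlocks as crude stand-ins: the hypothetical `fib_lock` represents the tree FIB lock, and `rcu_gate` represents RCU. Holding the read side of `rcu_gate` models an RCU read-side critical section, and acquiring its write side models synchronize_rcu(). Several details are illustrative assumptions: the `struct fib_entry` layout, the `anchor_cached` flag standing in for the anchor, and the duplicate starting with a fresh anchor. The writer duplicates the entry, publishes the copy, and releases the FIB lock before it waits for readers. The reader, which took a reference to the old entry inside its read section, can therefore still acquire the FIB lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical FIB entry; anchor_cached stands in for the anchor to be flushed. */
struct fib_entry {
    int prefix;
    bool anchor_cached;
};

static pthread_rwlock_t fib_lock = PTHREAD_RWLOCK_INITIALIZER;  /* tree FIB lock */
static pthread_rwlock_t rcu_gate = PTHREAD_RWLOCK_INITIALIZER;  /* crude RCU stand-in */
static pthread_barrier_t step;
static _Atomic(struct fib_entry *) tree_slot;   /* the tree's pointer to the entry */
static bool t1_ok, t2_ok, reader_saw_unflushed;

static struct timespec deadline_ms(long ms)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec += 1;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}

/* thread1: models the fixed local_newroute() */
static void *newroute_thread(void *arg)
{
    (void)arg;
    pthread_rwlock_wrlock(&fib_lock);                  /* get write lock */
    pthread_barrier_wait(&step);                       /* thread2 holds its RCU read lock */
    struct fib_entry *old = atomic_load(&tree_slot);
    struct fib_entry *dup = malloc(sizeof(*dup));
    assert(dup != NULL);
    *dup = *old;                                       /* duplicate the entry */
    dup->anchor_cached = false;                        /* assumption: copy gets a fresh anchor */
    atomic_store(&tree_slot, dup);                     /* replace old entry with duplicate */
    pthread_rwlock_unlock(&fib_lock);                  /* release BEFORE waiting for readers */

    struct timespec ts = deadline_ms(2000);
    int rc = pthread_rwlock_timedwrlock(&rcu_gate, &ts);  /* models synchronize_rcu() */
    t1_ok = (rc == 0);
    if (rc == 0) {
        pthread_rwlock_unlock(&rcu_gate);
        old->anchor_cached = false;                    /* flush the old entry's anchor */
        free(old);                                     /* no reader can still reach it */
    }                                                  /* on timeout, old is leaked */
    return NULL;
}

/* thread2: models lpm_deliver() plus routing code reading the entry */
static void *deliver_thread(void *arg)
{
    (void)arg;
    pthread_rwlock_rdlock(&rcu_gate);                  /* get RCU read lock */
    struct fib_entry *seen = atomic_load(&tree_slot);  /* reference to the old entry */
    pthread_barrier_wait(&step);
    struct timespec ts = deadline_ms(2000);
    int rc = pthread_rwlock_timedwrlock(&fib_lock, &ts);  /* succeeds once thread1 unlocks */
    t2_ok = (rc == 0);
    if (rc == 0)
        pthread_rwlock_unlock(&fib_lock);
    reader_saw_unflushed = seen->anchor_cached;        /* old entry not yet flushed or freed */
    pthread_rwlock_unlock(&rcu_gate);                  /* end of read-side section */
    return NULL;
}

/* Returns true if both threads finished and the reader's entry stayed intact. */
bool run_fixed_interleaving(void)
{
    struct fib_entry *first = malloc(sizeof(*first));
    assert(first != NULL);
    first->prefix = 24;
    first->anchor_cached = true;
    atomic_store(&tree_slot, first);

    pthread_t t1, t2;
    pthread_barrier_init(&step, NULL, 2);
    pthread_create(&t1, NULL, newroute_thread, NULL);
    pthread_create(&t2, NULL, deliver_thread, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_barrier_destroy(&step);

    struct fib_entry *now = atomic_load(&tree_slot);
    bool ok = t1_ok && t2_ok && reader_saw_unflushed && now->prefix == 24;
    free(now);
    return ok;
}
```

Because the writer no longer holds the FIB lock while it waits for readers, the wait-for cycle is broken. The old entry is flushed and freed only after the reader leaves its read-side section.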
This removes the deadlock, but future iterations of the LPM principal
should use RCU instead of rwlocks to avoid this unnatural locking.