Missing filesystem mutexes & mutex debugging #6536
Draft
Background
I noticed a number of filesystem issues that seem consistent with data being written to disk incorrectly, potentially due to race conditions. I therefore added some code to debug the mutexes that are in place to prevent such issues, in case the locking itself was at fault.
Thus, this pull request includes several changes to the `server` package, primarily focusing on adding mutex debugging capabilities with a new error type. This debugging information revealed some locations where functions that require a lock were being called without one, so I have also added some mutex locks/unlocks to address these issues.
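
As an illustration of the kind of fix described above, here is a minimal sketch. It is not code from this pull request: the `store` type and its `addBytes` and `Record` methods are hypothetical names chosen for the example. The point is only that a helper documented as "lock must be held" is called after its caller has acquired the lock.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical stand-in for a filestore-like type.
type store struct {
	mu    sync.RWMutex
	bytes uint64
}

// addBytes updates accounting state.
// Lock must be held by the caller.
func (s *store) addBytes(n uint64) {
	s.bytes += n
}

// Record is the entry point used by other goroutines. It takes the
// write lock before calling the helper that requires it.
func (s *store) Record(n uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.addBytes(n)
}

func main() {
	s := &store{}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.Record(10)
		}()
	}
	wg.Wait()
	fmt.Println(s.bytes) // prints 1000
}
```

Without the `Lock`/`Unlock` pair in `Record`, the concurrent updates in `main` would race, which is the class of problem the debugging code is meant to surface.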
Mutex Debugging Enhancements:
- Added `hasLock`, `preEmptLock`, `hasLockthing`, and `logUnLock` functions to check and log the status of mutex locks, assisting in debugging lock-related issues (`server/util.go`). Specifically:
  - `hasLock` checks if a lock is held on a particular mutex. I have added this to a number of functions in `server/filestore.go` which are labelled as "must hold lock". This causes the function in question to return an error of type `ErrNoLockHeld` if the lock is not held. This may not prove to be a long-term solution, but it is useful for diagnosing where mutex lock acquisition is missing.
  - `preEmptLock` checks if a lock is held on a particular mutex, and acquires one if it is not. I have added this to a number of functions in `server/filestore.go` which are labelled as "must hold lock" but do not return an error type, and so cannot work quite as gracefully as using `hasLock` to return an error. This way, at least, they will obtain a lock for the duration of their operation. I should stress that this is not intended to be a long-term solution, rather a band-aid in order to determine where mutex lock acquisition is missing.
  - `hasLockthing` is a function that nothing else really needs to use directly, but it provides the functionality behind `hasLock` and `preEmptLock`.
  - `logUnlock` is a counterpart to these functions and simply unlocks a mutex, but includes some logging.

The above functions check the environment variable `MUTEX_CHECK_DEBUG_FILE`. If this variable is set, then each time `hasLock` or `preEmptLock` is called and discovers that a mutex lock is not held (i.e. something that was supposed to acquire a lock before calling a function did not), it appends a line to the file named in the variable. This line is a JSON representation of a stack trace, which makes it possible to debug which chain of functions ended up calling a function that requires a lock, without a lock. This format is intended to make quick analysis of the output very easy using `jq` or similar.

New Error Types:
- Added `ErrNoLockHeld` error to handle cases where an expected lock is not held.

Configuration Updates:
- `Options` struct (`server/opts.go`).

Signed-off-by: Lee Brotherston [email protected]